Grounding media in the human body in the age of AI
A Q&A with Visar Berisha on why authenticated proof of human presence in media can restore trust

With the acceleration of generative artificial intelligence, or Gen AI, deceptive audio and visual media, often referred to as “deepfakes,” are becoming increasingly advanced and prevalent. Capable of powering scams, spreading false claims and swaying public opinion, AI-generated content threatens to further erode trust in communication and media.
Visar Berisha, an associate dean of research and commercialization and professor in the School of Electrical, Computer and Energy Engineering, part of the Ira A. Fulton Schools of Engineering at Arizona State University, is working to authenticate media using the human body’s natural signals.
Berisha, who also holds a joint appointment in the university’s College of Health Solutions, developed OriginStory with an interdisciplinary team of ASU researchers to establish a chain of trust that ensures a voice is authentically human from the moment of recording to when it is listened to.
Isabella Lenz, an alum and postdoctoral researcher, developed the initial hardware prototype and the data-collection software used to gather a dataset of about 100 subjects, including ASU students, staff, faculty and residents of Mirabella, a senior living community on ASU’s Tempe campus. This dataset was then used to build the foundational algorithms that let researchers verify that a live human is producing the speech.
For Lenz, the project was not just an application of her expertise; it was also an opportunity to address a growing problem that hit close to home.
“AI is making impersonation attempts incredibly convincing, and digital scams are becoming more common,” she says. “I have seen the impact personally: my grandparents lost a large sum of money to someone pretending to be my cousin. Experiences like that made the work feel meaningful, and being able to contribute to a technology that could help protect people is something I care about deeply.”
In this Q&A, Berisha discusses the increasing importance of grounding media authentication in the human body, why post-hoc detection of fakes in potentially synthetic media is problematic and how authenticated human presence in communication can impact the media we consume.
Question: How has the acceleration of Gen AI across the media landscape amplified the importance of your work?
Answer: Generative AI has made it very challenging to distinguish between human and machine-generated content by eye or ear alone. AI systems can now convincingly mimic voices and faces, pass CAPTCHAs and even hold real-time conversations that can pass as human. This has created a fundamental trust gap in digital communication. We literally don’t know who or what we’re talking to online. Our work aims to address this problem at its root by developing a way to verify, in real time, that there is a live human on the other end of a communications channel.
Question: Can you provide an example of current authentication efforts, and why are methods that address potentially fake content post hoc ineffective or problematic?
Answer: Right now, it mostly falls on our own eyes and ears to tell whether something online is real, and recent studies show that we can’t reliably do that anymore. As a result, the community has turned to AI tools designed to detect other AI-generated content. These algorithms analyze a video or recording after it’s been created and try to decide whether it’s real or synthetic. But this has become a constant cat-and-mouse game because of how quickly generative AI is improving.
As a concrete example, just a few months ago, a video made the rounds of someone interviewing for a job using a face-swapping AI filter and posing as someone else. The video went viral after the interviewer exposed the fake by asking the guest to wave a hand across their face. The AI glitched because it couldn’t handle the occlusion and revealed the true face. Two months later, that same detection trick no longer worked because the next generation of AI models had already fixed the flaw.
That viral moment hinged on a human detector, not an AI one, but it illustrates the problem: detection tricks that work today will fail tomorrow. What makes this challenge unique is the unprecedented pace of technological progress. Generative models are evolving faster than our ability to develop and deploy countermeasures. As a result, AI-based detectors will always carry vulnerabilities that can be exploited.
Question: If AI can outpace every new detector, what’s the next step?
Answer: Our approach is fundamentally different. Instead of trying to catch what’s fake after the fact, we verify what’s real at the moment of recording, using physical signals from the human body that AI can’t easily replicate. The system integrates a radar sensor and a microphone to measure articulatory motion, vocal-fold vibration, heartbeat and respiration, all synchronized with the speaker’s voice. By matching these physiological signals to the audio signal, it confirms that a live person is speaking in real time.
This means that authenticity no longer depends on guessing whether a recording “looks” or “sounds” real. Instead, it’s grounded in direct evidence of a living, breathing human producing the signal. Because verification happens locally and continuously, it can’t be easily transferred, replayed or faked at scale by an AI system. It also preserves privacy since no raw biometric data ever leaves the device. In essence, we’ve built a tool that uses the human speech-production mechanism to provide proof of personhood and to bind every word to the body that produced it at the exact moment of communication.
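To make the matching step concrete, the sketch below shows one simple way such a liveness check could work, assuming a radar channel and an audio channel recorded over the same time window. This is an illustration of the general idea, not the OriginStory algorithm: the envelope-correlation approach, the window length and the decision threshold are all assumptions chosen for the example.

```python
# Illustrative liveness check: does a radar-derived vocal-fold vibration
# signal rise and fall together with the recorded audio? A simplified
# sketch, NOT the OriginStory algorithm; the window length and the 0.6
# threshold are arbitrary choices made for illustration.
import numpy as np
from scipy.signal import hilbert, resample

def envelope(x: np.ndarray) -> np.ndarray:
    """Amplitude envelope via the analytic (Hilbert) signal."""
    return np.abs(hilbert(x))

def liveness_score(audio: np.ndarray, radar: np.ndarray) -> float:
    """Pearson correlation of the two envelopes on a common time grid.

    Assumes both arrays span the same time window, so resampling each
    to n points roughly aligns them sample by sample.
    """
    n = 4096
    a = envelope(resample(audio, n))
    r = envelope(resample(radar, n))
    a = (a - a.mean()) / (a.std() + 1e-9)  # z-score each envelope
    r = (r - r.mean()) / (r.std() + 1e-9)
    return float(np.mean(a * r))

def is_live(audio: np.ndarray, radar: np.ndarray, threshold: float = 0.6) -> bool:
    """Declare a segment live if the two envelopes co-vary strongly."""
    return liveness_score(audio, radar) > threshold
```

Run continuously over short windows of a live stream, a check of this kind would flag the moment a synthetic voice loses its physiological counterpart.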
Question: You previously worked with other ASU faculty members on the development of the instrument OriginStory. Can you tell us a little about this hardware-software system, and how has this tool or related research evolved since its initial introduction?
Answer: This project grew out of interdisciplinary collaborations at ASU that combined expertise in AI, speech science and radar signal processing. Isabella Lenz, a former doctoral student advised by electrical engineering Professor Dan Bliss, had been working on radar for vital-sign monitoring. She took an interest in the proof-of-personhood concept, and I began co-advising her during her PhD as she developed the foundational algorithms for what became the OriginStory technology.
I have been collaborating on speech AI with Julie Liss, a professor in the College of Health Solutions who is an expert on the production and perception of speech, for over a decade. The way humans produce speech is deeply tied to the body: a coordinated act of motion, vibration and breath that’s uniquely human. As AI began to blur the boundaries of media authenticity, we realized that these same physiological signals could serve as a kind of fingerprint of personhood — and that’s exactly what the OriginStory prototype senses to provide proof of a live human for communications.
The prototype, developed in our lab, has already shown strong performance in controlled settings. We’re now focused on making the system smaller, enabling real-time operation, and expanding its use beyond the lab into practical contexts where verifying human presence really matters.
Question: In which situations is it most important to verify that a real person is speaking?
Answer: Knowing that a real person is speaking matters wherever communication can trigger an action or a decision. Today, that includes executive meetings, financial approvals and telemedicine consultations — places where a convincing voice clone could lead to real harm. But as AI systems become more autonomous, the implications grow much larger. Voice will increasingly serve as the interface for controlling intelligent AI agents, operating machines and accessing systems that can make or execute decisions on their own.
In that future, proof of personhood becomes a form of digital governance. It establishes ownership over AI agents by ensuring that only verified humans can issue commands, authorize actions or override machine behavior. It also creates a natural system of checks and balances between human and artificial decision-making by embedding accountability directly into how we communicate with technology. Our work provides the technical foundation for that: a way to ensure that the voice guiding an intelligent system truly comes from an authorized living human being.
Question: Are there other research efforts, innovations or tools in the media authentication space aimed at addressing this issue?
Answer: There’s exciting work happening on watermarking and cryptographic signatures, as well as proof-of-personhood systems like World that verify identity once through a biometric enrollment. However, these are one-time checks and cannot confirm whether a real person is continuously present in a live conversation.
Our technology complements those tools by offering continuous, on-device verification of human presence during a conversation. I think a complete solution to this problem will require a comprehensive security stack that combines many of these technologies together, spanning from content provenance to real-time personhood detection.
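For contrast, the one-time cryptographic checks Berisha mentions can be sketched in a few lines. The hypothetical example below, written with the widely used Python cryptography package, signs a recording at capture time with a device key; later verification proves the bytes are unchanged, but, as he notes, it says nothing about whether a live human remains present on an ongoing call. Key management is deliberately simplified here.

```python
# Illustrative content-provenance signing: sign media bytes at capture so
# any later tampering breaks verification. A sketch of the general idea
# behind cryptographic provenance, not any specific product's design.
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)
from cryptography.exceptions import InvalidSignature

def sign_capture(private_key: Ed25519PrivateKey, media_bytes: bytes) -> bytes:
    """Bind this exact recording to the capturing device's key."""
    return private_key.sign(media_bytes)

def verify_capture(public_key: Ed25519PublicKey, media_bytes: bytes,
                   signature: bytes) -> bool:
    """True only if the recording is byte-identical to what was signed."""
    try:
        public_key.verify(signature, media_bytes)
        return True
    except InvalidSignature:
        return False

# Usage: a device signs at record time; a consumer verifies on playback.
key = Ed25519PrivateKey.generate()
recording = b"...raw audio bytes..."
sig = sign_capture(key, recording)
assert verify_capture(key.public_key(), recording, sig)
```

The signature certifies the integrity and origin of a fixed artifact; it is exactly the kind of one-time check that continuous, on-device presence verification is meant to complement.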
Question: What does deployment of OriginStory look like?
Answer: From a research standpoint, we’re studying how to integrate this technology into everyday communication environments: video calls, mobile devices or enterprise headsets. The first prototypes are standalone units that pair with conferencing platforms like Zoom. Over time, we envision embedding the sensing and verification algorithms directly into devices we already use for work and communication, with all processing happening locally to preserve privacy.
Question: How do you think authentication grounded in the human body will affect media and journalism?
Answer: In journalism, where authenticity is paramount, having verifiable proof that a voice or interview came from a live human is as fundamental as citing a source. It can help news organizations, regulators and the public distinguish verified human reporting from synthetic impersonations, hopefully restoring confidence in human-authored media.
Question: How will your research and other authentication methods need to evolve over time to combat new faking techniques?
Answer: One of the benefits of our approach is that it doesn’t depend on keeping up with improvements in generative AI. Instead of trying to outpace deepfake models, our system is grounded in the physical world — in the physiological signals that only a live human body can produce. As generative models become more capable, the focus will continue to shift from detecting digital artifacts to confirming the causal origin of human-generated media — in other words, the real, time-synchronized evidence that the words were spoken by a living person.
The ongoing challenge is to make these systems seamless, privacy-preserving and widely accessible, so that authentication becomes a natural part of communication rather than an obstacle to it.

