Pindrop Security Raises $100 Million to Expand Deepfake Detection Technology – SecurityWeek


Pindrop Security, a specialist in voice fraud detection, has raised $100 million in debt financing from Hercules Capital. The funds will be used to further develop Pindrop’s AI-assisted detection of AI-generated deepfakes.

This new debt financing – effectively a loan – is additional to, but separate from, the $218.3 million in venture equity funding that Pindrop has raised since it was founded in 2011. The implication is that the firm is now stable enough to support scheduled repayments and interest without needing to trade equity for growth funding.

The money will be used to develop new tools against the expanding threat of AI-generated voice deepfakes that followed the arrival of ChatGPT and the ready availability of gen-AI from November 2022. Voice deepfakes existed before then, but the spread of gen-AI through 2023 and 2024 has dramatically changed the landscape, demanding new deepfake voice detection capabilities.

“Last year (2023) we were seeing around one attack per month. For the last four months we’ve seen at least one attack every day for every customer, and four separate gen-AI based deepfake voice attacks every day for some of our customers,” explains Vijay Balasubramaniyan, CEO and co-founder at Pindrop. This is the anticipated effect of the malicious use of gen-AI: an increase in both the scale and the sophistication of existing attacks, which in turn demands new processes and techniques for detecting ever more convincing fake voices.

Better quality deepfakes defeat our first line of defense: suspicion. Once our guard is down, we begin to believe the voice is genuine. This is simply how the human brain works: we see what we expect to see, we read what we expect to read, and we hear what we expect to hear.

There are hundreds of audible clues that an expected voice is fake, but the human ear doesn’t detect them, nor does the brain register them. This is why we need specialist systems that can determine whether a voice was generated by a human or a machine, to guard against the new wave of deepfake voice attacks.

We also need to counter the growing scale of deepfake voice, so we need rapid detection. “When we started detecting deepfakes,” explains Balasubramaniyan, “there was just one tool that would take 20 hours to generate one fake voice. Now there are many different tools that take just three to five seconds – or 15 seconds if you want a high quality voice clone. So, a TikTok video is all that is necessary to generate a fake voice able to fool grandparents into ‘recognizing’ that their children or grandchildren are at the other end, and in need of assistance.”

The speed and ease of production, and the quality of the output, allows attackers to automate large scale attacks. “There are instances where all the senior citizens in a county have been hit with faked voices from relatives asking for help [usually money] in a single Friday to Sunday period.” After that, the attackers repeat the process in a different county.


Targeting corporate executives – or using deepfake messages apparently from them – is just as easy. A recorded telephone message is enough to create the deepfake; a recorded presentation yields better quality.

Surprisingly, however, detection of deepfake voices, if not easy, is more than possible. The audible clues can be detected. Balasubramaniyan explained: “There are 8,000 samples of your voice every single second even in the lowest fidelity audio channel, which is telephony. So, there are 8,000 times for a machine to make a mistake every single second. And that’s what we look for.”
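The 8 kHz figure is simply the standard telephony sampling rate, and it shapes how machine detection works: the audio is cut into short frames and each one is scored for artifacts. Below is a minimal sketch of that frame-by-frame scanning using a single toy feature (spectral flatness) and an arbitrary threshold; Pindrop’s actual features and models are not public, so everything here is an illustrative assumption.

```python
import numpy as np

SAMPLE_RATE = 8000  # telephony audio: 8,000 samples every second

def frame_signal(signal: np.ndarray, frame_len: int = 160, hop: int = 80) -> np.ndarray:
    """Split audio into overlapping 20 ms frames (160 samples at 8 kHz)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def spectral_flatness(frame: np.ndarray) -> float:
    """Geometric mean over arithmetic mean of the magnitude spectrum:
    near 1 for noise-like frames, near 0 for cleanly voiced frames."""
    mag = np.abs(np.fft.rfft(frame)) + 1e-12
    return float(np.exp(np.log(mag).mean()) / mag.mean())

def suspicious_fraction(signal: np.ndarray, threshold: float = 0.5) -> float:
    """Fraction of frames flagged by the (hypothetical) flatness threshold."""
    return float(np.mean([spectral_flatness(f) > threshold
                          for f in frame_signal(signal)]))

# One second of telephony audio = 8,000 samples to scan for mistakes.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
voiced_like = np.sin(2 * np.pi * 120 * t)                      # clean 120 Hz tone
noise_like = np.random.default_rng(0).normal(size=SAMPLE_RATE)  # noise-like signal
print(f"voiced-like frames flagged: {suspicious_fraction(voiced_like):.2f}")
print(f"noise-like  frames flagged: {suspicious_fraction(noise_like):.2f}")
```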

He gave an example: the way a human voice generates fricatives. Fricatives are consonant sounds created by forcing air through a narrow channel shaped by the mouth and the positioning of the tongue. Humans have developed this ability over thousands of years of evolution. Machines are not so good at it, and their mistakes can be found within the 8,000 samples of voice in every second of speech. Pindrop’s tools can detect such errors.
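A concrete, deliberately simplified way to see the fricative cue: sounds like /s/ and /f/ are turbulence noise, so their energy sits mostly at the high end of the spectrum, and a frame that should be a fricative but lacks that profile is suspect. The band edge and the demo signals below are assumptions for illustration, not Pindrop’s method.

```python
import numpy as np

SAMPLE_RATE = 8000

def high_band_energy_ratio(frame: np.ndarray, cutoff_hz: float = 2500.0) -> float:
    """Share of spectral energy above cutoff_hz. Real fricatives such as /s/
    concentrate turbulence noise up high; a supposed fricative without that
    energy profile is a red flag."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / SAMPLE_RATE)
    return float(power[freqs >= cutoff_hz].sum() / (power.sum() + 1e-12))

rng = np.random.default_rng(1)
noise = rng.normal(size=160)
fricative_like = np.diff(noise, prepend=0.0)  # first difference acts as a high-pass
t = np.arange(160) / SAMPLE_RATE
vowel_like = np.sin(2 * np.pi * 300 * t)      # low-frequency, vowel-like tone

print(f"fricative-like ratio: {high_band_energy_ratio(fricative_like):.2f}")  # high
print(f"vowel-like ratio:     {high_band_energy_ratio(vowel_like):.2f}")      # near 0
```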

He gave another example, based on timing. “When you say, ‘Hello Paul’, your mouth, your oral cavity and your nasal cavities are all wide open when you say the ‘o’ in ‘hello’. When you say, ‘Paul’, they close down. There’s a certain speed with which you can do that. Machines don’t care about that speed – you care because you have physical limitations. Detailed analysis of these characteristics can lead us to the conclusion: ‘There is no way this audio could have been produced by a real human, because it’s going through these configurations in too rapid a fashion.’”
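This physical-limits argument can be made mechanical: track how quickly the spectral configuration changes from frame to frame and flag transitions faster than any human jaw, tongue, and velum could manage. The sketch below uses the spectral centroid as a crude stand-in for the formant trajectories Balasubramaniyan describes, and the plausibility bound is a hypothetical placeholder.

```python
import numpy as np

SAMPLE_RATE = 8000
FRAME_LEN, HOP = 160, 80  # 20 ms frames, 10 ms hop

def spectral_centroid(frame: np.ndarray) -> float:
    """Center of mass of the magnitude spectrum, in Hz; it rises and falls
    as the mouth opens and closes, much as the formants do."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / SAMPLE_RATE)
    return float((freqs * mag).sum() / (mag.sum() + 1e-12))

def max_transition_rate_hz_per_s(signal: np.ndarray) -> float:
    """Steepest frame-to-frame centroid jump. Articulators need time to move;
    a jump steeper than any human utterance suggests machine synthesis."""
    frames = [signal[i:i + FRAME_LEN]
              for i in range(0, len(signal) - FRAME_LEN + 1, HOP)]
    centroids = np.array([spectral_centroid(f) for f in frames])
    return float(np.abs(np.diff(centroids)).max() / (HOP / SAMPLE_RATE))

MAX_HUMAN_RATE = 50_000.0  # Hz/s -- hypothetical plausibility bound

def looks_synthetic(signal: np.ndarray) -> bool:
    return max_transition_rate_hz_per_s(signal) > MAX_HUMAN_RATE
```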

There’s a deepfake that his firm detects and calls ‘giraffe man’. The voice sounds fine to the human ear, but Pindrop’s analysis concludes it could only be produced by someone ‘with a 12 feet long neck with vocal cords thrashing around in those 12 feet’.
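The ‘giraffe man’ conclusion follows from textbook acoustics. The vocal tract is commonly modeled as a tube closed at the glottis and open at the lips, whose lowest resonance is f1 = c / (4L); inverting that tells you how long a tract the observed voice implies. The numbers below are standard approximations, not Pindrop’s actual analysis.

```python
SPEED_OF_SOUND = 343.0   # m/s in warm air
FEET_PER_METER = 1 / 0.3048

def implied_tract_length_m(first_resonance_hz: float) -> float:
    """Quarter-wave resonator: a tube closed at one end resonates at
    f1 = c / (4 * L), so an observed resonance implies L = c / (4 * f1)."""
    return SPEED_OF_SOUND / (4.0 * first_resonance_hz)

# A typical adult resonance around 500 Hz implies a ~17 cm vocal tract:
print(f"{implied_tract_length_m(500.0) * 100:.1f} cm")
# A resonance near 23 Hz would demand a ~12-foot tract -- 'giraffe man':
print(f"{implied_tract_length_m(23.0) * FEET_PER_METER:.1f} ft")
```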

Using such detection techniques, Balasubramaniyan claims he can catch 99% of all deepfake voices. That puts AI-assisted detection ahead of AI-assisted aggression – and he is confident it will stay ahead with continued research and development. The primary purpose is, after all, the same as that of all security: to make the malicious activity too difficult or expensive for the attackers to continue. If you make the environment hard, the attackers will move elsewhere.

There is an interesting side effect of the firm’s success in detecting deepfake voice. Deepfake images and video get the lion’s share of media attention, and Pindrop is doing its own research and development into deepfake video. But video is meaningless for fraud without an associated voice. Being able to detect the fake voice automatically flags the video as most likely a deepfake.

Related: Deepfake of Principal’s Voice Is the Latest Case of AI Being Used for Harm

Related: LastPass Employee Targeted With Deepfake Calls

Related: Bipartisan Bill Would Require Online Identification, Labeling of AI-Generated Videos and Audio

Related: Defeating the Deepfake Danger

