Issue 87 - Is that Really You?
Issue 87, is that really you? Welcome to 500 Words. This is Interrogating AI. How can you be sure I'm me and you're you? I'm asking because a person's vocal identity is hard to verify online, even if you know their voice well.
Lee Schneider: Synthetic speech generation has gotten so good that it's close to impossible to tag a voice as real just by listening to it. Earlier this year, New Hampshire voters received a robocall that sounded like Joe Biden. Biden's voice told them to "save your vote for November," encouraging them not to vote in the presidential primary. That doesn't sound like something Joe would say, and it wasn't. It was a deepfake of his voice.
Lee Schneider: Deepfakes may sound like pranks, but the Biden call shows that things can get serious. We need a deepfake detection system that can validate voices. I posed my technical questions about this to the folks at Pindrop, a group working to develop the future of voice authentication. Here's the breakdown of what they said. Money is easy to validate, but not your voice.
Lee Schneider: If you have a $20 bill in your wallet, pull it out and take a look. You'll see color-shifting ink and, if you hold it up to the light, a watermark. This is to guard against counterfeiting. Some researchers have proposed something similar for voices. Every voice recording online could carry a digital watermark.
Lee Schneider: Scan the recording, detect the watermark, and you'd know whether the voice is authentic. But speech occupies a fairly narrow set of audio wavelengths. It's a lot less spectrally rich than music, for example, so there's less space to hide a watermark without detection. And even if you were able to surreptitiously watermark all the words on the web, everyone who posted voice recordings would have to cooperate with your watermarking effort. It's pretty easy to get cooperation when watermarking paper money.
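Here's a toy sketch of that embed-then-scan round trip, hiding a known bit pattern in the least significant bits of 16-bit audio samples. Everything in it is hypothetical and purely illustrative; real audio watermarks are spectral and must survive compression and re-encoding, which is exactly why speech's narrow bandwidth makes them hard to hide.

```python
# Toy illustration of the watermarking idea: hide a known bit pattern in
# the least significant bit (LSB) of each audio sample, then check for it.
# A real audio watermark is spectral and robust to re-encoding; this LSB
# scheme is only a sketch of the embed/detect round trip.

WATERMARK = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical 8-bit mark

def embed(samples, mark=WATERMARK):
    """Overwrite each sample's LSB with the repeating mark pattern."""
    return [(s & ~1) | mark[i % len(mark)] for i, s in enumerate(samples)]

def detect(samples, mark=WATERMARK):
    """Return True only if every sample's LSB matches the expected mark."""
    return all((s & 1) == mark[i % len(mark)] for i, s in enumerate(samples))

clean = [1000, -2000, 3000, 4000, -5000, 600, 700, 800]
marked = embed(clean)
print(detect(marked), detect(clean))  # prints: True False
```

The catch the newsletter points out applies here too: detection only works if every publisher runs `embed` in the first place.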
Lee Schneider: Only the Bureau of Engraving and Printing, part of the US Treasury Department, is allowed to print it, and the Federal Reserve decides how much to print. Tight controls there. No way that's happening online. The only known way to test questionable audio is to subject it to deep analysis. Pindrop analyzed the fake Biden robocall.
Lee Schneider: They filtered out any sound that wasn't speech, broke the audio into 155 sections of 250 milliseconds each, and then scored the sections for authenticity. It turns out that words with fricatives, a type of consonant sound produced by forcing air through a narrow space in the vocal tract, can be spectral identifiers for deepfakes. Examples are words like "preference" or "difference." Pindrop's researchers also checked the deepfake call for phrases that Joe had said before and found that some appeared to be new. Not only were they able to confirm that it was not Joe on the recording, but since they are building a database of what they call fake prints, they were able to make a good guess at which app was used to make the deepfake.
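The windowing step described above can be sketched in a few lines of Python. The 250-millisecond section length comes straight from the description; the sample rate and the scoring function are placeholders I've invented, since Pindrop's actual classifier is proprietary and far more sophisticated.

```python
# Minimal sketch of the windowing step: split a mono audio signal into
# fixed 250 ms sections, then score each section for authenticity.
# SAMPLE_RATE and score_section are hypothetical placeholders.

SAMPLE_RATE = 16_000          # samples per second (assumed)
WINDOW_MS = 250               # section length from the analysis described

def split_into_sections(samples, sample_rate=SAMPLE_RATE, window_ms=WINDOW_MS):
    """Break a list of audio samples into non-overlapping 250 ms sections."""
    step = sample_rate * window_ms // 1000   # samples per section
    return [samples[i:i + step] for i in range(0, len(samples), step)]

def score_section(section):
    """Placeholder authenticity score; a real system would run a trained
    classifier over spectral features (e.g. around fricatives) here."""
    return 0.5  # hypothetical neutral score

# Example: 38.75 seconds of audio yields 155 sections of 250 ms each.
audio = [0.0] * (SAMPLE_RATE * 155 * WINDOW_MS // 1000)
sections = split_into_sections(audio)
print(len(sections))  # prints: 155
```

Scoring many short sections rather than the whole clip is the interesting design choice: a deepfake can sound convincing on average while individual 250 ms slices, especially ones containing fricatives, give it away.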
Lee Schneider: In this case, they believe the app was made by ElevenLabs. I've written about ElevenLabs before in 500 Words. They are partnering with HarperCollins to record audiobooks in different languages. I bring that up to point out that they are not a bad company, but they can't control who uses their product, and there are plenty of trolls out there willing to clone voices that we need to trust. I really, really did this recording.
Lee Schneider: I did an interview with Marco Ciappelli for his Audio Signals podcast. We talked about foundations for storytelling, writing trilogies, how to write dystopia without getting in a bad mood, and the time I read $1,000 worth of paperbacks one summer when I was a kid. This summer my youngest son has probably read as many books as I did back then, but luckily for us we mostly get them from the library. If you like what you're hearing about my writing, buy my book. It's available on Amazon and bookshop.org, or you can ask your local bookshop to order it for you.
Lee Schneider: Thanks for reading and listening. Lee signing off.