OU researchers create technology to spot fake audio
“OK Google, take my midterms for me, please.”
Khalid Mahmood Malik, Ph.D., an associate professor in the department of computer science and engineering, is currently working with research students to distinguish original voices from recorded or artificially generated audio.
Working since 2017, the group has created a machine-learning model that can determine whether a voice is original or modified, and how many times it has been replayed.
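In broad strokes, a detector of this kind extracts spectral features from labeled audio clips and trains a classifier to separate originals from replays. The short Python sketch below is a hypothetical illustration of that general approach, not the researchers' actual model; its file names, labels, and feature choices are assumptions.

```python
# Hypothetical sketch of a replay-audio detector, NOT the OU team's model:
# extract spectral features from labeled clips, then train a binary
# classifier to separate original recordings from replayed ones.
import numpy as np
import librosa
from sklearn.svm import SVC

def extract_features(path):
    # Summarize a clip as mean MFCCs; replayed audio tends to pick up
    # channel and loudspeaker artifacts visible in spectral statistics.
    signal, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Hypothetical labeled corpus: 0 = original voice, 1 = replayed recording.
clips = [("orig_001.wav", 0), ("replay_001.wav", 1)]  # ...many more clips
X = np.array([extract_features(path) for path, _ in clips])
y = np.array([label for _, label in clips])

clf = SVC(kernel="rbf").fit(X, y)  # train the detector
print(clf.predict([extract_features("unknown.wav")]))  # 0 or 1
```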
For example, as Roland Baumann, a student in the cyber security program, points out, someone who gains access to another person’s webcam could potentially play a recorded voice through it straight to a Google Home. In that scenario, the recording is played multiple times from one device to another.
How would someone be able to do this?
Well, all of these devices are connected through Wi-Fi, which is nothing new to hacking, according to Malik. Wi-Fi security also has a known weak spot, which Consumer Reports covered in 2017 after researchers discovered the flaw.
Malik and his team’s research has uncovered that devices such as Google Home and Amazon Alexa have vulnerabilities as well.
“[These devices] are not able to understand if it is a recorded voice or original, artificially generated voice or authentic,” Malik said.
The focus of these devices is usability for consumers. Baumann first took an interest in Malik’s research because “home devices are so new that the security ramifications still haven’t been fully considered.”
In fact, experiments by Malik and his students have found that Google Home, specifically, uses speaker verification when a person says either “OK Google” or “Hey Google,” but does not authenticate the rest of the command. Baumann played a male voice already linked to the device saying “OK Google,” then used a female voice to complete the command, which was accepted and responded to.
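In simplified terms, the gap looks like this: the speaker check gates only the wake word, and everything after it runs unchecked. The sketch below illustrates that flow with hypothetical stub functions; it is not Google's actual code or API.

```python
# Simplified, hypothetical assistant pipeline illustrating the gap the
# students observed. All functions here are illustrative stubs, not
# Google's actual code or API.

def verify_speaker(audio, enrolled_profile):
    # Stub: pretend only the enrolled profile's voice passes.
    return audio["speaker"] == enrolled_profile

def transcribe(audio):
    return audio["text"]

def execute(command):
    return f"executing: {command}"

def handle_audio(wake_word, command, enrolled_profile="male_owner"):
    # Speaker verification runs only on the wake word ("OK Google").
    if not verify_speaker(wake_word, enrolled_profile):
        return "ignored"
    # The command that follows is executed with no further speaker
    # check, so a different, unenrolled voice still succeeds.
    return execute(transcribe(command))

# The enrolled male voice says "OK Google"; a female voice finishes
# the command, and the device still responds.
wake = {"speaker": "male_owner", "text": "OK Google"}
cmd = {"speaker": "female_visitor", "text": "open the garage"}
print(handle_audio(wake, cmd))  # -> executing: open the garage
```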
Malik also said that voice cloning has matured considerably in the last few years, with software such as iSpeech and Lyrebird. The more data fed to a computer, the easier it is for it to recognize a pattern and clone a voice, although this is changing.
“At the moment, it is not a challenge to create a cloned voice,” Malik said. “The challenge is how to use less data to train your machine to generate [a cloned voice].”
Why does this all matter?
While they may appear harmless, voice assistants can do more than play the latest Billie Eilish song. They can also control your thermostat, open and close your garage, and even manage your home security.
Last month, artificial intelligence (AI) voice technology was used to scam a CEO into a fraudulent transfer of €220,000, or about $243,000. The CEO was convinced he was speaking to his boss but was actually hearing an AI-generated imitation, according to The Wall Street Journal.
“We’re not trying to scare people, but to educate them,” Malik said.
Malik pointed out that their research and model are for educational purposes rather than commercial, though the team did apply for a patent on the specific feature they created. With all of the advancements in voice cloning, they still hope to improve and update their solutions further.
The researchers are currently preparing for a principal investigator (PI) meeting with the National Science Foundation (NSF).
“Our goal is not commercialization, but to focus on good research problems and finding appropriate solutions and publishing them while educating and training students,” Malik said.