AI Voice Cloning Threats and How to Protect Against Them

[Post 24 – 30 in 30]

Voice cloning technology is an emerging risk to organizations and represents an evolution in the convergence of artificial intelligence (AI) threats. Threat actors are already abusing it in the wild: it can defeat voice-based multi-factor authentication (MFA), enable the spread of misinformation and disinformation, and increase the effectiveness of social engineering.

Recorded Future has released a threat analysis report that goes deeper on this topic. I recommend having a look if you are interested.


Three abuse scenarios stand out:

  1. Banking Fraud: Voice cloning can be used to trick the voice-based MFA software used by banks, giving attackers access to account information, including balances and a list of recent transactions and transfers.
  2. Disinformation: Voice cloning technology can be used to spread disinformation by creating realistic audio recordings of public figures appearing to say things they never actually said.
  3. Executive Impersonation: Conversations among threat actors often reference executive impersonation, callback scams, voice phishing (“vishing”), and other attacks that rely on the human voice.

Voice Cloning as a Service and Tools Used

Voice cloning as a service has become a reality with advanced AI models capable of mimicking human voices with high accuracy. Several tools and services that allow for voice cloning are:

  1. Respeecher: This service requires a voice actor to read text, which is then modified to sound like the target voice. This was used by researchers at MIT to generate a fake video of Richard Nixon announcing the failure of the Apollo 11 Moon landing. The cost for this service can range from four to six digits in USD, and projects usually take several weeks.
  2. This is a text-to-speech service that requires training on the target voice. Once trained, it can generate realistic human speech from written text in a few minutes. This service can be used free of charge, with commercial plans starting at $39 per month.
  3. Descript: This tool started as an automatic transcription service. It acquired a voice-cloning startup called Lyrebird and integrated its technology. This allows users to add words to a transcript and have Descript generate realistic audio of the target voice, a feature called Overdub. It also enhances audio in other ways, such as improving the quality of the recorded sound or adding subtle background noise to match the tone of the surrounding audio.

Protection Strategies

1. Implement Real-Time Voice Analysis Software:

This software can detect anomalies in voice recordings and distinguish between real and cloned voices.

Companies and organizations that specialize in voice biometrics and speaker recognition technology include:

  • Voice Biometrics Group (VBG): VBG provides a suite of voice biometric identification technologies. Although VBG does not explicitly state it, the technology could in theory be applied to the problem of distinguishing real from cloned voices.
  • Verint Systems: Verint offers a voice biometrics solution that is used to prevent fraud in call centers by identifying unique voiceprints.

2. Use Anti-Spoofing Technology:

Technologies such as “liveness” detection can prevent fraudulent actors from using pre-recorded or synthetic voices to impersonate customers.

Here are a few companies that offer solutions in this space:

  • Pindrop: Pindrop is a leader in voice security and authentication. They provide anti-fraud technology for call centers that includes voice biometrics and “Phoneprinting”, which analyzes over 1,300 features of a call’s audio to detect fraudulent activity.
  • NICE: NICE’s Real-Time Authentication (RTA) solution includes voice biometrics and other technologies designed to authenticate callers and detect fraud in real-time.
  • Aculab: Aculab’s VoiSentry system provides voice biometric authentication with anti-spoofing features. It’s designed to detect and prevent voice-based fraud attempts.
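One simple idea behind liveness checks is challenge-response: the caller has to speak a freshly generated random phrase within a short window, which makes pre-recorded audio much less useful. The sketch below only illustrates the challenge bookkeeping. The word list, the 30-second window, and the function names are assumptions made for illustration, and the transcript is assumed to come from a separate speech-to-text step. It does not describe how any of the vendors above implement liveness detection, and on its own it would not stop a real-time cloning tool.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical word list; a real deployment would use a much larger one.
WORDS = ["amber", "river", "falcon", "copper", "meadow", "lantern", "orbit", "cedar"]
CHALLENGE_TTL_SECONDS = 30  # assumed validity window


@dataclass(frozen=True)
class Challenge:
    phrase: str
    issued_at: float


def issue_challenge(num_words: int = 4, now: Optional[float] = None) -> Challenge:
    """Create a random phrase the caller must repeat aloud."""
    issued = time.time() if now is None else now
    phrase = " ".join(secrets.choice(WORDS) for _ in range(num_words))
    return Challenge(phrase=phrase, issued_at=issued)


def verify_response(challenge: Challenge, transcript: str,
                    now: Optional[float] = None) -> bool:
    """Accept only if the transcript matches and arrived inside the window."""
    current = time.time() if now is None else now
    if current - challenge.issued_at > CHALLENGE_TTL_SECONDS:
        return False  # too slow: the phrase may have been synthesized offline
    # Normalize case and whitespace before comparing to the issued phrase.
    normalized = " ".join(transcript.lower().split())
    return normalized == challenge.phrase
```

In practice a check like this would sit alongside the audio-level anti-spoofing the vendors provide, not replace it.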

3. Implement Biometric Authentication with Multiple Modalities:

This can provide an extra layer of security against voice cloning attacks.

Here are a few examples:

  • NEC: NEC offers various biometric solutions including facial, iris, fingerprint, and voice recognition technologies that can be combined for multi-modal authentication.
  • BioID: BioID offers a multi-modal biometric authentication solution with face, eye, and voice recognition.
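To show why a second modality helps, here is a minimal sketch of weighted score fusion. It assumes each biometric matcher returns a match score between 0 and 1. The weights, the thresholds, and the function names are illustrative assumptions and are not taken from NEC, BioID, or any other product.

```python
from typing import Dict

# Assumed weights and thresholds, chosen only for illustration.
WEIGHTS: Dict[str, float] = {"voice": 0.4, "face": 0.6}
FUSED_THRESHOLD = 0.75
MODALITY_FLOOR = 0.5  # every modality must clear this on its own


def fuse_scores(scores: Dict[str, float]) -> float:
    """Weighted average of per-modality match scores in [0, 1]."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing modalities: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[m] * scores[m] for m in WEIGHTS) / total_weight


def authenticate(scores: Dict[str, float]) -> bool:
    """Accept only if each modality clears the floor and the fused score is high."""
    # A missing modality is treated as a score of 0.0 and therefore rejected.
    if any(scores.get(m, 0.0) < MODALITY_FLOOR for m in WEIGHTS):
        return False
    return fuse_scores(scores) >= FUSED_THRESHOLD
```

With these assumed values, a near-perfect cloned voice (0.99) paired with a weak face match (0.55) still fails, because the fused score falls below the threshold.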

4. Train Employees:

Organizations must train their employees on the risks associated with voice cloning and how to identify suspicious activity related to voice cloning attacks.

Here are some training examples:

  • Awareness Training: This is the most fundamental step. Employees need to be aware of the concept of voice cloning, how it works, and the potential risks it presents. This could be done through seminars, workshops, webinars, or e-learning courses.
  • Scenario-Based Training: Develop scenarios where voice cloning could be used in a harmful or fraudulent way. Have employees work through these scenarios to identify signs of voice cloning and to develop appropriate responses.
  • Regular Updates: As technology evolves, so too do the risks. Regularly update training to reflect the latest developments in voice cloning technology and potential new risks or threats.
  • Encourage Skepticism: Encourage employees to question and verify unexpected or suspicious phone calls or voice messages, even if they appear to be from known contacts. This could be especially important for employees in sensitive roles (like those handling financial transactions or sensitive data).

5. Develop a Rapid Response Plan:

This plan should include clear guidelines on how to respond to suspected fraud incidents, as well as procedures for customer notification and remediation.

Some procedural examples are:

  • Demonstrations: If possible, demonstrate how voice cloning works. This could involve showing real examples of voice cloning, or even using voice cloning software to clone a familiar voice (like a senior executive’s voice) to make the threat more tangible.
  • Developing Protocols: Develop clear protocols on what employees should do if they suspect voice cloning is being used. This might include who to report suspicions to, and what information to record.
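A reporting protocol like the one described above can be written down as a small checklist generator. The sketch below is only an illustration: the field names, the list of high-risk request types, and the wording of the actions are all assumptions, and a real protocol would come from your own security policy.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

# Assumed categories of requests that always need out-of-band verification.
HIGH_RISK_REQUESTS = {"wire_transfer", "credential_reset", "payee_change"}


@dataclass
class SuspiciousCallReport:
    reported_by: str
    claimed_caller: str
    request_type: str
    caller_number: str
    notes: str = ""
    received_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def required_actions(report: SuspiciousCallReport, directory_number: str) -> List[str]:
    """Return the steps an employee should take before acting on the call."""
    actions = ["log the report with the security team"]
    if report.request_type in HIGH_RISK_REQUESTS:
        # Call back on the number from the company directory,
        # never on a number supplied during the suspicious call.
        actions.append(f"call back {report.claimed_caller} on {directory_number}")
        actions.append("hold the request until the callback confirms it")
    if report.caller_number != directory_number:
        actions.append("flag caller number mismatch")
    return actions
```

The key design choice is that the callback number comes from a trusted directory, so a convincing voice alone is never enough to authorize a sensitive request.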

6. Launch Public Awareness Campaigns:

These campaigns should educate the general public about the risks and consequences of using voice cloning technology to manipulate public opinion. These campaigns can include simple explanations of what voice cloning is, how it works, and the potential risks it poses to individuals and society.

This could involve:

  • Infographics: Use visual data and charts to convey the complexity of voice cloning technology in a simple, engaging manner. This could be shared on social media platforms and websites.
  • Videos: Create short, informative videos explaining the risks of voice cloning. These could include expert interviews, animations, and real-world examples of voice cloning misuse.
  • Articles and Blogs: Publish articles and blogs on popular platforms explaining the risks associated with voice cloning and providing tips to stay safe.

7. Implement Voice Cloning Detection and Prevention Measures:

Measures such as machine-learning algorithms and AI can help identify and prevent the use of voice cloning technology for disinformation.

  • Deepfake Detection Algorithms: Deepfakes, which include voice cloning, can be detected with AI and ML algorithms that are trained to identify discrepancies that are not easily noticed by humans. For example, such algorithms can be trained to detect subtle patterns in the audio that are typically associated with synthetic voices, such as certain anomalies or inconsistencies in the speech patterns or background noise. Companies like Google and Facebook are actively developing such technologies. An example of this is Google’s Deepfake Detection Dataset that provides a large amount of data for training deepfake detection models.
  • Text Analysis for Disinformation Detection: AI can also be used to analyze the content of the speech. This can be useful in identifying disinformation campaigns where voice cloning might be used. AI models can be trained to identify false information, inconsistencies in stories across different communications, and even the sentiment and emotional tone of the speech, all of which can be indicators of disinformation campaigns.
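Detection models of this kind typically score short segments of audio, and those scores then have to be turned into a decision about the whole call or recording. The sketch below shows one simple way to do that aggregation. It assumes a separate, hypothetical detector has already produced a "likely synthetic" score between 0 and 1 for each segment. Both thresholds are illustrative assumptions, not values from any real product or dataset.

```python
from statistics import mean
from typing import Sequence

SEGMENT_THRESHOLD = 0.8  # assumed: a segment at or above this counts as suspicious
FLAG_FRACTION = 0.3      # assumed: flag if this share of segments is suspicious


def summarize_call(segment_scores: Sequence[float]) -> dict:
    """Turn per-segment 'synthetic speech' scores into a call-level verdict."""
    if not segment_scores:
        raise ValueError("no segments scored")
    suspicious = [s for s in segment_scores if s >= SEGMENT_THRESHOLD]
    fraction = len(suspicious) / len(segment_scores)
    return {
        "mean_score": mean(segment_scores),
        "suspicious_fraction": fraction,
        "flag_for_review": fraction >= FLAG_FRACTION,
    }
```

Flagging for human review, and not blocking outright, is a deliberate choice here, since any detector will produce some false positives.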

8. Enforce Content Moderation Policies:

The developers of voice cloning technologies must enforce content moderation policies that prohibit the dissemination of false or misleading information through voice recordings.

Real-world incidents show what is at stake. In early 2020, fraudsters used voice cloning technology to mimic the voice of a company’s director. They contacted a bank manager in Hong Kong and asked him to authorize transfers totaling around $35 million, under the pretense of a company acquisition.

Similarly, in March 2019, criminals used the same technology to imitate the voice of the chief executive of a German parent company, persuading the CEO of its British energy subsidiary to make a fraudulent transfer of €220,000 (about $243,000).

To mitigate current and future threats, organizations must address the risks of voice cloning while the technology is still in its infancy. Because these technologies will only improve over time, an industry-wide approach is needed now to preempt the threats that future advances in voice cloning will bring.

Thanks for reading this far.

My aim with this campaign is to provide readers with valuable content, insights, and inspiration that can help in their personal and professional lives. Whether you’re looking to improve your productivity, enhance your creative strategies, or simply stay up-to-date with the latest news and ideas in cybersecurity, I’ve got something for you.

But this campaign isn’t just about sharing my knowledge and expertise with you. It’s also about building a community of like-minded, IT- and security-focused individuals who are passionate about learning, growing, and collaborating. By subscribing to the blog and reading every day, you’ll have the opportunity to engage with other readers, share your own insights and experiences, and connect with people in the industry.

So why should you read every day and subscribe? Well, for starters, you’ll be getting access to some great content that you won’t find anywhere else. From practical tips and strategies to thought-provoking insights and analysis, the blog has something for everyone that wants to get current and topical cybersecurity information. Plus, by subscribing, you’ll never miss a post, so you can stay on top of the latest trends and ideas in the field.

But perhaps the biggest reason to join the 30-in-30 campaign is that it’s a chance to be part of something bigger than yourself. By engaging with the community, sharing your thoughts and ideas, and learning from others, you’ll be able to grow both personally and professionally. So what are you waiting for? Subscribe, and for the next 30 days and beyond, let’s learn, grow, and achieve our goals together!
