Engineering voice impersonation with machine learning

Text-to-speech (TTS) synthesis is the process by which a computer transforms text into audio. Most popular AI-driven personal assistants rely on TTS software to generate speech that sounds as natural as possible. In its simplest form, the computer assembles speech by pulling together words and phrases from pre-recorded files; once it can do this “fluently”, the process can be automated.
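To make the idea of pulling together pre-recorded words concrete, here is a minimal sketch in Python. It is not how any particular assistant works. The unit inventory, the sample rate, and the function names are all assumptions made for illustration, and sine tones stand in for real recordings so that the example runs without audio files.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed for this sketch)


def tone(freq_hz, duration_ms):
    """Stand-in for a pre-recorded unit: a sine tone as a list of floats."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit inventory. A real system would load recorded words here.
UNITS = {
    "hello": tone(440.0, 300),
    "world": tone(550.0, 400),
}


def synthesize(text, pause_ms=50):
    """Concatenate the stored unit for each word, with a short silence between words."""
    pause = [0.0] * (SAMPLE_RATE * pause_ms // 1000)
    audio = []
    for index, word in enumerate(text.lower().split()):
        if word not in UNITS:
            raise KeyError(f"no recording for word: {word!r}")
        if index > 0:
            audio.extend(pause)
        audio.extend(UNITS[word])
    return audio


samples = synthesize("Hello world")
print(len(samples), "samples,", len(samples) / SAMPLE_RATE, "seconds")
```

With the assumed durations, "Hello world" yields 12000 samples, or 0.75 seconds of audio. A word missing from the inventory raises an error, which shows the main limitation of this approach: it can only say what has been recorded.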

How is voice impersonation technology used?

Google voice cloning and generative adversarial networks

Voice cloning is an AI research effort from Google that allows a computer to read messages aloud in any voice. The system requires two inputs:

A text to be read
A sample of the voice

Generative adversarial networks (GANs) can capture and modulate a voice signal’s audio properties. GANs, alongside other deep generative models such as Google’s WaveNet (an autoregressive network for raw audio rather than a GAN), are used to create media that mimic voices and facial expressions so closely that the result is almost indistinguishable from how the impersonated person sounds and looks.

As a rule of thumb, voice-modeling technology improves the more voice data you feed it. Nevertheless, advanced neural networks can sometimes do without a large dataset of recorded audio to pre-train the model.

Lyrebird AI

Tech companies such as the Canadian Lyrebird strive to design an AI system that can mimic a human

