Google’s Translatotron

Last Updated : 22 Sep, 2021

Translatotron is a speech-to-speech translation model from the Google AI team that converts speech from one language to another while retaining the speaker's voice.

What’s so special about it?
Earlier speech translation systems chained three components:

Converting speech to text
Translating the text
Generating speech from the translated text with a text-to-speech engine

The major disadvantage of such systems is that an error in any one stage carries into the later stages and can produce undesired output.
Also, text-to-speech engines offer only a limited set of voices, such as Microsoft Ana or Siri.
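
To make the error problem concrete, here is a rough back-of-the-envelope sketch. It assumes each stage fails independently of the others, which is a simplification, and the per-stage success rates are made-up numbers chosen for illustration, not measurements of any real system.

```python
from math import prod


def cascade_success_rate(stage_rates):
    """Chance that every stage succeeds, assuming stage errors are independent."""
    return prod(stage_rates)


# Hypothetical success rates for: speech to text, text translation, text to speech
rates = [0.9, 0.9, 0.9]
overall = cascade_success_rate(rates)
print(f"Overall success rate: {overall:.3f}")
```

Under these assumptions the chain as a whole is less reliable than any single stage, because the individual rates multiply.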

Translatotron translates speech to speech directly, without any intermediate text representation. Because of that, it can retain the voice of the original speaker.
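
The difference in data flow can be sketched with toy stand-ins. Everything below is hypothetical: the `Audio` class, the function names, and the one-word dictionary are placeholders for real models, and the `direct` function only mimics the shape of an end-to-end model rather than how Translatotron works internally. The point is simply that a text hand-off discards the speaker's voice, while a single audio-to-audio step can carry it through.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    content: str  # what is said (toy stand-in for a waveform)
    voice: str    # who is speaking


# Toy dictionary standing in for a translation model (hypothetical)
TOY_DICT = {"hola": "hello"}


def speech_to_text(audio):
    # Only the words survive; the voice is dropped at this hand-off
    return audio.content


def translate_text(text):
    return TOY_DICT.get(text, text)


def text_to_speech(text, voice="stock-voice"):
    # The engine can only speak in one of its own built-in voices
    return Audio(text, voice)


def cascade(audio):
    """Three-stage pipeline with text as the intermediate representation."""
    return text_to_speech(translate_text(speech_to_text(audio)))


def direct(audio):
    """Toy single-step model: audio in, audio out, voice carried through."""
    return Audio(TOY_DICT.get(audio.content, audio.content), audio.voice)


source = Audio("hola", "alice")
print(cascade(source))  # translated, but in the engine's stock voice
print(direct(source))   # translated, still in the original voice
```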

Advantages & Uses

The biggest advantage of Translatotron is that it preserves the vocal characteristics of the speaker.
In the future, it might be used for automatic dubbing of movies with the voices of the original actors.
Video tutorials can be made accessible in viewers' native languages.

Challenges

Translation quality is lower than that of a cascade system (speech to text, then text translation, then text to speech). Quality may improve in the future.
It will become easier to spoof other people's voices, so voice-based authentication systems need to improve.
