Google’s Translatotron

Translatotron is a speech-to-speech translation model from the Google AI team that converts speech from one language into another while retaining the voice of the speaker.

What’s so special about it?
Earlier speech translation systems chained three components:

  1. Converting speech to text
  2. Translating the text
  3. Generating speech from the translated text using a Text To Speech engine

The major disadvantage of those models is that an error in any one phase carries over to the later phases and may lead to undesired output.
Also, Text To Speech engines offer only a limited set of voices, such as Microsoft Ana or Siri.
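
To make the error propagation concrete, here is a minimal toy sketch of a three-stage cascade. Everything in it is a hypothetical stand-in for illustration only: the function names, the tiny dictionary and the "audio" strings are assumptions, not part of Translatotron or any real speech library.

```python
# Toy cascade: every function below is a hypothetical stand-in, not a real API.

# Made-up English -> Spanish word list, just large enough for the demo.
TOY_DICTIONARY = {"the": "el", "ship": "barco", "sheep": "oveja", "arrives": "llega"}


def speech_to_text(audio: str, mishear: bool = False) -> str:
    # Stage 1: pretend the "audio" is already a transcript.
    # With mishear=True we inject a recognition error ("ship" -> "sheep").
    return audio.replace("ship", "sheep") if mishear else audio


def translate_text(text: str) -> str:
    # Stage 2: word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.split())


def text_to_speech(text: str) -> str:
    # Stage 3: wrap the text to mark it as synthesized "audio".
    return f"<audio:{text}>"


def cascade(audio: str, mishear: bool = False) -> str:
    # Each stage only sees the output of the previous stage,
    # so it has no way to notice or repair an earlier mistake.
    return text_to_speech(translate_text(speech_to_text(audio, mishear)))


clean = cascade("the ship arrives")
faulty = cascade("the ship arrives", mishear=True)
print(clean)
print(faulty)
```

The single recognition error in the first stage ends up in the final output, because the translation and synthesis stages process whatever text they are given.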

Translatotron translates speech to speech directly, without using any intermediate text representation. Because of that, it can retain the voice of the original speaker.

Advantages & Uses

  • The biggest advantage of Translatotron is that it preserves the vocal characteristics of the speaker.
  • In the future, it might be used for automatic dubbing of movies in the voices of the original actors.
  • Video tutorials can be made accessible in viewers' native languages.

Challenges

  • Translation quality is lower than that of the cascaded model (speech to text, text translation, text to speech). Hopefully, the quality will improve in the future.
  • It will become easier to spoof other people's voices. Hence, voice-based authentication systems will need to improve.