Translatotron is a speech-to-speech translation model from the Google AI team that converts speech from one language into another while retaining the voice of the speaker.
What’s so special about it?
Earlier systems chained three components:
- Converting speech to text
- Translating the text
- Generating speech from the translated text with a text-to-speech engine
The major disadvantage of those systems is that an error in any one stage is passed on to the stages after it and may lead to undesired output.
Also, text-to-speech engines offer only a limited set of voices, such as Microsoft Anna or Siri.
Translatotron translates speech to speech directly without using any intermediate text representation. Because of that, it is able to retain the voice of the original speaker.
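To make the error problem of the three-stage approach concrete, here is a rough sketch. It assumes each stage fails independently of the others, and the accuracy figures are made up purely for illustration; they are not measurements of any real system. The function name `cascade_success_rate` is also just a hypothetical helper.

```python
from math import prod


def cascade_success_rate(stage_accuracies):
    """Chance that every stage succeeds, assuming independent errors."""
    return prod(stage_accuracies)


# Illustrative numbers only (assumption), one per stage of the older pipeline.
stages = {
    "speech_to_text": 0.95,
    "text_translation": 0.90,
    "text_to_speech": 0.98,
}

overall = cascade_success_rate(stages.values())
print(f"Chance that all three stages succeed: {overall:.3f}")  # 0.838
```

Even with every stage at 90% or better, the pipeline as a whole succeeds less often than its weakest stage, because a mistake in an early stage cannot be undone by a later one.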
Advantages & Uses
- The biggest advantage of Translatotron is that it preserves the vocal characteristics of the speaker.
- In the future, it might be used for automatic dubbing of movies, with the voices of the original actors.
- Video tutorials can be made accessible in viewers' native languages.
Limitations
- The quality of translation is lower than that of the speech-to-text -> text-to-speech cascade model. Hopefully, quality will improve in the future.
- It will become easier to spoof other people's voices. Hence, voice-based authentication systems need to improve.