How To Configure AWS Polly Text-To-Speech Service?

Last Updated : 12 Feb, 2024

In today’s era, text-to-speech (TTS) has become an essential part of many applications. It enhances user experiences with multimedia content and makes applications more accessible. Amazon Polly, a service provided by Amazon Web Services (AWS), is a TTS solution that uses advanced deep-learning technologies to convert text into speech. This article explains, step by step, how to set up AWS Polly.

Amazon Polly is a cloud-based service that uses deep learning techniques to turn written text into realistic speech. By calling Polly from their applications, developers can add speech synthesis with little effort, improving accessibility and enabling more engaging, personalized user interactions.

AWS Polly Text-To-Speech Service Components

Text-To-Speech: The process of converting text input into realistic speech using advanced deep learning technologies.
Voice: Polly offers many voices with different languages, accents, and speaking styles. Each voice has a name, such as Joanna or Matthew, and voices are categorized by gender, language, and region.
Speech Synthesis Markup Language (SSML): SSML lets users control how Polly speaks, including pronunciation, tone of voice, and speaking speed, so they can fine-tune how text is turned into speech.
Lexicons: Custom pronunciation dictionaries that let users specify how particular words or phrases are pronounced, which is especially useful for words that Polly does not pronounce correctly by default.
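Polly lexicons are written in the W3C Pronunciation Lexicon Specification (PLS) XML format. The sketch below builds a simple alias-based lexicon from a Python dictionary; the helper name build_alias_lexicon is hypothetical, and using only grapheme/alias entries (no phonetic transcriptions) is a simplifying assumption.

```python
from xml.sax.saxutils import escape


def build_alias_lexicon(aliases, lang="en-US"):
    """Return a PLS lexicon string that makes Polly read each grapheme as its alias.

    Hypothetical helper (not part of boto3). `aliases` maps written forms to
    spoken forms, e.g. {"W3C": "World Wide Web Consortium"}.
    """
    # One <lexeme> per entry; escape() keeps characters like & and < valid XML
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(written)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>\n"
        for written, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0"\n'
        '    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"\n'
        '    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n'
        '    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon\n'
        '      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"\n'
        f'    alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}"
        "</lexicon>\n"
    )
```

With Boto3, such a string could be uploaded with polly_client.put_lexicon(Name="TechTerms", Content=lexicon_xml) and then applied by passing LexiconNames=["TechTerms"] to synthesize_speech. The lexicon name TechTerms is only an example; Polly expects a short alphanumeric name.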

Step By Step Configuration Process Of AWS Polly Service

Step 1: Access the AWS Management Console

Sign in to the AWS Management Console with an account that has permission to use Amazon Polly.

Step 2: Navigate To Amazon Polly

From the services menu, search for and select “Polly”, then click “Try Polly” to open the Polly console.


Step 3: Configure Polly Settings

Choose Region: Select the AWS region where you want to use Polly.
Enable Features: Select features such as Neural text-to-speech (NTTS) or time-stamped speech marks, based on your requirements.
Set Default Voice: Choose the default voice for speech synthesis.
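When you move from the console to code, these settings become request parameters: the region is set on the client, and the engine and voice are passed to synthesize_speech as Engine and VoiceId. The sketch below picks a default voice from the list returned by describe_voices; the helper name pick_default_voice is hypothetical, and it assumes each voice entry carries the Id, LanguageCode, Gender, and SupportedEngines fields.

```python
def pick_default_voice(voices, language_code, engine="neural", gender=None):
    """Return the Id of the first matching voice, or None if nothing matches.

    Hypothetical helper: `voices` has the shape of the "Voices" list returned
    by describe_voices. Sorting by Id only makes the choice deterministic.
    """
    for voice in sorted(voices, key=lambda v: v["Id"]):
        if voice.get("LanguageCode") != language_code:
            continue  # wrong language
        if engine not in voice.get("SupportedEngines", []):
            continue  # voice is not available for the requested engine
        if gender is not None and voice.get("Gender") != gender:
            continue  # gender filter requested and not matched
        return voice["Id"]
    return None
```

In an application you might fetch the list with polly_client.describe_voices(Engine="neural", LanguageCode="en-US")["Voices"] and then call pick_default_voice(voices, "en-US", gender="Female"). For brevity this ignores the NextToken pagination that describe_voices supports; hard-coding a preferred VoiceId is an equally valid choice.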


Step 4: Generate Speech

Enter the text you want to synthesize into the provided text box.
(Optional) Use SSML tags to fine-tune the output of Polly’s speech synthesis.
Select the desired voice and adjust any other parameters as needed.
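When SSML is sent through the API rather than the console, the text must be wrapped in a <speak> element, any XML special characters in the input must be escaped, and TextType must be set to "ssml". The sketch below builds the keyword arguments for synthesize_speech and inserts a <break> pause between sentences; the helper name build_ssml_request and its default values are assumptions for illustration.

```python
from xml.sax.saxutils import escape


def build_ssml_request(sentences, voice_id="Joanna", pause_ms=500):
    """Return keyword arguments for synthesize_speech that use SSML input.

    Hypothetical helper: each sentence is XML-escaped, and a <break> of
    `pause_ms` milliseconds is placed between consecutive sentences.
    """
    pause = f'<break time="{int(pause_ms)}ms"/>'
    body = pause.join(escape(s) for s in sentences)
    return {
        "Text": f"<speak>{body}</speak>",
        "TextType": "ssml",  # tells Polly to parse Text as SSML, not plain text
        "OutputFormat": "mp3",
        "VoiceId": voice_id,
    }
```

The result can be passed straight to the client, for example polly_client.synthesize_speech(**build_ssml_request(["Hello & welcome.", "This is Polly."])). Escaping matters because an unescaped & or < in user-supplied text would make the SSML invalid.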


Step 5: Integrate AWS Polly Into Your Application

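Use the AWS SDK or API to integrate Polly into your application code.
First, install the AWS SDK for your language. AWS SDKs are available for many programming languages, including Java, Python, JavaScript (Node.js), and Ruby; for Python, install Boto3 with pip install boto3. The SDK also needs AWS credentials, which you can provide, for example, with the AWS CLI command aws configure or through environment variables.
Initialize a Polly client in your application code using the AWS SDK, then use the client to synthesize speech from text. Polly supports several languages and voices.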

Here is a basic example in Python using the AWS SDK for Python (Boto3):

Python

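from contextlib import closing

import boto3

# Create a Polly client in the region where you want to use the service
polly_client = boto3.client('polly', region_name='us-east-1')

# Synthesize speech (raises botocore's ClientError if, e.g., credentials are missing)
response = polly_client.synthesize_speech(
    Text='Hello, this is a sample text.',
    OutputFormat='mp3',
    VoiceId='Joanna'  # Voice to use
)

# AudioStream is a streaming body: read it into a file and close it when done
if 'AudioStream' in response:
    with closing(response['AudioStream']) as stream, open('output.mp3', 'wb') as file:
        file.write(stream.read())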
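The same call can return speech marks, the time-stamped output mentioned in Step 3, by requesting OutputFormat="json" and SpeechMarkTypes such as ["word", "sentence"]. Speech marks come from a separate request (the response contains marks, not audio), and the stream holds one JSON object per line. The sketch below parses that stream; the helper names parse_speech_marks and word_timings are hypothetical.

```python
import json


def parse_speech_marks(raw):
    """Parse newline-delimited speech-mark JSON (bytes or str) into a list of dicts.

    Hypothetical helper: `raw` is what response["AudioStream"].read() returns
    for a request made with OutputFormat="json".
    """
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def word_timings(marks):
    """Return (word, time_in_milliseconds) pairs for the word-type marks only."""
    return [(mark["value"], mark["time"]) for mark in marks if mark.get("type") == "word"]
```

A typical use is highlighting each word in a UI while the separately synthesized MP3 plays, using the time field as the offset from the start of the audio.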

Conclusion

Setting up Amazon Polly for text-to-speech synthesis is a straightforward procedure that lets developers add speech to their applications. By following the steps in this article and consulting the frequently asked questions (FAQs) below, you can integrate Polly into your projects and give users more engaging experiences.

AWS Polly Service – FAQs

Can I Use Amazon Polly For Commercial Projects?

Yes, Amazon Polly is suitable for both commercial and noncommercial projects, as long as you adhere to the policies and pricing set by AWS.

Are There Any Restrictions On The Length Of Text That Can Be Synthesized?

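Yes. Each request has a limit on the size of the input text. A synchronous SynthesizeSpeech request accepts up to 3,000 billed characters (SSML tags are not billed), while the asynchronous StartSpeechSynthesisTask operation accepts much longer input; check the Amazon Polly quotas documentation for the current values.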
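For text longer than a single request allows, one option is to split it into chunks and synthesize each chunk separately. The sketch below packs whole sentences into chunks of at most max_chars characters and falls back to word boundaries for an over-long sentence. The 3,000 limit is an assumption to verify against the current Polly quotas, and the helper name split_text is hypothetical; counting raw characters is conservative for plain text.

```python
import re
import textwrap

# Assumed per-request limit; verify against the current Amazon Polly quotas.
MAX_CHARS = 3000


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks no longer than max_chars, preferring sentence ends.

    Hypothetical helper: sentences are packed greedily; a sentence longer than
    max_chars is wrapped at whitespace (textwrap also cuts over-long words).
    """
    pieces = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if len(sentence) <= max_chars:
            pieces.append(sentence)
        else:
            pieces.extend(textwrap.wrap(sentence, width=max_chars))

    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # piece still fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk, start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent through synthesize_speech and saved to its own audio file. For very long documents, the asynchronous StartSpeechSynthesisTask operation, which writes its output to an Amazon S3 bucket, avoids client-side chunking altogether.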

Am I Allowed To Personalize The Pronunciation Of Words?

Yes. You can create custom lexicons that define how individual words or phrases should be pronounced and apply them to your synthesis requests.

Does Amazon Polly Support Multiple Languages?

Yes, Amazon Polly supports a variety of languages and accents, allowing for multilingual speech synthesis.

Can I Control The Speech Rate And Pitch?

Yes. You can adjust speech rate, pitch, and volume with the SSML <prosody> tag. The SynthesizeSpeech API has no separate rate or pitch parameters, so these adjustments are made in the SSML input.
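As a minimal sketch, the helper below wraps text in a <prosody> element with only the attributes you pass, such as rate="slow", rate="80%", or volume="loud". The helper name prosody_ssml is hypothetical, and it passes values through without validation, leaving Polly to reject unsupported values. Not every voice engine supports every <prosody> attribute, so check Polly's SSML support table before relying on pitch with neural voices.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text, rate=None, pitch=None, volume=None):
    """Return an SSML document applying the given <prosody> attributes to text.

    Hypothetical helper: attributes left as None are omitted, and if none are
    given the text is wrapped in <speak> only.
    """
    attrs = {"rate": rate, "pitch": pitch, "volume": volume}
    rendered = "".join(
        f" {name}={quoteattr(value)}" for name, value in attrs.items() if value is not None
    )
    if not rendered:
        return f"<speak>{escape(text)}</speak>"
    return f"<speak><prosody{rendered}>{escape(text)}</prosody></speak>"
```

The returned string is sent as Text together with TextType="ssml", for example polly_client.synthesize_speech(Text=prosody_ssml("Welcome!", rate="slow"), TextType="ssml", OutputFormat="mp3", VoiceId="Joanna").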
