How To Use Cloud Speech-To-Text For Speech Recognition On GCP?

Google Cloud Platform (GCP) is one of the major cloud service providers. Alongside its core compute, deployment, and storage offerings, GCP also provides a speech recognition service called Cloud Speech-to-Text. It lets developers convert spoken language into text with high accuracy and can be integrated into applications to provide transcriptions, which businesses can use to improve accessibility. In this article, we will look at Cloud Speech-to-Text and how to use it to transcribe speech.

Key Terminologies

Google Cloud Platform (GCP): Google Cloud Platform is a suite of cloud computing services provided by Google. The services provided by GCP include computing, storage, machine learning, and more. Check out Google Cloud Platform Tutorial for tutorials on Google Cloud Platform.
Cloud Speech-to-Text: Cloud Speech-to-Text is a service on GCP that enables developers to convert audio input to text using Google’s speech recognition technology. This service can be integrated with other applications via its API and helps make them more accessible.

Steps To Use Cloud Speech-To-Text For Speech Recognition On GCP

Step 1: Open GCP Cloud Console

In your web browser, go to the Google Cloud Console and log in with your Google account.
You need a project with billing enabled, either through a paid billing account or an active free trial, to use the services in this tutorial.

Step 2: Enable Cloud Speech-To-Text API

Once you are logged into the GCP Console, navigate to the “APIs & Services” section.
Click on “Enable APIs and Services”.

This opens a search bar for finding APIs. Search for “Cloud Speech-to-Text API”.

Click on the search result to see the details of this API, then click on Enable to enable it for your project.

Step 3: Create A Service Account

Next, we need to create a service account and generate a key for it, which will be used to authenticate our requests.
A service account is a special type of account used by applications and virtual machines to authenticate and interact with GCP services and APIs.
Navigate to “APIs & Services” and click on “Credentials”.

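Click on “Create Credentials” and select “Service Account”.

Now give a name to this service account and click on “Create and continue”.

For Role, select Owner and click on Continue. Owner grants full access to the project, which is convenient for a demo; for anything beyond that, prefer a narrower role.

Leave all other details at their defaults. Click on Done.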

Step 4: Create JSON Key

The JSON key is also known as a service account key or credentials file.
It is a JSON (JavaScript Object Notation) file that contains the authentication information for a service account in Google Cloud Platform.
To generate the JSON key for our service account, click on the newly created service account.

Now go to the Keys section and select Create new key.

For Key Type, select JSON. The key will be created and a JSON file will be downloaded automatically. Treat this file like a password, since anyone who has it can act as the service account.
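Before using the downloaded file, it can help to confirm that it really is a service account key. The following is a minimal sketch that only inspects the file locally and does not contact GCP. The helper name `check_key_file` and the list of checked fields are assumptions made for this illustration, not part of any Google library.

```python
import json

# Fields that a service account key file is expected to contain.
# This list is an assumption for the sketch, not an exhaustive schema.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return the service account email if the file looks like a valid key.

    Raises ValueError when a field is missing or the file is not a
    service account key. The file is only read locally.
    """
    with open(path, "r", encoding="utf-8") as handle:
        key = json.load(handle)

    missing = [field for field in EXPECTED_FIELDS if field not in key]
    if missing:
        raise ValueError("key file is missing fields: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError("not a service account key: type=" + repr(key["type"]))
    return key["client_email"]
```

Calling `check_key_file` with the path of the downloaded file should return the email address of the service account created in Step 3.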

Step 5: Install Required Packages

Here we are going to use this service from Python.
Open any Python IDE available on your system to proceed. You can also use Google Colab.
We will first upgrade the google-cloud-speech package. If it is not already installed, this command installs it.

pip install --upgrade google-cloud-speech
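To confirm that the installation worked, the installed version can be read from Python itself. This is a small sketch using the standard library's `importlib.metadata` (Python 3.8 or later); the helper name `installed_version` is chosen for this example.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("google-cloud-speech")
if version is None:
    print("google-cloud-speech is not installed")
else:
    print("google-cloud-speech", version)
```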

Step 6: Import Library

Let’s import the required library for our Cloud Speech-to-Text implementation.

from google.cloud import speech

This module is part of the Google Cloud client library for Python, which provides convenient access to Google Cloud services, including the Cloud Speech-to-Text API.

Step 7: Connect With GCP

Now connect the Python environment to the Google Cloud service account using the JSON key we generated. First put the JSON file in the working directory, then execute the following line of code, replacing [file_name] with the name of your key file.

client = speech.SpeechClient.from_service_account_file('[file_name].json')
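As an alternative to passing the file name in code, Google client libraries can also pick up the key through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The sketch below sets that variable from Python after checking that the file exists; the helper name `use_key_file` is an assumption for this example.

```python
import os


def use_key_file(path):
    """Point Application Default Credentials at a service account key file."""
    if not os.path.isfile(path):
        raise FileNotFoundError("service account key not found: " + path)
    absolute = os.path.abspath(path)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = absolute
    return absolute


# After calling use_key_file(...), a client created without arguments,
# speech.SpeechClient(), reads the key from the environment variable.
```

This keeps the key path out of the source code, which is useful when the same script runs on several machines.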

Step 8: Select Speech File

Get any audio file that contains some speech and place it in the current directory.
Then specify the path of the audio file, open it in binary mode, and store its contents in a variable (called ‘mp3_data’ in the next step).
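Reading the file could look like the following minimal sketch. The helper name `read_audio_bytes` and the file name `sample_speech.mp3` are placeholders chosen for this example; the result is stored in `mp3_data` because that is the variable name used in the next step.

```python
from pathlib import Path


def read_audio_bytes(path):
    """Read an audio file from disk and return its raw bytes."""
    audio_path = Path(path)
    if not audio_path.is_file():
        raise FileNotFoundError("audio file not found: " + str(audio_path))
    return audio_path.read_bytes()


# Hypothetical usage; replace the file name with your own recording:
# mp3_data = read_audio_bytes("sample_speech.mp3")
```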

Step 9: Perform Speech-to-Text Operation

First, wrap the binary data of the audio file, held in the ‘mp3_data’ variable, in a RecognitionAudio object so that it can be sent to the Cloud Speech-to-Text API for transcription.

audio_file = speech.RecognitionAudio(content=mp3_data)

Now, create a configuration object for the speech recognition request.
We will set the sample rate of the audio file, which is the number of audio samples per second, in hertz. We also enable automatic punctuation so that the result includes commas, question marks, and so on.
Lastly, define the language code, which is American English (en-US) in this case.

config = speech.RecognitionConfig(
    sample_rate_hertz=44100,
    enable_automatic_punctuation=True,
    language_code='en-US'
)
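The value of `sample_rate_hertz` should match the audio being sent. The article uses an MP3 file, whose sample rate the Python standard library cannot read directly. For an uncompressed WAV file, however, the rate can be read from the header with the built-in `wave` module, as in this sketch (the helper name `wav_sample_rate` is an assumption for this example):

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate, in hertz, stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()
```

The returned value can then be used for `sample_rate_hertz` when a WAV recording is transcribed instead of an MP3.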

Next, call the speech recognition method with the configuration and audio data defined above, and store the transcription results returned by the Cloud Speech-to-Text API in a response variable.
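The call itself is not shown in the text, so here is a minimal sketch of it, assuming the `client`, `config`, and `audio_file` objects from the earlier steps. It uses the client's `recognize` method, which performs synchronous recognition and is intended for short recordings. The wrapper name `transcribe` is chosen for this example.

```python
def transcribe(client, config, audio):
    """Send one synchronous recognition request and return the response.

    `client` is expected to be a speech.SpeechClient, `config` a
    speech.RecognitionConfig and `audio` a speech.RecognitionAudio.
    """
    return client.recognize(config=config, audio=audio)


# With the objects built in the earlier steps:
# response = transcribe(client, config, audio_file)
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before making billed API calls.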

Step 10: Check Result

Let’s print the response and see what it contains.

print(response)

The response includes the following details:

Transcript: This is the text generated by the speech recognition process.
Confidence: Confidence indicates the likelihood that the transcribed text accurately represents the spoken words.
result_end_time: Indicates the end time of the audio segment.
Language Code: The language code specifies the language of the transcribed text.
Total Billed Time: This is the time billed for the transcription process, measured in seconds.
Request Id: The request id is the unique identifier assigned to the speech recognition request by the Cloud Speech-to-Text API.

Here, we need only the transcription as output, so let’s format the print statement to get only the transcription.

# Print the most likely transcript of each recognized segment.
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))
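If the transcript is needed as a single string rather than printed line by line, the results can be collected first. The sketch below relies only on the `results`, `alternatives`, `transcript`, and `confidence` fields described above; the function names `best_transcripts` and `full_transcript` are assumptions for this example.

```python
def best_transcripts(response):
    """Return (transcript, confidence) for the top alternative of each result."""
    pairs = []
    for result in response.results:
        if not result.alternatives:
            continue  # skip results that carry no alternatives
        top = result.alternatives[0]
        pairs.append((top.transcript.strip(), top.confidence))
    return pairs


def full_transcript(response):
    """Join the top transcript of every result into one string."""
    return " ".join(text for text, _ in best_transcripts(response))
```

`best_transcripts` also keeps the confidence values, which can be used to flag segments that may need manual review.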

Conclusion

The Google Cloud Speech-to-Text API offers a powerful and reliable way to convert audio data into text with high accuracy. With it, developers can easily integrate speech recognition into their applications. Typical use cases include transcription, voice-controlled interfaces, and sentiment analysis of spoken content.

Speech Recognition In GCP – FAQs

How To Enable Cloud Speech-to-Text API In GCP?

To enable the Cloud Speech-to-Text API in Google Cloud Platform, go to “APIs & Services” in the Cloud Console, click on “Enable APIs and Services”, and search for the Cloud Speech-to-Text API. Click on the search result and, on the next page, click on Enable.

How Accurate Is Cloud Speech-to-Text?

Cloud Speech-to-Text generally achieves high accuracy, and Google is continuously improving it. However, the accuracy depends on various factors, including audio quality, background noise, accent, and language complexity.

How Many Languages Does Cloud Speech-to-Text Support?

Cloud Speech-to-Text supports multiple languages and variants, including regional accents and dialects. It currently supports over 125 languages. Users can list up to three languages for automatic language recognition.

Is Cloud Speech-to-Text A Free Or Paid Service?

Cloud Speech-to-Text is a paid service provided by Google Cloud Platform. Pricing is based on the duration of the audio processed and other factors, with different rates depending on the amount of audio processed per month.

How To Use Cloud Speech-to-Text For Free?

According to this article, there is no option to use Cloud Speech-to-Text for free. However, you can get a free trial of Google Cloud Platform, which offers $300 in credit to use on any Google Cloud service, including Cloud Speech-to-Text.
