Audio files are a widespread means of transferring information. This article shows how to break audio files (.wav files) into smaller chunks, recognize the speech in each chunk, and store the recognized text in a text file. To know more about audio files and their formats, refer to Audio_formats.
Why break down an audio file?
Processing a long audio file in one pass can take a lot of time. Here, processing can mean anything: for example, we may want to increase or decrease the frequency of the audio or, as done in this article, recognize the content of the audio file. Breaking the file into small audio files called chunks keeps each processing step short, because only one small piece is handled at a time.
Install the required packages:
pip3 install pydub
pip3 install audioread
pip3 install SpeechRecognition
The program has two main steps.
Step #1: Slice the audio file into small chunks of a constant interval. The slicing can be done with or without overlap. Overlap means that each new chunk starts a fixed amount of time before the previous chunk ends, so that any audio or word cut off at a chunk boundary is still captured whole in the next chunk. For example, if the audio file is 22 seconds long, the interval is 5 seconds, and the overlap is 1.5 seconds, the timing of the chunks will be:
chunk1 : 0 - 5 seconds
chunk2 : 3.5 - 8.5 seconds
chunk3 : 7 - 12 seconds
chunk4 : 10.5 - 15.5 seconds
chunk5 : 14 - 19 seconds
chunk6 : 17.5 - 22 seconds
We can ignore this overlap by setting the overlap to 0.
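The chunk timings above can be reproduced with a few lines of Python. This is a minimal sketch, not the article's original listing: the function name `chunk_bounds_ms` and its parameters are illustrative, and times are kept in milliseconds because that is the unit pydub uses for slicing.

```python
def chunk_bounds_ms(total_ms, interval_ms, overlap_ms=0):
    """Return (start, end) pairs in milliseconds that cover the whole file."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    if not 0 <= overlap_ms < interval_ms:
        raise ValueError("overlap_ms must be >= 0 and smaller than interval_ms")
    if total_ms <= 0:
        return []

    bounds = []
    start = 0
    while True:
        # The last chunk is clipped to the end of the file.
        end = min(start + interval_ms, total_ms)
        bounds.append((start, end))
        if end >= total_ms:
            break
        # The next chunk begins overlap_ms before the current one ends.
        start = end - overlap_ms
    return bounds


# 22-second file, 5-second interval, 1.5-second overlap.
for i, (start, end) in enumerate(chunk_bounds_ms(22_000, 5_000, 1_500), start=1):
    print(f"chunk{i} : {start / 1000:g} - {end / 1000:g} seconds")
```

Passing `overlap_ms=0` gives back-to-back chunks with no shared audio.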
Step #2: Work with the sliced audio file to do whatever the user requires. Here, for demonstration purposes, each chunk is passed through the Google Speech Recognition module and the recognized text is written to a separate file. The same module can also recognize audio from a microphone, but in this article we use only the sliced audio files.
Step #2 is done in a loop inside Step #1. As soon as a chunk is sliced from the audio file, it is recognized. This process continues until the end of the audio file.
Input : Geek.wav
Output : the chunk audio files and a text file, recognized, containing the recognized text
[Screenshot of cmd running the code]
Below is the implementation:
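A minimal sketch of the two steps is given here. It is an illustration under stated assumptions rather than a definitive listing: the names `iter_chunk_bounds` and `transcribe_in_chunks`, the chunk file names `chunk1.wav`, `chunk2.wav`, ... and the output file name `recognized.txt` are hypothetical choices. It uses pydub's `AudioSegment` for slicing and the `Recognizer.recognize_google` call from SpeechRecognition, which needs an internet connection.

```python
def iter_chunk_bounds(total_ms, interval_ms, overlap_ms):
    """Yield (start, end) in milliseconds; each chunk starts overlap_ms early."""
    if not 0 <= overlap_ms < interval_ms:
        raise ValueError("overlap must be >= 0 and smaller than the interval")
    start = 0
    while start < total_ms:
        end = min(start + interval_ms, total_ms)
        yield start, end
        if end == total_ms:
            break
        start = end - overlap_ms


def transcribe_in_chunks(wav_path, out_path="recognized.txt",
                         interval_s=5, overlap_s=1.5):
    # Third-party imports are local so the helper above works without them.
    import speech_recognition as sr
    from pydub import AudioSegment

    audio = AudioSegment.from_wav(wav_path)   # len(audio) is in milliseconds
    recognizer = sr.Recognizer()
    bounds = iter_chunk_bounds(len(audio),
                               int(interval_s * 1000),
                               int(overlap_s * 1000))

    with open(out_path, "w", encoding="utf-8") as out:
        for i, (start, end) in enumerate(bounds, start=1):
            # Step 1: slice by milliseconds and save the chunk as its own file.
            chunk_name = f"chunk{i}.wav"
            audio[start:end].export(chunk_name, format="wav")

            # Step 2: recognize the chunk that was just written.
            with sr.AudioFile(chunk_name) as source:
                recorded = recognizer.record(source)
            try:
                text = recognizer.recognize_google(recorded)
            except sr.UnknownValueError:
                text = ""  # nothing intelligible in this chunk
            except sr.RequestError as err:
                raise RuntimeError(f"Speech service unavailable: {err}") from err

            out.write(text + "\n")
```

Calling `transcribe_in_chunks("Geek.wav")` would slice the input file, write one audio file per chunk, and write one line of recognized text per chunk to the output file.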
Each chunk is stored on the local system as its own audio file. We have now successfully sliced the audio file with an overlap and recognized the content of the chunks.
Advantages of this method:
- The interval can be set to any length depending on how long we need the chunks to be.
- Overlap ensures that no data is lost even if a word is spoken exactly at the end of an interval.
- The chunks can all be stored as separate audio files and used later if need be.
- Any processing that can be done on an audio file can be done on these chunks as well, as they are just audio files.
Disadvantages of this method:
- Using Google Speech Recognition requires an active internet connection.
- When overlap is used, some text processing is needed afterwards to remove the duplicate words recognized in consecutive chunks.
- The accuracy of Google Speech Recognition depends on many factors, such as background noise and the speaker's accent.
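The duplicate-word cleanup mentioned in the list above could be approached as follows. This is only a sketch under a simplifying assumption: the recognizer returns the overlapping words identically in both chunks, which is not guaranteed in practice (a word cut at the boundary may be transcribed differently). The function name `merge_overlapping` and the `max_words` limit are hypothetical.

```python
def merge_overlapping(previous, current, max_words=10):
    """Join two transcripts, dropping words repeated across the chunk boundary."""
    prev_words = previous.split()
    cur_words = current.split()
    limit = min(max_words, len(prev_words), len(cur_words))

    # Try the longest possible overlap first, then shrink.
    for size in range(limit, 0, -1):
        tail = [w.lower() for w in prev_words[-size:]]
        head = [w.lower() for w in cur_words[:size]]
        if tail == head:
            return " ".join(prev_words + cur_words[size:])

    # No shared words at the boundary: plain concatenation.
    return " ".join(prev_words + cur_words)


print(merge_overlapping("welcome to the geeks for", "geeks for geeks portal"))
```

Here the end of the first transcript and the start of the second share the words "geeks for", so they are kept only once in the merged text.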