Extract speech text from video in Python

Last Updated : 05 Jun, 2023

Nowadays, videos have become an integral part of our lives: they educate us and provide useful information. In this article, we will learn how to extract the speech in a video as text using Python.

Extract Speech Text from the Video

To extract speech text from a video in Python, we need the following modules. Here we use pip to install them.

Moviepy Module

The moviepy module in Python is used to perform basic operations on videos. It is commonly used for video editing tasks such as cutting clips, adding text, and merging videos. You can install the moviepy module by running the following command in your terminal.

pip install moviepy

Note: This module automatically installs FFmpeg. However, in some cases you might be prompted to install it yourself. If so, install FFmpeg separately for your operating system (for example, Linux or Windows).
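If you are unsure whether FFmpeg is available, a quick check such as the following sketch can help. It only looks for an `ffmpeg` executable on the system PATH and returns its version line. The function name `ffmpeg_version` is hypothetical, and MoviePy may still work through the copy of FFmpeg installed with its own dependencies even when this check finds nothing.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version(binary: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<binary> -version`, or None if the binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    # Run the binary once to confirm that it actually executes.
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "No ffmpeg executable found on PATH.")
```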

SpeechRecognition

The SpeechRecognition module (imported in Python as speech_recognition) provides an easy way to work with speech in audio files. You can install it using the following command:

pip install SpeechRecognition
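The names you install with pip are not always the names you import: the SpeechRecognition package is imported as `speech_recognition`, and MoviePy as `moviepy`. The sketch below uses a hypothetical helper, `missing_modules`, to report which of the required import names cannot be found in the current environment, without actually importing them.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["moviepy", "speech_recognition"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required modules are available.")
```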

Steps to Extract Speech Text from Video in Python

Step 1: Import the required modules

The first step is to import the required modules, i.e., moviepy and speech_recognition.

import moviepy.editor as mp
import speech_recognition as sr
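The `moviepy.editor` import above matches MoviePy 1.x. MoviePy 2.0 removed the `moviepy.editor` module, and classes such as `VideoFileClip` are imported directly from `moviepy` instead. If you are not sure which version is installed, a sketch like the following (the helper name `import_video_file_clip` is hypothetical) tries the 1.x layout first and falls back to the 2.x layout. With it, later calls become `VideoFileClip("file_path")` instead of `mp.VideoFileClip("file_path")`.

```python
def import_video_file_clip():
    """Return MoviePy's VideoFileClip class for either the 1.x or the 2.x package layout.

    Raises ImportError if MoviePy is not installed at all.
    """
    try:
        from moviepy.editor import VideoFileClip  # MoviePy 1.x layout
    except ImportError:
        from moviepy import VideoFileClip  # MoviePy 2.x layout (moviepy.editor was removed)
    return VideoFileClip
```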

Step 2: Load the video

The next step is to load the video whose speech we want to extract. For this, we will use the VideoFileClip() class of the moviepy module.

video = mp.VideoFileClip("file_path")
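Before loading the video, it can help to check that the file exists and to derive the name of the WAV file from it, rather than hard-coding both names. The following is a minimal sketch using `pathlib`. The helper `prepare_paths` is hypothetical, and it assumes the WAV file should sit next to the video with the same base name.

```python
from pathlib import Path
from typing import Tuple


def prepare_paths(video_path: str) -> Tuple[Path, Path]:
    """Check that the video exists and return (video_path, wav_path)."""
    video = Path(video_path)
    if not video.is_file():
        raise FileNotFoundError(f"Video not found: {video.resolve()}")
    # Same folder and base name as the video, with a .wav extension.
    return video, video.with_suffix(".wav")
```

You would then pass `str(video)` to `VideoFileClip()` and `str(wav)` to `write_audiofile()` and `AudioFile()`.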

Step 3: Extract audio from the video

Then extract the audio from the video using its audio attribute, and write it to a ‘.wav’ file using the write_audiofile() function.

audio = video.audio
audio.write_audiofile("filename.wav")
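To confirm what MoviePy actually wrote, you can read the WAV header with Python's standard `wave` module. This sketch assumes the file is uncompressed PCM, which `wave` requires and which MoviePy's default WAV codec (`pcm_s16le`) produces. The helper name `describe_wav` is hypothetical.

```python
import wave
from pathlib import Path
from typing import Dict, Union


def describe_wav(path: Union[str, Path]) -> Dict[str, Union[int, float]]:
    """Return basic properties of a PCM WAV file, read from its header."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }
```

For example, `print(describe_wav("filename.wav"))` shows the channel count, sample rate, and duration of the extracted audio.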

Step 4: Load audio

Load the newly created audio file using the AudioFile() class of the speech_recognition module, and read its contents with the record() method of a Recognizer object.

r = sr.Recognizer()
with sr.AudioFile("filename.wav") as source:
    data = r.record(source)
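`r.record(source)` reads the whole file at once. For longer videos, the free Google service behind `recognize_google()` may reject or cut off long requests, so a common workaround is to transcribe the audio in shorter pieces. `Recognizer.record()` accepts `duration` and `offset` arguments for reading only part of a file. Alternatively, the sketch below splits the WAV file into fixed-length chunk files with the standard `wave` module. The helper name `split_wav`, the 30-second default, and the `_partNNN` file naming are assumptions for illustration.

```python
import wave
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


def split_wav(path: PathLike, chunk_seconds: float = 30.0,
              out_dir: Optional[PathLike] = None) -> List[Path]:
    """Split a PCM WAV file into consecutive chunks of at most `chunk_seconds` each.

    Chunks are named <stem>_part000.wav, <stem>_part001.wav, ... and are written
    to `out_dir` (default: the folder containing the source file).
    """
    src = Path(path)
    out = Path(out_dir) if out_dir is not None else src.parent
    out.mkdir(parents=True, exist_ok=True)
    chunk_paths: List[Path] = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = max(1, int(chunk_seconds * reader.getframerate()))
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:  # end of the audio data
                break
            chunk_path = out / f"{src.stem}_part{index:03d}.wav"
            # Same channels, sample width and rate as the source; the wave module
            # updates the header's frame count to match the data actually written.
            with wave.open(str(chunk_path), "wb") as writer:
                writer.setparams(params)
                writer.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Each returned chunk can then be opened with `sr.AudioFile()` and transcribed on its own.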

Step 5: Convert audio to text

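The final step is to convert the recorded audio data to text. This can be done by calling the recognizer's recognize_google() method and passing the recorded data as the argument. This method sends the audio to Google's speech recognition web service, so it needs an internet connection.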

text = r.recognize_google(data)
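`recognize_google()` raises `sr.UnknownValueError` when it cannot make out any speech and `sr.RequestError` when the request to the service fails. When transcribing several pieces of audio, it can be useful to skip pieces with no recognizable speech while still letting network errors stop the run. The sketch below accepts any transcription callable, so it can be tested without network access. The names `transcribe_all` and `skip_errors` are hypothetical, and the commented wiring assumes `chunk_paths` is a list of WAV file paths you have prepared.

```python
from typing import Callable, Iterable, Tuple, Type, TypeVar

T = TypeVar("T")


def transcribe_all(
    chunks: Iterable[T],
    transcribe: Callable[[T], str],
    skip_errors: Tuple[Type[BaseException], ...] = (),
) -> str:
    """Transcribe each chunk in order and join the non-empty results with spaces.

    Exceptions whose type is listed in `skip_errors` cause that chunk to be skipped;
    any other exception propagates to the caller.
    """
    pieces = []
    for chunk in chunks:
        try:
            text = transcribe(chunk)
        except skip_errors:
            continue  # e.g. a chunk containing only silence or music
        if text and text.strip():
            pieces.append(text.strip())
    return " ".join(pieces)


# Example wiring with SpeechRecognition:
# import speech_recognition as sr
# r = sr.Recognizer()
#
# def recognize_file(path):
#     with sr.AudioFile(str(path)) as source:
#         return r.recognize_google(r.record(source))
#
# text = transcribe_all(chunk_paths, recognize_file, skip_errors=(sr.UnknownValueError,))
```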

Code Implementation:

Now, let us see the full implementation of the code to extract speech text from a video in Python. We will use geeksforgeeks.mp4 as the example video. Make sure this video is present in the same folder as the script.

Python3

import moviepy.editor as mp  # MoviePy 1.x import path
import speech_recognition as sr

# Load the video
video = mp.VideoFileClip("geeksforgeeks.mp4")

# Extract the audio from the video and save it as a WAV file
audio_file = video.audio
audio_file.write_audiofile("geeksforgeeks.wav")
video.close()  # release the files held open by MoviePy

# Initialize recognizer
r = sr.Recognizer()

# Load the audio file
with sr.AudioFile("geeksforgeeks.wav") as source:
    data = r.record(source)

# Convert speech to text (requires an internet connection)
text = r.recognize_google(data)

# Print the text
print("\nThe resultant text from video is: \n")
print(text)

Output:


Text extracted from geeksforgeeks.mp4
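The script above hard-codes the file names. As a variation, the same steps can be wrapped in a small command-line tool that takes the video path as an argument. This is a sketch, not part of the original implementation: the options `--language` and `--output` are hypothetical, the WAV file is written next to the video, and the MoviePy 1.x import path is assumed. The third-party imports happen inside `main()` so that argument parsing (and `--help`) works even before the packages are installed.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Extract speech text from a video file.")
    parser.add_argument("video", type=Path, help="input video, e.g. geeksforgeeks.mp4")
    parser.add_argument("--language", default="en-US",
                        help="language code passed to recognize_google() (default: en-US)")
    parser.add_argument("--output", type=Path, help="optional .txt file for the transcript")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    # Deferred third-party imports (MoviePy 1.x import path assumed).
    import moviepy.editor as mp
    import speech_recognition as sr

    wav_path = args.video.with_suffix(".wav")
    video = mp.VideoFileClip(str(args.video))
    try:
        if video.audio is None:
            raise SystemExit(f"{args.video} has no audio track")
        video.audio.write_audiofile(str(wav_path))
    finally:
        video.close()

    r = sr.Recognizer()
    with sr.AudioFile(str(wav_path)) as source:
        data = r.record(source)
    text = r.recognize_google(data, language=args.language)

    if args.output:
        args.output.write_text(text, encoding="utf-8")
    print(text)
```

To run it from a terminal, call `main()` under an `if __name__ == "__main__":` guard and invoke the script as, for example, `python transcribe_video.py geeksforgeeks.mp4 --output transcript.txt` (the script name is hypothetical).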
