Build a Song Transcriptor App Using Python

In today's digital landscape, audio files play a significant role in various aspects of our lives, from entertainment to education. However, extracting valuable information or content from audio recordings can be challenging. In this article, we will learn how to build a Song Transcriber application using Python. Its primary function is to transcribe audio files into text format accurately and efficiently.

Song Transcriptor App Using Python

Below is the implementation of the Song Transcriptor App Using Python:

Create a Virtual Environment

First, create the virtual environment using the below commands

python -m venv env 
.\env\Scripts\activate.ps1

Install Necessary Library

First, it is essential to install the Whisper library and Streamlit. Streamlit is employed for creating the GUI, while Whisper is utilized for audio whispering. To install these libraries, execute the following commands.

pip install whisper
pip install streamlit

File Structure

file-

Writing Python Code (app.py)

Below, are the step-by-step explanation of the code:

Step 1 :Imports and Library Initialization

Below, code imports the necessary libraries. Streamlit is used for creating web applications with simple Python scripts. The 'os' module provides a way to interact with the operating system, and 'whisper' is presumably a library for handling audio-related tasks.

Python3

import streamlit as st
import os
import whisper

Step 2: Function to Save Uploaded File

below, code function takes an uploaded file and a path as input and saves the file to the specified path. It uses the 'os.path.join' method to create the full file path and writes the file using the 'write' method.

Python3

def save_uploaded_file(uploaded_file, save_path):
    with open(os.path.join(save_path, uploaded_file.name), "wb") as f:
        f.write(uploaded_file.getbuffer())

Step 3: Loading Whisper Model

Below, code function initializes and loads a whisper model named "base" and assigns it to the variable 'model'. The loaded model is then used globally in the script.

Python3

def load_whisper_model():
    model = whisper.load_model("base")
    return model


model = load_whisper_model()

Step 4: Audio Transcription Function

below, code function transcribes an audio file located at 'mp3_filepath' using the loaded whisper model. It returns the transcribed text from the audio.

Python3

def transcribe_audio(mp3_filepath):
    # Transcribe using whisper
    result = model.transcribe(mp3_filepath)
    transcription = result['text']
    return transcription

Step 5: Main Streamlit Application

This is the main part of the code where the Streamlit application is created. It provides a simple web interface for users to upload an MP3 audio file, displays the uploaded file, saves it, transcribes the audio using the Whisper model, and then displays the transcribed text in a Streamlit text area. The application is executed if the script is run directly.

Python3

def main():
    st.title("Song Transcriber")

    # File uploader widget
    uploaded_file = st.file_uploader("Upload an audio file", type="mp3")

    if uploaded_file is not None:
        save_path = "audio_files"  # Folder to save the uploaded file
        os.makedirs(save_path, exist_ok=True)

        # Display the uploaded file
        st.audio(uploaded_file, format='audio/mp3')

        # Save the uploaded file
        save_uploaded_file(uploaded_file, save_path)
        st.success("Audio file uploaded successfully.")

        # Transcribe the audio
        transcribed_text = transcribe_audio(
            f"audio_files/{uploaded_file.name}")

        # Display the transcribed text
        st.subheader("Transcribed Text:")
        st.text_area("Transcribed Text", value=transcribed_text, height=200)


if __name__ == "__main__":
    main()

Complete Code Implementation

Here is the complete code implementation that we have used in app.py file.

app.py

Python3

import streamlit as st
import os
import whisper

def save_uploaded_file(uploaded_file, save_path):
    with open(os.path.join(save_path, uploaded_file.name), "wb") as f:
        f.write(uploaded_file.getbuffer())

def load_whisper_model():
    model = whisper.load_model("base")
    return model

model = load_whisper_model()

def transcribe_audio(mp3_filepath):
    # Transcribe using whisper
    result = model.transcribe(mp3_filepath)
    transcription = result['text']
    return transcription

def main():
    st.title("Song Transcriber")

    # File uploader widget
    uploaded_file = st.file_uploader("Upload an audio file", type="mp3")

    if uploaded_file is not None:
        save_path = "audio_files"  # Folder to save the uploaded file
        os.makedirs(save_path, exist_ok=True)

        # Display the uploaded file
        st.audio(uploaded_file, format='audio/mp3')

        # Save the uploaded file
        save_uploaded_file(uploaded_file, save_path)
        st.success("Audio file uploaded successfully.")
        # Transcribe the audio
        transcribed_text = transcribe_audio(
            f"audio_files/{uploaded_file.name}")
        st.subheader("Transcribed Text:")
        st.text_area("Transcribed Text", value=transcribed_text, height=200)

if __name__ == "__main__":
    main()