
Understand Audio data

Our daily lives are full of audio in the form of sounds and speech. Analyzing it is now a crucial task in many fields, such as music, crime investigation, and speech recognition. Audio data analysis involves exploring and interpreting sound signals to extract insights, recognize patterns, and make informed decisions. In this article, we will look at several techniques for understanding audio data.

Audio data

Audio is the representation of sound as an electrical signal or as digital data. Recording converts sound into an electrical signal that can be stored, transferred, or processed; playback transforms that signal back into sound that a listener can hear.



Audio is a way to communicate our lives, and it influences how we connect with one another and perceive the world around us.

For this implementation, we need an audio data file (.wav). Let’s understand audio data using ‘Recording.wav’, which can be directly downloaded from here. You can use any audio file (.wav) at your convenience.



Understanding audio data step-by-step

Importing libraries

First, we import the required Python libraries: librosa, NumPy, Matplotlib, SciPy and soundfile.




import librosa
import librosa.display  # explicit import; older librosa versions do not load it automatically
import numpy as np
import matplotlib.pyplot as plt
from scipy.fft import fft
from scipy.signal import spectrogram
import soundfile as sf

Sampling and sampling rate

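To understand audio data, we first need to understand sampling and the sampling rate. Sampling converts a continuous sound signal into a sequence of discrete amplitude values measured at regular intervals, and the sampling rate is the number of such measurements per second, expressed in hertz (Hz).

A signal can only be captured faithfully if the sampling rate is at least twice its highest frequency (the Nyquist criterion). Since the frequencies that matter in human speech lie below 8 kHz, sampling speech at 16 kHz is adequate. A higher sampling rate mainly raises the cost of storing and processing the files.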
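The link between 8 kHz content and a 16 kHz sampling rate can be checked numerically. The following is a minimal sketch, assuming a synthetic 6 kHz sine tone rather than real speech; the helper names `dominant_frequency` and `sample_tone` are hypothetical and only NumPy is used. It samples the same tone at 16 kHz and at 8 kHz and reports the strongest frequency found in each version.

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest bin in the magnitude spectrum."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(magnitudes)])

def sample_tone(tone_hz, sample_rate, duration_s=1.0):
    """Sample a pure sine tone of tone_hz at the given sampling rate."""
    t = np.arange(int(sample_rate * duration_s)) / sample_rate
    return np.sin(2 * np.pi * tone_hz * t)

tone_hz = 6000  # illustrative tone below the 8 kHz limit discussed above

# Sampled at 16 kHz, the highest representable frequency is 8 kHz,
# so the 6 kHz tone keeps its frequency.
preserved = dominant_frequency(sample_tone(tone_hz, 16000), 16000)

# Sampled at 8 kHz, the highest representable frequency is only 4 kHz,
# so the tone folds back (aliases) to 8000 - 6000 = 2000 Hz.
aliased = dominant_frequency(sample_tone(tone_hz, 8000), 8000)

print(f'16 kHz sampling: {preserved:.0f} Hz, 8 kHz sampling: {aliased:.0f} Hz')
```

When the sampling rate is too low, the tone does not disappear; it shows up at a wrong, lower frequency, which is why the rate has to be chosen with the highest frequency of interest in mind.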

In the code, we use the librosa module to load the audio file. It reads the file and returns a waveform (a sequence of amplitude values) that can be processed digitally, together with the sampling rate.




# Load the audio file
audio_file = 'Recording.wav'
# sr=None keeps the file's native sampling rate (librosa otherwise resamples to 22050 Hz)
waveform, sampling_rate = librosa.load(audio_file, sr=None)
print(f'Sampling Rate: {sampling_rate} Hz')

Output:

Sampling Rate: 48000 Hz

The sampling rate for the audio used is 48000 Hz.

Listen to the audio at its original sampling rate:




from IPython.display import Audio
 
Audio(waveform, rate=sampling_rate)

Output:

Sampling Rate: 48000

Listen to the audio at double the sampling rate:




Audio(waveform, rate=sampling_rate*2)

Output:

Sampling Rate: 96000

Listen to the audio at half the sampling rate:




# Integer division keeps the playback rate an integer
Audio(waveform, rate=sampling_rate // 2)

Output:

Sampling Rate: 24000

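As the clips above show, changing the playback rate without resampling distorts the audio: at double the rate it plays twice as fast and sounds an octave higher, and at half the rate it plays at half speed and sounds an octave lower.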
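The distortion comes from reinterpreting the same samples at a different rate, so both duration and pitch scale with the ratio of the two rates. The sketch below illustrates that arithmetic; the 3-second clip length, the 440 Hz tone and the helper names `playback_duration` and `perceived_frequency` are assumptions made for the example, not properties of Recording.wav.

```python
def playback_duration(num_samples, playback_rate):
    """Seconds needed to play num_samples at playback_rate samples per second."""
    return num_samples / playback_rate

def perceived_frequency(tone_hz, recorded_rate, playback_rate):
    """Frequency heard when a tone recorded at recorded_rate is played at playback_rate."""
    return tone_hz * playback_rate / recorded_rate

recorded_rate = 48000              # rate reported for the recording above
num_samples = 3 * recorded_rate    # hypothetical 3-second clip

for rate in (recorded_rate, recorded_rate * 2, recorded_rate // 2):
    seconds = playback_duration(num_samples, rate)
    heard = perceived_frequency(440.0, recorded_rate, rate)
    print(f'{rate} Hz playback: {seconds:.1f} s, a 440 Hz tone is heard at {heard:.0f} Hz')
```

To change the sampling rate without altering speed or pitch, the signal has to be resampled (new sample values computed) rather than simply played at a different rate.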

Calculating Amplitude and Bit depth

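Amplitude and bit depth are two important concepts for understanding the intensity of an audio signal. Amplitude is the magnitude of the signal at a given sample, and bit depth is the number of bits used to store each sample.

The code snippet calculates the amplitude range of the waveform, that is, the difference between its largest and smallest sample values. The soundfile library then reads the audio file into the variable audio_data.

The expression audio_data.dtype.itemsize gives the size of each array element in bytes, so multiplying it by 8 gives the width of each sample in bits. Note that this describes the array that soundfile returns (float64 by default), not the bit depth stored in the file. The stored format is reported by sf.info(audio_file).subtype, for example PCM_16 for 16-bit audio.




# Compute the amplitude range (largest minus smallest sample value)
amplitude_range = np.max(waveform) - np.min(waveform)
# Read the audio data; soundfile decodes to float64 unless a dtype is given
audio_data, sampling_rate = sf.read(audio_file)
# itemsize is in bytes, so multiply by 8 to get bits per in-memory sample
sample_width_bits = audio_data.dtype.itemsize * 8
# printing values
print(f'Amplitude Range: {amplitude_range}')
print(f'In-memory sample width: {sample_width_bits} bits')

Output:

Amplitude Range: 0.292144775390625
In-memory sample width: 64 bits

So the decoded samples occupy 64 bits each in memory, regardless of the bit depth at which the file was recorded.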

Knowing the amplitude range and bit depth helps us to gain useful insights into the characteristics of an audio file, which is important in audio processing and analysis tasks.
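The bit depth stored in a WAV file is recorded in the file header and can be read without decoding any samples. The following is a minimal sketch using Python's standard `wave` module, which handles uncompressed PCM WAV files only. The file name `tone_16bit.wav` and the helpers `write_sine_wav` and `wav_bit_depth` are hypothetical; the sketch writes its own short 16-bit test file so that it can run without Recording.wav.

```python
import math
import os
import struct
import tempfile
import wave

def write_sine_wav(path, sample_rate=8000, freq_hz=440.0, seconds=0.1):
    """Write a short mono 16-bit PCM sine tone (test data for this sketch)."""
    n = int(sample_rate * seconds)
    frames = b''.join(
        struct.pack('<h', int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n)
    )
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(frames)

def wav_bit_depth(path):
    """Return the bit depth recorded in the header of a PCM WAV file."""
    with wave.open(path, 'rb') as wf:
        return wf.getsampwidth() * 8  # bytes per sample -> bits per sample

path = os.path.join(tempfile.mkdtemp(), 'tone_16bit.wav')  # hypothetical file name
write_sine_wav(path)

bits = wav_bit_depth(path)
levels = 2 ** bits                           # number of distinct amplitude values
dynamic_range_db = 20 * math.log10(levels)   # roughly 6.02 dB per bit
print(f'{bits}-bit audio: {levels} levels, about {dynamic_range_db:.1f} dB of dynamic range')
```

Each additional bit doubles the number of amplitude levels available, which is why bit depth determines how finely the amplitude of a signal can be resolved.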

Waveform (Time Domain) Representation

A waveform is the graphical representation of an audio signal in the time domain, where each point represents the amplitude of the signal at a specific point in time. It helps us understand how the signal varies over time by revealing features like sound duration, pauses and amplitude changes.

In the code snippet, we plot the waveform using librosa and matplotlib.




# Plot the waveform
## set size
plt.figure(figsize=(8, 4))
 
## Display the waveform of the audio signal
librosa.display.waveshow(waveform, sr = sampling_rate)
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.title('Audio Waveform')
plt.grid(True)
plt.show()

Output:

Waveform Representation
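The features that a waveform plot reveals visually (duration, pauses, amplitude changes) can also be computed directly from the sample array. The sketch below is one way to do this, assuming a synthetic signal made of a tone, a pause and another tone in place of a real recording; the helper `frame_rms`, the 100 ms frame size and the silence threshold of 0.01 are illustrative choices.

```python
import numpy as np

def frame_rms(signal, frame_length):
    """RMS level of consecutive non-overlapping frames (trailing remainder dropped)."""
    n_frames = len(signal) // frame_length
    frames = signal[:n_frames * frame_length].reshape(n_frames, frame_length)
    return np.sqrt(np.mean(frames ** 2, axis=1))

sample_rate = 8000                                # illustrative rate
t = np.arange(sample_rate) / sample_rate          # time axis for 1 second
tone = 0.5 * np.sin(2 * np.pi * 200 * t)

# Synthetic stand-in for a recording: 1 s tone, 1 s pause, 1 s tone
signal = np.concatenate([tone, np.zeros(sample_rate), tone])

duration_s = len(signal) / sample_rate            # samples / rate = seconds
rms = frame_rms(signal, frame_length=sample_rate // 10)  # 100 ms frames
silent = rms < 0.01                               # hypothetical silence threshold
pause_s = silent.sum() * 0.1                      # each frame covers 0.1 s

print(f'Duration: {duration_s} s, peak amplitude: {np.max(np.abs(signal)):.2f}, '
      f'silence: {pause_s:.1f} s')
```

On real recordings the threshold usually needs tuning, since background noise keeps the RMS level of pauses above zero.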

Visualizing Frequency Spectrum

The frequency spectrum represents how the energy in an audio signal is distributed across different frequencies. It is calculated by applying a mathematical transformation such as the Fast Fourier Transform (FFT) to the audio signal. It is very useful for identifying musical notes, detecting harmonics, or filtering specific frequency components.

The code snippet computes the Fast Fourier Transform and plots the frequency spectrum of the audio using matplotlib. Because the spectrum of a real-valued signal is symmetric, only the first half (the positive frequencies) is plotted.




# Compute the FFT of the waveform
spectrum = fft(waveform)
 
# Frequency bins
frequencies = np.fft.fftfreq(len(spectrum), 1 / sampling_rate)
 
# Plot the frequency spectrum
plt.figure(figsize=(8, 4))
plt.plot(frequencies[:len(frequencies)//2], np.abs(spectrum[:len(spectrum)//2]))
plt.xlabel('Frequency (Hz)')
plt.ylabel('Amplitude')
plt.title('Frequency Spectrum')
plt.grid(True)
plt.show()

Output:

Frequency Spectrum

The plot displays the frequency content of the audio signal, which allows us to observe the dominant frequencies and their amplitudes.
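Dominant frequencies can also be read off numerically instead of from the plot. The following is a minimal sketch, assuming a synthetic signal with a strong 440 Hz component and a weaker 1000 Hz component; it uses NumPy's `rfft`, which returns only the positive-frequency half of the spectrum for real-valued input, and scales the result so that peak heights match the sine amplitudes.

```python
import numpy as np

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate          # 1 second of samples

# Hypothetical test signal: strong 440 Hz tone plus a weaker 1000 Hz tone
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

# Scale so that a pure sine of amplitude A produces a peak of height about A
amplitudes = 2 * np.abs(spectrum) / len(signal)

# Indices of the two largest peaks, strongest first
top_two = np.argsort(amplitudes)[-2:][::-1]
peaks = [(float(freqs[i]), round(float(amplitudes[i]), 2)) for i in top_two]
print(peaks)
```

The clean peaks here rely on the tones completing a whole number of cycles within the analysed window; with real audio the energy of a component typically spreads over neighbouring bins.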

Spectrogram

A spectrogram is a time-frequency representation of an audio signal which provides a 2D visualization of how the frequency content of the signal changes over time.

In a spectrogram, dark regions indicate a weak presence of a frequency and bright regions indicate a strong presence of a frequency at a certain time. This helps in various tasks like speech recognition, musical analysis and identifying sound patterns.

The code snippet computes the spectrogram with the scipy.signal.spectrogram function and plots it with matplotlib. In the code, epsilon is a small constant that is added to the spectrogram values before taking the logarithm, because the logarithm of zero is undefined and very small values would otherwise cause problems.




# Compute the spectrogram
 
# Small constant to avoid taking the logarithm of zero
epsilon = 1e-40 
f, t, Sxx = spectrogram(waveform, fs=sampling_rate)
 
# Plot the spectrogram with the constant added to Sxx
plt.figure(figsize=(8, 4))
plt.pcolormesh(t, f, 10 * np.log10(Sxx + epsilon))
plt.colorbar(label='Power/Frequency (dB/Hz)')
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.title('Spectrogram')
plt.show()

Output:

Spectrogram

The plot displays the spectrogram, which shows how the frequencies in the audio signal change over time. The color intensity represents the power (in dB) of each frequency at each time point.
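Conceptually, a spectrogram is built by cutting the signal into short overlapping frames, applying a window, and taking the FFT of each frame. The sketch below shows this with plain NumPy; it is an illustration of the idea, not the implementation used by scipy.signal.spectrogram. The helper `stft_power_db`, the frame length of 256, the hop of 128 and the epsilon of 1e-10 are assumptions, and the result is laid out as (frames, frequencies), which is the transpose of SciPy's layout.

```python
import numpy as np

def stft_power_db(signal, sample_rate, frame_length=256, hop=128, epsilon=1e-10):
    """Power spectrogram in dB from Hann-windowed, overlapping frames."""
    window = np.hanning(frame_length)
    starts = range(0, len(signal) - frame_length + 1, hop)
    frames = np.stack([signal[s:s + frame_length] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_length, d=1 / sample_rate)
    times = (np.array(starts) + frame_length / 2) / sample_rate  # frame centres
    # epsilon keeps log10 finite where the power is exactly zero
    return freqs, times, 10 * np.log10(power + epsilon)

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate

# Hypothetical signal: 500 Hz for the first second, then 2000 Hz for the second
signal = np.concatenate([np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t)])

freqs, times, power_db = stft_power_db(signal, sample_rate)
strongest = freqs[np.argmax(power_db, axis=1)]    # dominant frequency per frame
print(f'First frame: {strongest[0]:.0f} Hz, last frame: {strongest[-1]:.0f} Hz')
```

The frame length sets the trade-off: longer frames resolve frequencies more finely but blur when changes happen, and shorter frames do the opposite.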

Conclusion

We can conclude that these visualizations help us understand the behavior of audio signals effectively. Understanding audio signals is also essential for tasks such as audio classification, music genre classification and speech recognition.

