Open In App

Analyzing Mobile Data Speeds from TRAI with Pandas

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Let’s use a real dataset from TRAI to analyze mobile dataspeeds and try to see the average speeds for a particular operator or state in that month. This will also show how easily Pandas could be used on any real world data to derive interesting results.



About Dataset –
Telecom Regulatory Authority of India (TRAI) releases a monthly dataset of the internet speeds it measures through the MySpeed (TRAI) app. This includes speed tests initiated by the user itself or periodic background tests done by the app. We will try to analyze this dataset and see the average speeds for a particular operator or state in that month.

Inspecting the raw structure of data:



STATES Found:  ['Kerala' 'Rajasthan' 'Maharashtra' 'UP East' 'Karnataka' nan
 'Madhya Pradesh' 'Kolkata' 'Bihar' 'Gujarat' 'UP West' 'Orissa'
 'Tamil Nadu' 'Delhi' 'Assam' 'Andhra Pradesh' 'Haryana' 'Punjab'
 'North East' 'Mumbai' 'Chennai' 'Himachal Pradesh' 'Jammu & Kashmir'
 'West Bengal']
OPERATORS Found:  ['IDEA' 'JIO' 'AIRTEL' 'VODAFONE' 'CELLONE']

 

Step #5: Define the function fixed_operator, which will keep the operator constant and iterate through all the available states for that operator. We can construct a similar function for a fixed state.




# filter out the operator and technology
# first as this will be common for all
filtered = df[(df['Service Provider'] == CONST_OPERATOR) 
               & (df['Technology'] == CONST_TECHNOLOGY)]
  
# iterate through each of the states
for state in states:
  
    # create new dataframe which contains
    # only the data of the current state
    base = filtered[filtered['State'] == state]
  
    # filter only download speeds based on test type
    down = base[base['Test Type'] == 'download']
  
    # filter only upload speeds based on test type
    up = base[base['Test Type'] == 'upload']
  
    # calculate mean of speeds in Data Speed
    # column using the Pandas.mean() method
    avg_down = down['Data Speed'].mean()
  
    # calculate mean of speeds
    # in Data Speed column
    avg_up = up['Data Speed'].mean()
  
    # discard values if mean is not a number(nan)
    # and append only the valid ones
    if (pd.isnull(avg_down) or pd.isnull(avg_up)):
        down, up = 0, 0
      
    else:
        final_states.append(state)
        final_download_speeds.append(avg_down)
        final_upload_speeds.append(avg_up)
  
        # print output upto 2 decimal places
        print(str(state) + ' -- Avg. Download: ' +
                          str('%.2f' % avg_down) + 
         '  Avg. Upload: ' + str('%.2f' % avg_up))

Output:

Kerala -- Avg. Download: 26129.27  Avg. Upload: 5193.46
Rajasthan -- Avg. Download: 27784.86  Avg. Upload: 5736.18
Maharashtra -- Avg. Download: 20707.88  Avg. Upload: 4130.46
UP East -- Avg. Download: 22451.35  Avg. Upload: 5727.95
Karnataka -- Avg. Download: 16950.36  Avg. Upload: 4720.68
Madhya Pradesh -- Avg. Download: 23594.85  Avg. Upload: 4802.89
Kolkata -- Avg. Download: 26747.80  Avg. Upload: 5655.55
Bihar -- Avg. Download: 31730.54  Avg. Upload: 6599.45
Gujarat -- Avg. Download: 16377.43  Avg. Upload: 3642.89
UP West -- Avg. Download: 23720.82  Avg. Upload: 5280.46
Orissa -- Avg. Download: 31502.05  Avg. Upload: 6895.46
Tamil Nadu -- Avg. Download: 16689.28  Avg. Upload: 4107.44
Delhi -- Avg. Download: 20308.30  Avg. Upload: 4877.40
Assam -- Avg. Download: 5653.49  Avg. Upload: 2864.47
Andhra Pradesh -- Avg. Download: 32444.07  Avg. Upload: 5755.95
Haryana -- Avg. Download: 7170.63  Avg. Upload: 2680.02
Punjab -- Avg. Download: 14454.45  Avg. Upload: 4981.15
North East -- Avg. Download: 6702.29  Avg. Upload: 2966.84
Mumbai -- Avg. Download: 14070.97  Avg. Upload: 4118.21
Chennai -- Avg. Download: 20054.47  Avg. Upload: 4602.35
Himachal Pradesh -- Avg. Download: 7436.99  Avg. Upload: 4020.09
Jammu & Kashmir -- Avg. Download: 8759.20  Avg. Upload: 4418.21
West Bengal -- Avg. Download: 16821.17  Avg. Upload: 3628.78

Plotting the data –

Use the arange() method of Numpy which returns evenly spaced values within a given interval. Here, passing the length of the final_states list, hence we get values from 0 to the number of states in the list like [0, 1, 2, 3 …]
We can then use these indices to plot a bar at that location. The second bar is plotted by offsetting the location of the first bar by the bar width.




fig, ax = plt.subplots()
   
# the width of each bar
bar_width = 0.25
   
# opacity of each bar
opacity = 0.8
   
# store the positions
index = np.arange(len(final_states))
   
# the plt.bar() takes in the position
# of the bars, data to be plotted,
# width of each bar and some other 
# optional parameters, like the opacity and colour
   
# plot the download bars
bar_download = plt.bar(index, final_download_speeds,
                       bar_width, alpha=opacity,
                       color='b', label='Download')
# plot the upload bars  
bar_upload = plt.bar(index + bar_width, final_upload_speeds, 
                        bar_width, alpha=opacity, color='g',
                                             label='Upload')
   
# title of the graph
plt.title('Avg. Download/Upload speed for '
                     + str(CONST_OPERATOR))
   
# the x-axis label
plt.xlabel('States')
   
# the y-axis label
plt.ylabel('Average Speeds in Kbps')
   
# the label below each of the bars,
# corresponding to the states
plt.xticks(index + bar_width, final_states, rotation=90)
   
# draw the legend
plt.legend()
   
# make the graph layout tight
plt.tight_layout()
   
# show the graph
plt.show()

Bar plot of the calculated speeds

Comparing data of two months –

Let’s take some another month’s data as well and plot them together to observe the difference in the data speeds.

For this example, the previous month’s dataset will be same sept18_publish.csv and the next month’s dataset is oct18_publish.csv.

We just need to execute the same steps again. Read the another month’s data. Filter it out to subsequent dataframes and then plot it using a slightly different method. During plotting of the bars, we will increment the 3rd and 4th bars(corresponding to the second file’s upload and download) by 2 and 3 times the bar width, so that they are in their correct positions.

Offsetting logic when plotting 4 bars

Below is the implementation for comparing 2-months of data:




import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time
   
# older month
   
# newer month
   
CONST_OPERATOR = 'JIO'
CONST_STATE = 'Delhi'
CONST_TECHNOLOGY = '4G'
   
# read file with Pandas and store as Dataframe
df = pd.read_csv(DATASET_FILENAME)
df2 = pd.read_csv(DATASET_FILENAME2)
   
# assign column names
df.columns = ['Service Provider', 'Technology', 'Test Type',
                   'Data Speed', 'Signal Strength', 'State']
df2.columns = ['Service Provider', 'Technology', 'Test Type',
                    'Data Speed', 'Signal Strength', 'State']
   
# find and display the unique states
states = df['State'].unique()
print('STATES Found: ', states)
   
# find and display the unique operators
operators = df['Service Provider'].unique()
print('OPERATORS Found: ', operators)
   
# define lists
final_download_speeds = []
final_upload_speeds = []
   
final_download_speeds_second =[]
final_upload_speeds_second = []
       
final_states = []
final_operators = []
   
# assign column names to the data
df.columns = ['Service Provider', 'Technology', 'Test Type'
                   'Data Speed', 'Signal Strength', 'State']
df2.columns = ['Service Provider', 'Technology', 'Test Type',
                    'Data Speed', 'Signal Strength', 'State']
   
  
print('\n\nComparing data for' + str(CONST_OPERATOR))
filtered = df[(df['Service Provider'] == CONST_OPERATOR)
               & (df['Technology'] == CONST_TECHNOLOGY)]
  
filtered2 = df2[(df2['Service Provider'] == CONST_OPERATOR)
                 & (df2['Technology'] == CONST_TECHNOLOGY)]
   
for state in states:
    base = filtered[filtered['State'] == state]
  
    # calculate mean of download speeds
    avg_down = base[base['Test Type'] ==
               'download']['Data Speed'].mean()
  
    # calculate mean of upload speeds
    avg_up = base[base['Test Type'] ==
                 'upload']['Data Speed'].mean()
  
    base2 = filtered2[filtered2['State'] == state]
  
    # calculate mean of download speeds
    avg_down2 = base2[base2['Test Type'] == 
             'download']['Data Speed'].mean()
  
    # calculate mean of upload speeds
    avg_up2 = base2[base2['Test Type'] == 
                'upload']['Data Speed'].mean()
  
  
    # discard values if mean is not a number(nan)
    # and append only the needed speeds
    if (pd.isnull(avg_down) or pd.isnull(avg_up) or
        pd.isnull(avg_down2) or pd.isnull(avg_up2)):
        avg_down = 0
        avg_up = 0
        avg_down2 = 0
        avg_up2 = 0
    else:
        final_states.append(state)
        final_download_speeds.append(avg_down)
        final_upload_speeds.append(avg_up)
        final_download_speeds_second.append(avg_down2)
        final_upload_speeds_second.append(avg_up2)
          
        print('Older: ' + str(state) + ' -- Download: ' +
              str('%.2f' % avg_down) + '  Upload: ' +
              str('%.2f' % avg_up))
  
        print('Newer: ' + str(state) + ' -- Download: ' +
              str('%.2f' % avg_down2) + '  Upload: ' +
              str('%.2f' % avg_up2))
  
# plot bargraph
fig, ax = plt.subplots()
index = np.arange(len(final_states))
bar_width = 0.2
opacity = 0.8
  
rects1 = plt.bar(index, final_download_speeds, 
                 bar_width, alpha=opacity, color='b',
                     label='Older Month\'s Download')
  
rects2 = plt.bar(index + bar_width, final_upload_speeds, 
                    bar_width, alpha=opacity, color='g',
                          label='Older Month\'s Upload')
  
rects3 = plt.bar(index + 2 * bar_width, final_download_speeds_second, 
                                 bar_width, alpha=opacity, color='y',
                                     label='Newer Month\'s Download')
  
rects4 = plt.bar(index + 3 * bar_width, final_upload_speeds_second, 
                               bar_width, alpha=opacity, color='r',
                                     label='Newer Month\'s Upload')
  
plt.xlabel('States')
plt.ylabel('Average Speeds')
plt.title('Avg. Download/Upload speed for '
                     + str(CONST_OPERATOR))
  
plt.xticks(index + bar_width, final_states, rotation=90)
plt.legend()
plt.tight_layout()
  
plt.show()

Output:

STATES Found:  ['Kerala' 'Rajasthan' 'Maharashtra' 'UP East' 'Karnataka' nan
 'Madhya Pradesh' 'Kolkata' 'Bihar' 'Gujarat' 'UP West' 'Orissa'
 'Tamil Nadu' 'Delhi' 'Assam' 'Andhra Pradesh' 'Haryana' 'Punjab'
 'North East' 'Mumbai' 'Chennai' 'Himachal Pradesh' 'Jammu & Kashmir'
 'West Bengal']
OPERATORS Found:  ['IDEA' 'JIO' 'AIRTEL' 'VODAFONE' 'CELLONE']
Comparing data forJIO
Older: Kerala -- Download: 26129.27  Upload: 5193.46
Newer: Kerala -- Download: 18917.46  Upload: 4290.13
Older: Rajasthan -- Download: 27784.86  Upload: 5736.18
Newer: Rajasthan -- Download: 13973.66  Upload: 4721.17
Older: Maharashtra -- Download: 20707.88  Upload: 4130.46
Newer: Maharashtra -- Download: 26285.47  Upload: 5848.77
Older: UP East -- Download: 22451.35  Upload: 5727.95
Newer: UP East -- Download: 24368.81  Upload: 6101.20
Older: Karnataka -- Download: 16950.36  Upload: 4720.68
Newer: Karnataka -- Download: 33521.31  Upload: 5871.38
Older: Madhya Pradesh -- Download: 23594.85  Upload: 4802.89
Newer: Madhya Pradesh -- Download: 16614.49  Upload: 4135.70
Older: Kolkata -- Download: 26747.80  Upload: 5655.55
Newer: Kolkata -- Download: 23761.85  Upload: 5153.29
Older: Bihar -- Download: 31730.54  Upload: 6599.45
Newer: Bihar -- Download: 34196.09  Upload: 5215.58
Older: Gujarat -- Download: 16377.43  Upload: 3642.89
Newer: Gujarat -- Download: 9557.90  Upload: 2684.55
Older: UP West -- Download: 23720.82  Upload: 5280.46
Newer: UP West -- Download: 35035.84  Upload: 5797.93
Older: Orissa -- Download: 31502.05  Upload: 6895.46
Newer: Orissa -- Download: 31826.96  Upload: 6968.59
Older: Tamil Nadu -- Download: 16689.28  Upload: 4107.44
Newer: Tamil Nadu -- Download: 27306.54  Upload: 5537.58
Older: Delhi -- Download: 20308.30  Upload: 4877.40
Newer: Delhi -- Download: 25198.16  Upload: 6228.81
Older: Assam -- Download: 5653.49  Upload: 2864.47
Newer: Assam -- Download: 5243.34  Upload: 2676.69
Older: Andhra Pradesh -- Download: 32444.07  Upload: 5755.95
Newer: Andhra Pradesh -- Download: 19898.16  Upload: 4002.25
Older: Haryana -- Download: 7170.63  Upload: 2680.02
Newer: Haryana -- Download: 8496.27  Upload: 2862.61
Older: Punjab -- Download: 14454.45  Upload: 4981.15
Newer: Punjab -- Download: 17960.28  Upload: 4885.83
Older: North East -- Download: 6702.29  Upload: 2966.84
Newer: North East -- Download: 6008.06  Upload: 3052.87
Older: Mumbai -- Download: 14070.97  Upload: 4118.21
Newer: Mumbai -- Download: 26898.04  Upload: 5539.71
Older: Chennai -- Download: 20054.47  Upload: 4602.35
Newer: Chennai -- Download: 36086.70  Upload: 6675.70
Older: Himachal Pradesh -- Download: 7436.99  Upload: 4020.09
Newer: Himachal Pradesh -- Download: 9277.45  Upload: 4622.25
Older: Jammu & Kashmir -- Download: 8759.20  Upload: 4418.21
Newer: Jammu & Kashmir -- Download: 9290.38  Upload: 4533.08
Older: West Bengal -- Download: 16821.17  Upload: 3628.78
Newer: West Bengal -- Download: 9763.05  Upload: 2627.28

Bar plot of the output

We just learned how to analyze some real world data and draw some interesting observations from it. But note that not all the data will be as nicely formatted and simple to deal with, Pandas makes it incredibly easy to work with such datasets.


Article Tags :