Python offers several libraries for visualizing data, such as the attributes of superhero characters. Popular choices include Matplotlib, Seaborn, and Plotly.
In this article, we use Matplotlib to generate visualizations and get insights from the Superheroes Dataset.
Matplotlib is a plotting library for Python that provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK. It has a wide range of capabilities and can create a variety of different types of plots, including line plots, scatter plots, bar plots, pie plots, and more.
CSV (Comma Separated Values) is a file format that stores data in a tabular form, i.e., in the form of rows and columns where each column is separated by a comma.
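As a small illustration of the format, the sketch below parses CSV text held in a string rather than a file. The sample rows are made up for this example; only the column names echo those used later in the article.

```python
import io

import pandas as pd

# Hypothetical sample rows; not taken from the real dataset.
csv_text = """Name,Alignment,Strength
Hero A,good,80
Hero B,bad,95
"""

# pd.read_csv accepts a file path or any file-like object,
# so an in-memory string buffer works for quick experiments.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # header names taken from the first line
```

The first line of the text becomes the header, and each following line becomes one row.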
To draw sound conclusions and plot meaningful visualizations, the data must first be reliable and clean. Pre-processing is therefore a key step for any dataset: check whether any values are missing, then fill them in or remove the affected rows as needed.
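The two options mentioned here, removing or filling missing values, can be sketched on a toy table. The data and the choice of the column mean as a fill value are assumptions made for illustration, not a recommendation for the real dataset.

```python
import numpy as np
import pandas as pd

# Toy data with one missing Speed value (hypothetical, for illustration only).
toy = pd.DataFrame({
    "Name": ["Hero A", "Hero B", "Hero C"],
    "Speed": [50.0, np.nan, 70.0],
})

# Count missing values per column.
missing_per_column = toy.isnull().sum()

# Option 1: drop every row that contains a missing value.
dropped = toy.dropna(axis=0)

# Option 2: fill the gap, here with the column mean (one possible choice).
filled = toy.fillna({"Speed": toy["Speed"].mean()})

print(missing_per_column)
print(dropped)
print(filled)
```

Dropping is the simpler choice when only a few rows are affected; filling keeps the row count but introduces estimated values.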
So, let’s import the required libraries and clean our dataset. Later, we can perform some visualizations accordingly.
Step 1: Importing required libraries.
Python3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
|
Step 2: Cleaning the dataset and finding missing values.
You can download the dataset from here.
Python3
df = pd.read_csv("C:/Users/admin/Downloads/superheroes_stats.csv")
df.head(10)
|
Output:
We can observe that rows 7 and 8 have missing values (NaN), so they need to be removed.
Superheroes Dataset
Let’s count the missing values in each column with the code below.
Python3
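# Print the number of missing values in each column
columns = list(df)
for column in columns:
    print("No. of missing values in", column,
          "attribute:", df[column].isnull().sum())
# Drop every row that contains at least one missing value
df = df.dropna(axis=0)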
|
Output:
From the output of the code above, we find that certain rows contain null values across their attribute columns. Such rows are dropped entirely with the dropna() method so that the remaining data is usable.
Missing Values in each column of the dataset
Step 3: Getting insights from the Superheroes dataset.
Data Insight 1:
Let’s find the nature (good, bad and neutral) of superheroes with the help of the Alignment column from the dataset.
Python3
cnt = df['Alignment'].value_counts()
print(cnt)
|
Output:
Nature of Superhero characters count
Let’s plot a pie chart to see the percentage of superheroes with good, bad and neutral natures.
Python3
# Take the labels from the counts so each slice is named correctly
label = cnt.index
plt.pie(cnt, labels=label, autopct='%.2f%%')
plt.show()
|
Output:
percentage of good, bad & neutral nature of superheroes
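The percentages drawn on the pie chart can also be computed as numbers. The sketch below uses a short made-up Series in place of the real Alignment column, so the figures it prints are not those of the dataset.

```python
import pandas as pd

# Hypothetical stand-in for df['Alignment'].
alignment = pd.Series(["good", "good", "bad", "neutral"], name="Alignment")

# normalize=True returns fractions instead of raw counts;
# multiplying by 100 turns them into percentages.
share = alignment.value_counts(normalize=True) * 100

print(share.round(2))
```

On the real data, the same call on `df['Alignment']` should give the values that `autopct` prints on the chart.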
Data Insight 2:
Let’s find the top 10 superheroes who are good-natured.
Python3
# Keep only the good-natured superheroes
good = df[df['Alignment'] == "good"]
# Sort by Total in descending order and take the first ten rows
Top_ten = good.sort_values(by=['Total'], ascending=False).head(10)
x = Top_ten['Name']
y = Top_ten['Total']
plt.figure(figsize=(10, 5))
y_ticks = np.arange(0, y.max() + 50, 50)
plt.xticks(rotation=80, fontsize=12)
plt.yticks(y_ticks)
plt.title("Top 10 good super-heroes", fontsize=22)
plt.bar(x, y, color="g")
plt.show()
|
Output:
From the output, we can see that the top 10 good-natured superheroes by Total are Martian Manhunter, Superman, Stardust, Thor, Supergirl, Nova, Goku, Jean Grey, Phoenix and Iron Man.
Top 10 Superheroes
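The sort-then-head pattern used for the top 10 has a shorter equivalent in pandas, `DataFrame.nlargest`. The sketch below compares the two on a toy table; the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical rows; the column names follow the article's dataset.
toy = pd.DataFrame({
    "Name": ["Hero A", "Hero B", "Hero C", "Hero D"],
    "Alignment": ["good", "bad", "good", "good"],
    "Total": [300, 500, 450, 120],
})

good_toy = toy[toy["Alignment"] == "good"]

# Pattern used in the article: sort descending, then keep the first rows.
by_sort = good_toy.sort_values(by="Total", ascending=False).head(2)

# Equivalent shortcut: ask directly for the rows with the largest Total.
top_two = good_toy.nlargest(2, "Total")

print(top_two)
```

Both approaches return the same rows here; which one to use is a matter of taste.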
Data Insight 3:
Now, let’s rank the good superheroes by Strength and Intelligence, highest first.
Python3
Max_strength_Intelligence = good.sort_values(
    by=['Strength', 'Intelligence'], ascending=False)
Max_strength_Intelligence
|
Output:
Filtered Dataset with high Strength & Intelligence Superheroes
Python3
# Take the first five rows of the sorted data
X = Max_strength_Intelligence['Name'][0:5]
Intelligence = Max_strength_Intelligence['Intelligence'][0:5]
Strength = Max_strength_Intelligence['Strength'][0:5]
X_axis = np.arange(len(X))
plt.figure(figsize=(10, 5))
# Shift the two bar series left and right so they sit side by side
plt.bar(X_axis - 0.2, Intelligence, 0.4, label='Intelligence')
plt.bar(X_axis + 0.2, Strength, 0.4, label='Strength')
plt.xticks(X_axis, X)
plt.xlabel("Super-heroes", fontsize=18)
plt.ylabel("Strength and Intelligence", fontsize=18)
plt.title("Good Superheroes with highest Strength and Intelligence", fontsize=18)
plt.legend()
plt.show()
|
Output:
From this output, we can conclude that Captain Marvel, Martian Manhunter, Superman, Beyonder and Hulk have high Strength and Intelligence compared to other characters.
Comparing both the highest Strengths & Intelligence of Good Superheroes
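When `sort_values` receives two column names, the first column decides the order and the second only breaks ties. The sketch below shows this on made-up rows.

```python
import pandas as pd

# Hypothetical rows chosen so that two heroes tie on Strength.
toy = pd.DataFrame({
    "Name": ["Hero A", "Hero B", "Hero C"],
    "Strength": [100, 100, 90],
    "Intelligence": [60, 85, 100],
})

# Strength is the primary key; Intelligence only orders rows with equal Strength.
ranked = toy.sort_values(by=["Strength", "Intelligence"], ascending=False)

print(ranked)
```

Hero C has the highest Intelligence but still ranks last, because its Strength is lower. A ranking that weighs both attributes equally would need a different key, for example their sum.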
Data Insight 4:
Let’s find the top 5 good superheroes with the highest Power, along with their Speed.
Python3
Max_Power_Speed = good.sort_values(by=['Power', 'Speed'], ascending=False)
Max_Power_Speed
|
Output:
Python3
# Take the first five rows of the sorted data
X = Max_Power_Speed['Name'][0:5]
Speed = Max_Power_Speed['Speed'][0:5]
Power = Max_Power_Speed['Power'][0:5]
X_axis = np.arange(len(X))
plt.figure(figsize=(9, 5))
plt.bar(X_axis - 0.2, Speed, 0.4, label='Speed', color='y')
plt.bar(X_axis + 0.2, Power, 0.4, label='Power', color='g')
plt.xticks(X_axis, X)
plt.xlabel("Super-heroes", fontsize=18)
plt.ylabel("Speed and Power", fontsize=18)
plt.title("Good Superheroes with highest Speed and Power", fontsize=18)
# Place the legend outside the plot area, to the right
plt.legend(bbox_to_anchor=(1.05, 1.0), loc='upper left')
plt.show()
|
Output:
Bar plot shows Superheroes with the highest Speeds & Powers
Data Insight 5:
Let’s plot a histogram to see the distribution of Speed among the good superheroes:
Python3
plt.figure(figsize=(12, 6))
X = good['Speed']
# Place a tick every 5 units across the range of Speed values
plt.xticks(np.arange(0, X.max() + 5, 5))
plt.hist(X)
plt.title("Distribution of Speed", fontsize=20)
plt.xlabel("Speed", fontsize=18)
plt.ylabel("Number of Super-heroes", fontsize=18)
plt.show()
|
Output:
From the histogram of Speed, we observe that 20 good superheroes fall in the highest speed range of 90-100, while 80 good superheroes fall in the 25-35 range.
Histogram showing the Distribution of Speed
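Bar heights can be hard to read off a histogram, so it may help to compute the counts directly. The sketch below uses `np.histogram` with 10 bins, which is also the default bin count of `plt.hist` unless configured otherwise. The speed values are invented for illustration.

```python
import numpy as np

# Hypothetical speed values standing in for good['Speed'].
speeds = np.array([10, 25, 30, 33, 47, 90, 95, 100])

# counts[i] is the number of values between edges[i] and edges[i + 1].
counts, edges = np.histogram(speeds, bins=10)

for low, high, n in zip(edges[:-1], edges[1:], counts):
    print(f"{low:6.1f} to {high:6.1f}: {n}")
```

Applied to the real Speed column, the printed counts should match the bar heights in the plot.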
Data Insight 6:
Plotting a line chart of the superheroes with the highest Total superpower
The ‘Total’ column value in the dataset includes the sum of the superhero’s Intelligence, Strength, Speed, Durability, Power and Combat values.
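This relationship can be checked by recomputing the sum. The sketch below does so on a single made-up row; whether every row of the real dataset passes the same check is an assumption worth verifying.

```python
import pandas as pd

stat_columns = ["Intelligence", "Strength", "Speed",
                "Durability", "Power", "Combat"]

# One hypothetical row whose Total is the sum of its six attributes.
toy = pd.DataFrame(
    [[50, 60, 70, 80, 90, 40, 390]],
    columns=stat_columns + ["Total"],
)

# Sum the six attribute columns row by row and compare with Total.
recomputed = toy[stat_columns].sum(axis=1)
matches = bool((recomputed == toy["Total"]).all())

print(matches)
```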
Python3
plt.figure(figsize=(12, 6))
# Top ten superheroes by Total, regardless of alignment
Top_ten_total = df.sort_values(by='Total', ascending=False).head(10)
X = Top_ten_total['Name']
Y = Top_ten_total['Total']
plt.xticks(rotation=80)
plt.plot(X, Y, 'o-', color='g')
plt.ylabel("Total Superpower", fontsize=18)
plt.xlabel("Superheroes", fontsize=18)
plt.title("Line chart with Total Superpower of Superheroes", fontsize=20)
plt.show()
|
Output:
Line chart of top-ten superheroes with Total power
In this way, we can generate many such visualizations, customize them and gather insights from the data.
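Since the grouped bar charts above repeat the same steps, one way to customize them in a single place is a small helper built on Matplotlib's object-oriented interface. The function name `grouped_bar` and the toy data are hypothetical and not part of the original article. The `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def grouped_bar(frame, name_col, left_col, right_col, width=0.4):
    """Draw two side-by-side bars per row of `frame` (hypothetical helper)."""
    positions = np.arange(len(frame))
    fig, ax = plt.subplots(figsize=(10, 5))
    # Shift each series by half a bar width so the pair is centred on the tick.
    ax.bar(positions - width / 2, frame[left_col], width, label=left_col)
    ax.bar(positions + width / 2, frame[right_col], width, label=right_col)
    ax.set_xticks(positions)
    ax.set_xticklabels(frame[name_col])
    ax.set_xlabel(name_col)
    ax.legend()
    return fig, ax


# Hypothetical rows for demonstration.
toy = pd.DataFrame({
    "Name": ["Hero A", "Hero B"],
    "Strength": [100, 90],
    "Durability": [95, 80],
})

fig, ax = grouped_bar(toy, "Name", "Strength", "Durability")
ax.set_title("Strength and Durability")
```

With the real data, passing the first five rows of a sorted DataFrame and the two column names of interest should give a chart like the ones shown earlier.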
Data Insight 7:
Plotting a bar chart of the good superheroes with the highest Strength and Durability
To defeat an enemy and win fights easily, durability is as important as sheer strength. So in this plot we check which good-natured superheroes have the highest Strength and Durability.
Python3
# Keep only the good-natured superheroes
good = df[df['Alignment'] == "good"]
Max_strength_durability = good.sort_values(
    by=['Strength', 'Durability'], ascending=False)
Max_strength_durability
|
Python3
# Take the first five rows of the sorted data
X = Max_strength_durability['Name'][0:5]
Durability = Max_strength_durability['Durability'][0:5]
Strength = Max_strength_durability['Strength'][0:5]
X_axis = np.arange(len(X))
plt.figure(figsize=(10, 5))
plt.bar(X_axis - 0.2, Durability, 0.4, label='Durability')
plt.bar(X_axis + 0.2, Strength, 0.4, label='Strength')
plt.xticks(X_axis, X)
plt.xlabel("Super-heroes", fontsize=18)
plt.ylabel("Strength and Durability", fontsize=18)
plt.title("Good Superheroes with highest Durability and Strength", fontsize=18)
plt.legend()
plt.show()
|
Output: