How to plot a dataframe using Pandas?
Pandas is one of the most popular Python packages used in data science. Pandas offers powerful and flexible data structures (DataFrame and Series) for manipulating and analyzing data. Visualization is one of the most effective ways to interpret that data.
Python has many popular plotting libraries that make visualization easy, such as Matplotlib, Seaborn, and Plotly. Pandas integrates closely with Matplotlib: by default, the DataFrame plot() method draws its charts with Matplotlib. Before we can plot, we need a DataFrame, which we can create by passing a dictionary to the pandas DataFrame() constructor.
Plot a Dataframe using Pandas
Making different kinds of plots from a Pandas DataFrame is easy. This article covers the following plot types, starting with a simple Pandas DataFrame that keeps the examples easy to follow:
- Scatter Plot
- Area Plot
- Bar Plot
- Violin Plot
- Line Plot
- Box Plot
- Histogram Plot
Create a Dataframe
Let’s create a simple DataFrame. In this example, the code imports the Pandas and Matplotlib libraries, creates a dictionary representing student data, and uses it to create a Pandas DataFrame. The `head()` method displays the first five rows of the DataFrame.
Python3
import pandas as pd
import matplotlib.pyplot as plt

# Student data: each key becomes a column, each list entry a row
data_dict = {'name': ['p1', 'p2', 'p3', 'p4', 'p5', 'p6'],
             'age': [20, 20, 21, 20, 21, 20],
             'math_marks': [100, 90, 91, 98, 92, 95],
             'physics_marks': [90, 100, 91, 92, 98, 95],
             'chem_marks': [93, 89, 99, 92, 94, 92]
             }
df = pd.DataFrame(data_dict)
# Show the first five rows
df.head()
Output:
name age math_marks physics_marks chem_marks
0 p1 20 100 90 93
1 p2 20 90 100 89
2 p3 21 91 91 99
3 p4 20 98 92 92
4 p5 21 92 98 94
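Because pandas plots through Matplotlib by default, calling plot() with no arguments already produces a chart: a line plot of every numeric column against the row index. The call also returns the Matplotlib Axes it drew on, which can then be customized further. The minimal sketch below rebuilds the student DataFrame so it runs on its own; the helper name `default_plot` is hypothetical and used only for illustration.

```python
import pandas as pd
from matplotlib.axes import Axes


def default_plot(df: pd.DataFrame) -> Axes:
    """Hypothetical helper: call DataFrame.plot() with no arguments."""
    # With no kind given, pandas draws a line plot. The row index (0..5) becomes
    # the x-axis and non-numeric columns such as 'name' are left out.
    ax = df.plot()
    ax.set_title('Default DataFrame.plot()')
    return ax


# Student data from the example above, rebuilt so this block runs on its own
data_dict = {'name': ['p1', 'p2', 'p3', 'p4', 'p5', 'p6'],
             'age': [20, 20, 21, 20, 21, 20],
             'math_marks': [100, 90, 91, 98, 92, 95],
             'physics_marks': [90, 100, 91, 92, 98, 95],
             'chem_marks': [93, 89, 99, 92, 94, 92]}
df = pd.DataFrame(data_dict)

ax = default_plot(df)
print(isinstance(ax, Axes))  # True: pandas returns the Matplotlib Axes it drew on
```

Call matplotlib.pyplot.show() afterwards to display the figure. The sections below pass `kind` and other parameters to choose a specific plot type.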
Create Plots in Pandas Dataframe
There are many types of plots for interpreting data, and each type of graph serves a particular purpose. The following sections cover some commonly used ways to create plots from a pandas DataFrame.
Plot Dataframe using Pandas Scatter Plot
To get a scatter plot of a DataFrame, all we have to do is call the plot() method with the following parameters:
kind='scatter', x='some_column', y='some_column', color='some_color'
Example: In this example, the code creates a scatter plot from the DataFrame ‘df’ with ‘math_marks’ on the x-axis and ‘physics_marks’ on the y-axis, plotted in red. The plot is titled ‘ScatterPlot’ and displayed using Matplotlib.
Python3
df.plot(kind='scatter',
        x='math_marks',
        y='physics_marks',
        color='red')
plt.title('ScatterPlot')
plt.show()
Output:
This is the most basic scatter plot; there are many ways to customize it.
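As one possible customization, the sketch below uses df.plot.scatter(), the accessor form of df.plot(kind='scatter'), and colors each point by a third column. `c` (a column name) and `colormap` are pandas scatter options, while `s` (marker size) is passed through to Matplotlib. The helper name `custom_scatter` is hypothetical.

```python
import pandas as pd
from matplotlib.axes import Axes


def custom_scatter(df: pd.DataFrame) -> Axes:
    """Hypothetical helper: a scatter plot with a few common customizations."""
    return df.plot.scatter(
        x='math_marks',
        y='physics_marks',
        c='chem_marks',      # color each point by its chemistry marks
        colormap='viridis',  # colormap used to map 'chem_marks' to colors
        s=80,                # marker size, passed through to Matplotlib
        grid=True,           # draw grid lines
        title='Math vs. physics marks (colored by chemistry marks)',
    )


# The three marks columns of the student DataFrame
df = pd.DataFrame({'math_marks': [100, 90, 91, 98, 92, 95],
                   'physics_marks': [90, 100, 91, 92, 98, 95],
                   'chem_marks': [93, 89, 99, 92, 94, 92]})
ax = custom_scatter(df)
```

Because a column name is passed as `c`, pandas also adds a colorbar next to the plot. Call matplotlib.pyplot.show() to display the figure.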
Plot a Pandas DataFrame using Area Plot
An area plot is a line plot with the region between each line and the x-axis filled in, which makes the magnitude of values and their trends easy to see. When the areas are stacked (the default for pandas area plots), the top edge of the stack shows the cumulative total of all columns.
Example: In this example, the Python code uses the pandas, numpy, and matplotlib libraries to create a sample DataFrame with ‘X’, ‘Y1’, and ‘Y2’ columns, then generates and displays an area plot with ‘X’ on the x-axis and ‘Y1’ and ‘Y2’ on the y-axis, titled ‘Area Plot’. Passing stacked=False draws the two areas overlapping instead of stacked.
Python3
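import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Sample data: X = 1..10, plus two columns of random integers from 1 to 9
data = {'X': np.arange(1, 11),
        'Y1': np.random.randint(1, 10, size=10),
        'Y2': np.random.randint(1, 10, size=10)}
# A separate name keeps the student DataFrame `df` intact for the later sections
area_df = pd.DataFrame(data)
# stacked=False overlays the two areas instead of stacking them
area_df.plot(x='X', kind='area', stacked=False)
plt.title('Area Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()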
Output:
[Figure: Area Plot]
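The difference between stacked and overlaid area plots can be checked directly. The sketch below draws the same data both ways side by side using df.plot.area(); it uses fixed values instead of random ones so the result is reproducible, and the helper name `stacked_vs_overlaid` is hypothetical. It relies on pandas drawing each area's upper edge as a Matplotlib line, so the last line of the stacked plot should trace Y1 + Y2.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.axes import Axes


def stacked_vs_overlaid(area_df: pd.DataFrame) -> tuple[Axes, Axes]:
    """Hypothetical helper: draw the same data as stacked and overlaid areas."""
    _, (ax_stacked, ax_overlaid) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    # stacked=True is the default: each column is drawn on top of the previous one
    area_df.plot.area(x='X', ax=ax_stacked, title='stacked=True (default)')
    # stacked=False: every area starts from zero, so the areas may overlap
    area_df.plot.area(x='X', stacked=False, ax=ax_overlaid, title='stacked=False')
    return ax_stacked, ax_overlaid


# Fixed sample values so the result is reproducible
area_df = pd.DataFrame({'X': np.arange(1, 11),
                        'Y1': [3, 5, 2, 6, 4, 7, 5, 8, 6, 9],
                        'Y2': [2, 1, 4, 3, 5, 2, 6, 3, 7, 4]})
ax_stacked, ax_overlaid = stacked_vs_overlaid(area_df)

# In the stacked plot, the top edge should be the running total Y1 + Y2
top_edge = ax_stacked.lines[-1].get_ydata()
print(np.array_equal(top_edge, area_df['Y1'] + area_df['Y2']))
```

Stacking suits data where the total matters (for example, parts of a whole); overlaying with stacked=False suits comparing the columns against each other.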
Plot a Pandas DataFrame using Bar Plot
Similarly, to get a bar plot, we pass the following parameters to the plot() method:
kind='bar', x='some_column', y='some_column', color='some_color'
Example: In this example, the code creates a bar plot of the ‘physics_marks’ data from the DataFrame ‘df’, with names on the x-axis, green bars, and the title ‘BarPlot’. The plot is displayed using Matplotlib’s `show()` function.
Python3
df.plot(kind='bar',
        x='name',
        y='physics_marks',
        color='green')
plt.title('BarPlot')
plt.show()
Output:
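The `y` parameter also accepts a list of columns, which gives a grouped bar chart with one bar per column for each row. The sketch below compares all three subjects per student; `rot=0` keeps the student names horizontal, and the helper name `grouped_bar` is hypothetical.

```python
import pandas as pd
from matplotlib.axes import Axes


def grouped_bar(df: pd.DataFrame) -> Axes:
    """Hypothetical helper: one group of bars per student, one bar per subject."""
    subjects = ['math_marks', 'physics_marks', 'chem_marks']
    ax = df.plot(kind='bar', x='name', y=subjects, rot=0, title='Marks by student')
    ax.set_ylabel('marks')
    return ax


# Student DataFrame, rebuilt so this block runs on its own
df = pd.DataFrame({'name': ['p1', 'p2', 'p3', 'p4', 'p5', 'p6'],
                   'math_marks': [100, 90, 91, 98, 92, 95],
                   'physics_marks': [90, 100, 91, 92, 98, 95],
                   'chem_marks': [93, 89, 99, 92, 94, 92]})
ax = grouped_bar(df)
```

Adding stacked=True to the same call stacks the three bars for each student instead of placing them side by side.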
Plot a Pandas DataFrame using Violin Plot
A violin plot is a data visualization that combines aspects of a box plot and a kernel density plot, providing insights into the distribution, central tendency, and probability density of a dataset.
Example: The pandas plot() method has no violin kind, so this example uses Seaborn. The code generates a sample DataFrame and draws a violin plot to visualize the distribution of ‘Values’ in two categories (‘A’ and ‘B’).
Python3
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# 100 values per category: A ~ N(0, 1), B ~ N(3, 1)
data = {'Category': ['A'] * 100 + ['B'] * 100,
        'Values': np.concatenate([np.random.normal(0, 1, 100), np.random.normal(3, 1, 100)])}
# A separate name keeps the student DataFrame `df` intact for the later sections
violin_df = pd.DataFrame(data)
plt.figure(figsize=(8, 6))
sns.violinplot(x='Category', y='Values', data=violin_df)
plt.title('Violin Plot')
plt.xlabel('Category')
plt.ylabel('Values')
plt.show()
Output:
[Figure: Violin Plot]
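If Seaborn is not available, Matplotlib's Axes.violinplot() can draw the same kind of chart from the grouped DataFrame. This is a swapped-in technique, not the Seaborn call used above: it takes a list of arrays (one per violin) and places the violins at x = 1, 2, and so on. The sketch uses a seeded NumPy Generator for reproducible data, and the helper name `violin_without_seaborn` is hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def violin_without_seaborn(violin_df: pd.DataFrame):
    """Hypothetical helper: one violin per category using only Matplotlib."""
    names, groups = [], []
    # groupby sorts the category labels, so the order here is 'A', 'B'
    for name, group in violin_df.groupby('Category'):
        names.append(name)
        groups.append(group['Values'].to_numpy())
    _, ax = plt.subplots(figsize=(8, 6))
    parts = ax.violinplot(groups, showmedians=True)
    # Label the violins, which sit at x = 1, 2, ...
    ax.set_xticks(list(range(1, len(names) + 1)))
    ax.set_xticklabels(names)
    ax.set_xlabel('Category')
    ax.set_ylabel('Values')
    ax.set_title('Violin Plot (Matplotlib only)')
    return ax, parts, names


rng = np.random.default_rng(0)
violin_df = pd.DataFrame({'Category': ['A'] * 100 + ['B'] * 100,
                          'Values': np.concatenate([rng.normal(0, 1, 100),
                                                    rng.normal(3, 1, 100)])})
ax, parts, names = violin_without_seaborn(violin_df)
```

Seaborn handles the grouping and labeling automatically, which is why the example above uses it; the Matplotlib version trades that convenience for one fewer dependency.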
Create Plots in Pandas using Line Plot
A line plot of a single column is not always enough; to get more insight, we can plot multiple columns on the same graph. One way to do this is to reuse the same Matplotlib Axes by passing it to every plot() call through the ax parameter:
kind='line', x='some_column', y='some_column', color='some_color', ax=some_axes
Example: In this example, the code gets the current Matplotlib Axes with plt.gca() and draws three lines on it, for the math, physics, and chemistry marks in the student DataFrame ‘df’. The plot is titled ‘LinePlots’.
Python3
# Reuse the current Axes so all three lines are drawn on the same plot
ax = plt.gca()
df.plot(kind='line',
        x='name',
        y='math_marks',
        color='green', ax=ax)
df.plot(kind='line', x='name',
        y='physics_marks',
        color='blue', ax=ax)
df.plot(kind='line', x='name',
        y='chem_marks',
        color='black', ax=ax)
plt.title('LinePlots')
plt.show()
Output:
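Reusing the Axes is useful when the lines come from different DataFrames. When all the columns live in one DataFrame, passing a list to `y` gives the same chart in a single call, with colors applied to the columns in order. A minimal sketch, with the hypothetical helper name `marks_line_plot`:

```python
import pandas as pd
from matplotlib.axes import Axes


def marks_line_plot(df: pd.DataFrame) -> Axes:
    """Hypothetical helper: draw all three marks columns with one plot() call."""
    # A list passed to y draws one line per column on the same Axes;
    # the colors are matched to those columns in order.
    return df.plot(kind='line',
                   x='name',
                   y=['math_marks', 'physics_marks', 'chem_marks'],
                   color=['green', 'blue', 'black'],
                   title='LinePlots')


# Student DataFrame, rebuilt so this block runs on its own
df = pd.DataFrame({'name': ['p1', 'p2', 'p3', 'p4', 'p5', 'p6'],
                   'math_marks': [100, 90, 91, 98, 92, 95],
                   'physics_marks': [90, 100, 91, 92, 98, 95],
                   'chem_marks': [93, 89, 99, 92, 94, 92]})
ax = marks_line_plot(df)
```

pandas also adds a legend with the three column names, which the separate-call version gets from each plot() call as well.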
Create Plots in Pandas using Box Plot
A box plot is mainly used to identify outliers. It also summarizes a column with its median, its quartiles, and the spread of the remaining values. Let’s see how to plot it.
Example: In this example, two lines of code use the Pandas library to create a box plot of the DataFrame ‘df’ and then display the plot using Matplotlib.
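The code listing for this example is missing from this copy of the article. The sketch below shows what the two lines most likely were, df.plot.box() followed by plt.show(), assuming `df` is the student DataFrame from the start of the article (rebuilt here so the block runs on its own).

```python
import pandas as pd
import matplotlib.pyplot as plt

# Student DataFrame from the "Create a Dataframe" section
df = pd.DataFrame({'name': ['p1', 'p2', 'p3', 'p4', 'p5', 'p6'],
                   'age': [20, 20, 21, 20, 21, 20],
                   'math_marks': [100, 90, 91, 98, 92, 95],
                   'physics_marks': [90, 100, 91, 92, 98, 95],
                   'chem_marks': [93, 89, 99, 92, 94, 92]})

ax = df.plot.box()  # one box per numeric column; the text column 'name' is skipped
plt.show()
```

To compare only the marks, select those columns first, for example df[['math_marks', 'physics_marks', 'chem_marks']].plot.box().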
Output:
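A box plot decides which points are outliers using the interquartile range (IQR = Q3 - Q1). Matplotlib, which draws pandas box plots, by default extends the whiskers to the most extreme values within 1.5 × IQR of the box and draws anything beyond that as an individual point. The sketch below applies that rule directly; the helper `iqr_outliers` is hypothetical, and it assumes the default linear quantile interpolation shared by pandas and NumPy.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Hypothetical helper: values a default box plot would draw as outliers."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return values[(values < lower) | (values > upper)]


# Q1 = 2, Q3 = 4, IQR = 2, so anything outside [-1, 7] is an outlier
print(iqr_outliers(pd.Series([1, 2, 3, 4, 100])).tolist())  # [100]

# The student math marks have no outliers
print(iqr_outliers(pd.Series([100, 90, 91, 98, 92, 95])).tolist())  # []
```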
Plot a Pandas DataFrame using Histogram Plot
A histogram plot is a graphical representation of the distribution of a dataset, displaying the frequency of values within specified intervals (bins) along a continuous range. It provides a visual summary of the data’s underlying frequency distribution.
Example: In this example, the code uses pandas to create a DataFrame with 100 random values from a standard normal distribution, then plots a histogram of those values with 20 bins, showing their frequency distribution.
Python3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# 100 samples from the standard normal distribution
data = {'Values': np.random.randn(100)}
df = pd.DataFrame(data)
# Split the range of values into 20 bins and count the values in each
df['Values'].plot(kind='hist', bins=20, edgecolor='black')
plt.title('Histogram Plot')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.show()
Output:
[Figure: Histogram Plot]
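Each bar of the histogram is one bin, and its height is the number of values that fall into that bin, so the heights add up to the number of values. The sketch below checks this for the 20-bin histogram; it uses NumPy's seeded Generator (default_rng) instead of np.random.randn so the data are reproducible, and the name `hist_df` is illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
hist_df = pd.DataFrame({'Values': rng.standard_normal(100)})
ax = hist_df['Values'].plot(kind='hist', bins=20, edgecolor='black')

# One Rectangle patch per bin; its height is the count for that bin
counts = [patch.get_height() for patch in ax.patches]
print(len(counts), sum(counts))  # 20 100.0
```

Fewer bins give a smoother but coarser picture of the distribution; more bins show finer detail but become noisy when there are only 100 values.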
Last Updated: 18 Dec, 2023