Exploring Data Distribution | Set 2

  • Last Updated : 21 Jan, 2019
Prerequisite: Exploring Data Distribution | Set 1

Terms related to Exploration of Data Distribution

-> Boxplot
-> Frequency Table
-> Histogram 
-> Density Plot

To get the link to csv file used, click here.

Loading Libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Loading Data

data = pd.read_csv("../data/state.csv")
# Add a derived column: population expressed in millions
data['PopulationInMillions'] = data['Population'] / 1_000_000
print(data.head(10))

Output :

  • Histogram: A plot of a frequency table, with the bins on the x-axis and the count of data values falling in each bin on the y-axis.

    Code – Histogram

    # Histogram Population In Millions
    fig, ax2 = plt.subplots()
    # sns.distplot is deprecated in recent seaborn releases; sns.histplot draws the count histogram
    sns.histplot(data["PopulationInMillions"], ax=ax2)
    ax2.set_ylabel("Frequency", fontsize = 15)
    ax2.set_xlabel("Population by State in Millions", fontsize = 15)
    ax2.set_title("Population - Histogram", fontsize = 20)

    Output :
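
To make the binning behind a histogram concrete, the frequency table can also be computed directly. The sketch below is illustrative only: the `population_millions` values are made-up stand-ins rather than rows from the csv file, and the choice of 4 equal-width bins is an assumption.

```python
import numpy as np

# Hypothetical populations in millions (illustrative values, not taken from state.csv)
population_millions = np.array([0.6, 0.7, 1.3, 2.9, 4.8, 5.7, 6.4, 9.9, 19.4, 37.3])

# Split the data range into 4 equal-width bins and count the values in each bin
counts, bin_edges = np.histogram(population_millions, bins=4)

# Each row is one bar of the histogram: bin range and number of values in it
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```

Each printed row corresponds to one bar: the bin range sets the bar's position and width on the x-axis, and the count sets its height.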

  • Density Plot: Closely related to the histogram, it shows the distribution of data values as a continuous line and can be thought of as a smoothed histogram. The output below shows the density plot superimposed on the histogram.

    Code – Density Plot for the data

    # Density Plot - Population
    fig, ax3 = plt.subplots()
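    # sns.distplot is deprecated in recent seaborn releases; sns.histplot with kde=True overlays the density curve
    sns.histplot(data["Population"], kde=True, stat="density", ax=ax3)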
    ax3.set_ylabel("Density", fontsize = 15)
    ax3.set_xlabel("Population by State", fontsize = 15)
    ax3.set_title("Density Plot - Population", fontsize = 20)

    Output :
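
One way to see what "smoothed histogram" means is to build a density estimate by hand. The following is a minimal sketch of a Gaussian kernel density estimate, not the exact routine seaborn uses: the helper `gaussian_kde_estimate`, the sample values, and the bandwidth of 2.0 are all assumptions chosen for illustration (plotting libraries typically pick a bandwidth automatically).

```python
import numpy as np

def gaussian_kde_estimate(values, grid, bandwidth):
    """Average a Gaussian bump centred on every observation (hypothetical helper)."""
    # Standardized distance from every grid point to every observation
    z = (grid[:, None] - values[None, :]) / bandwidth
    # Standard normal kernel evaluated at those distances
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Mean over observations, rescaled by the bandwidth so the curve integrates to 1
    return kernels.mean(axis=1) / bandwidth

# Hypothetical populations in millions (illustrative values, not taken from state.csv)
values = np.array([0.6, 0.7, 1.3, 2.9, 4.8, 5.7, 6.4, 9.9, 19.4, 37.3])
bandwidth = 2.0  # assumed smoothing width; larger values give a smoother curve

# Evaluate the density on a grid that extends a few bandwidths past the data
grid = np.linspace(values.min() - 4 * bandwidth, values.max() + 4 * bandwidth, 500)
density = gaussian_kde_estimate(values, grid, bandwidth)

# The area under a density curve should be approximately 1
area = density.sum() * (grid[1] - grid[0])
print(f"Area under the curve: {area:.3f}")
```

Unlike the histogram's y-axis, which counts values per bin, the y-axis here is a density: the area under the curve between two x-values approximates the proportion of data in that range.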

