In data science and machine learning, how we handle data and extract something useful from it depends on the distribution of the data.
The distribution describes how the values are spread: which values can occur, what share of the data each value or range accounts for, and which points are outliers. Exploring a data distribution means using graphical and tabular methods to organize and display this information.
Terms related to Exploration of Data Distribution
- Boxplot
- Frequency Table
- Histogram
- Density Plot
Boxplot : It is based on the percentiles of the data, as shown in the figure below. The top and bottom of the box are the 75th and 25th percentiles of the data. The extended lines, known as whiskers, cover the range of the rest of the data.
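The percentiles behind a boxplot can be computed directly. The following is a minimal sketch on made-up sample values (not the article's dataset). It assumes the common convention, used by matplotlib by default, that whiskers extend to the most extreme data points within 1.5 times the interquartile range of the box edges, and that anything beyond is drawn as an outlier.

```python
import numpy as np

# Made-up sample values for illustration only
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0])

# Box edges and median: 25th, 50th and 75th percentiles
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Assumed convention: whiskers stop at the most extreme points
# that lie within 1.5 * IQR of the box edges
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
inside = values[(values >= lower_limit) & (values <= upper_limit)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points outside the limits would be plotted individually as outliers
outliers = values[(values < lower_limit) | (values > upper_limit)]

print("box:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

Other whisker conventions exist (for example, extending to the minimum and maximum), so the limits here are one choice among several.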
To get the link to the CSV file used, click here.
Code #1 : Loading Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Code #2: Loading Data
# Adding a new column with derived data
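The loading step can be sketched as follows. Since the original CSV file is not available here, this example reads a small in-memory stand-in. The rows are made up, and the column names `State` and `Population` are assumptions; only `Abbreviation` and the `PopulationInMillions` naming appear elsewhere in the article.

```python
import io
import pandas as pd

# Stand-in for the article's CSV file; rows and the State/Population
# column names are hypothetical
csv_text = """State,Population,Abbreviation
Alpha,4500000,AA
Beta,700000,BB
Gamma,12800000,CC
Delta,2900000,DD
"""

data = pd.read_csv(io.StringIO(csv_text))

# Adding a new column with derived data
data["PopulationInMillions"] = data["Population"] / 1_000_000

print(data)
```

With a real file, `io.StringIO(csv_text)` would be replaced by the path to the CSV.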
Code #3 : BoxPlot
# BoxPlot Population In Millions
"Population by State in Millions"
"Population - BoxPlot"
Frequency Table : It is a tool that divides the range of the data into equally spaced segments (bins) and tells us how many values fall in each segment.
Code #1: Adding a column to perform crosstab and groupby functionality.
# Perform the binning action, the bins have been
# chosen to accentuate the output for the Frequency Table
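The binning step can be sketched with `pd.cut`. The bin edges used in the original are not shown, so the edges below are hypothetical, as are the sample rows. Note that these hand-picked edges are not equally spaced; passing an integer such as `bins=4` instead would ask `pd.cut` for equal-width bins.

```python
import pandas as pd

# Made-up sample rows for illustration only
data = pd.DataFrame({
    "Abbreviation": ["AA", "BB", "CC", "DD", "EE"],
    "PopulationInMillions": [0.7, 2.9, 4.5, 12.8, 21.0],
})

# Hypothetical bin edges; each bin is a half-open interval (left, right]
bin_edges = [0, 1, 5, 10, 40]
data["PopulationInMillionsBins"] = pd.cut(
    data["PopulationInMillions"], bins=bin_edges
)

# Frequency table: number of rows per bin, empty bins included
counts = data["PopulationInMillionsBins"].value_counts().sort_index()
print(counts)
```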
Code #2: Cross Tab – a type of Frequency Table
# Cross Tab - a type of Frequency Table
pd.crosstab(data.PopulationInMillionsBins, data.Abbreviation, margins=True)
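To see what the cross tab produces, here is a small self-contained sketch. The rows are made up, and the bins are stored as plain string labels (an assumption made to keep the example simple). `margins=True` adds an `All` row and column holding the totals.

```python
import pandas as pd

# Made-up rows; bin labels are plain strings for simplicity
data = pd.DataFrame({
    "Abbreviation": ["AA", "BB", "CC", "DD", "EE"],
    "PopulationInMillionsBins": ["0-1", "1-5", "1-5", "10-40", "10-40"],
})

# Cross Tab - a type of Frequency Table
table = pd.crosstab(
    data.PopulationInMillionsBins, data.Abbreviation, margins=True
)
print(table)
```

Each cell counts the rows that share a given bin and abbreviation, so the `All` column gives the per-bin frequency.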
Code #3: GroupBy – a type of Frequency Table
# Groupby - a type of Frequency Table
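A groupby-based frequency table might look like the sketch below. The data is made up, and counting the `Abbreviation` column per bin is one reasonable choice of aggregation, not necessarily the one used in the original code.

```python
import pandas as pd

# Made-up rows; bin labels are plain strings for simplicity
data = pd.DataFrame({
    "Abbreviation": ["AA", "BB", "CC", "DD", "EE"],
    "PopulationInMillionsBins": ["0-1", "1-5", "1-5", "10-40", "10-40"],
})

# Groupby - a type of Frequency Table
# Count how many rows fall into each bin
counts = data.groupby("PopulationInMillionsBins")["Abbreviation"].count()
print(counts)
```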