
Interpretations of Histogram


Histograms help in visualizing and understanding the distribution of data. This article provides an overview of the histogram and how to interpret it.

What is a Histogram?

Histograms are graphical representations of data distributions. They consist of bars, each representing the frequency or count of observations falling within specific intervals, known as bins. We can also say a histogram is a variation of a bar chart in which data values are grouped together and put into different classes. This grouping enables you to see how frequently data in each class occur in the dataset. 

The histogram graphically shows the following:

  • Frequency of different data points in the dataset.
  • Location of the center of data.
  • The spread of dataset.
  • Skewness/variance of dataset.
  • Presence of outliers in the dataset.

These features provide a strong indication of the appropriate distributional model for the data. A probability plot or a goodness-of-fit test can then be used to verify that model.

The histogram contains the following axes:

  • Vertical Axis: Frequency/count of each bin.
  • Horizontal Axis: List of bins/categories.

How Does a Histogram Work?

The histogram works by organizing and visualizing the distribution of data into intervals or bins along a continuous scale.

  • The range of data values is divided into intervals called “bins.” The number of bins and their widths can be predefined or determined algorithmically based on the range and distribution of the data.
  • Each data point in the dataset is assigned to a corresponding bin based on its value. As data points are assigned to bins, the frequency or count of data points falling within each bin is calculated.
  • The histogram is constructed by plotting the bins along the x-axis and the frequencies (or densities) along the y-axis. Each bin is represented by a bar, and the height of the bar corresponds to the frequency of data points in that bin.
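The three steps above can be sketched with NumPy's `np.histogram`. This is a minimal illustration, not a prescribed method: the twelve sample values and the four bin edges are made up for the example.

```python
import numpy as np

# Hypothetical sample: 12 measurements chosen only for illustration
values = np.array([2, 3, 5, 7, 8, 8, 9, 12, 13, 15, 18, 19])

# Step 1: divide the range 0-20 into 4 equal-width bins
bin_edges = np.array([0, 5, 10, 15, 20])

# Step 2: count how many values fall into each bin.
# Bins are half-open [left, right), except the last one,
# which also includes its right edge.
counts, edges = np.histogram(values, bins=bin_edges)

# Step 3: the bar heights are the counts; print a text version of the bars
for left, right, count in zip(edges[:-1].tolist(), edges[1:].tolist(), counts.tolist()):
    print(f"{left}-{right}: {'#' * count} ({count})")
```

Note that the value 5 lands in the second bin rather than the first, because each bin includes its left edge and excludes its right edge.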

By examining the histogram, you can gain insights into the distribution of the data. You can identify patterns, trends, central tendencies, variability, outliers, and other characteristics of the dataset. For example, a symmetric bell-shaped histogram suggests a normal distribution, while skewed histograms indicate asymmetry in the data.

Suppose you’re analyzing the distribution of scores on a standardized test. You have data for 2000 students, and you want to visualize how many students scored within different score ranges. For this you can create a histogram using the following data.

Score Range    Frequency
0-25           150
26-50          300
51-75          600
76-100         750
101-125        150
126-150        50


Histogram

The histogram has a single peak: most students scored between 76 and 100. Its shape is roughly bell-like but not perfectly symmetric, because the tail toward lower scores is longer than the tail toward higher scores. The histogram displays the frequency of students falling within each score range on the standardized test. Each bar represents a score range, and the height of the bar represents the number of students in that range. By customizing the x-axis intervals and the labels, you can effectively visualize the distribution of test scores. You can also change the y-axis to display percentages or density if needed.
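As a small sketch of the percentage view, the frequencies from the score table can be converted to percentages and cumulative percentages with NumPy. The variable names are illustrative; the counts are the ones given in the table.

```python
import numpy as np

# Score ranges and frequencies from the table above
labels = ["0-25", "26-50", "51-75", "76-100", "101-125", "126-150"]
counts = np.array([150, 300, 600, 750, 150, 50])

total = int(counts.sum())            # number of students
percent = 100 * counts / total       # share of students in each range
cumulative = np.cumsum(percent)      # share of students at or below each range

for label, c, p, cp in zip(labels, counts.tolist(), percent.tolist(), cumulative.tolist()):
    print(f"{label:>8}: {c:4d} students, {p:5.1f}%, cumulative {cp:5.1f}%")
```

The cumulative column makes it easy to read off statements such as "just over half of the students scored 75 or below".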

Histogram and its Interpretation

Normal Histogram

A normal histogram is the classical bell-shaped histogram: most of the frequency counts are concentrated in the middle, the tails diminish on both sides, and the shape is symmetric about the median. Many real-world measurements are approximately normally distributed, so this shape appears often. In a normally distributed histogram, the mean is almost equal to the median.

Note: In the implementation, we will use the NumPy, Matplotlib and Seaborn libraries. They are pre-installed in Colab; in a local environment, you can install them with the pip install command.

Python3




import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
 
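# Normal histogram plot: 500 samples with mean 10 and standard deviation 3
data = np.random.normal(10.0, 3, 500)
# kde=True overlays a smoothed density estimate on the bars
sns.displot(data, kde=True, bins=10, color='black')
plt.show()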


Output:


Normal Distribution Graph

We have plotted a histogram of normally distributed data.

  • The peak of the curve lies close to the mean of the dataset.
  • The histogram is approximately symmetric about that peak.
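The claim that the mean and median nearly coincide can be checked numerically. The sketch below draws a sample with the same parameters as the plot above; the seed is arbitrary and only makes the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed, for reproducibility only
sample = rng.normal(loc=10.0, scale=3.0, size=500)

mean = float(sample.mean())
median = float(np.median(sample))

# For a symmetric distribution the two values should be close to each other
print(f"mean = {mean:.2f}, median = {median:.2f}, difference = {abs(mean - median):.2f}")
```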

Non-normal Short-tailed/Long-tailed Histogram

In a short-tailed histogram, the tails approach 0 very quickly as we move away from the median of the data. In a long-tailed histogram, the tails approach 0 slowly as we move away from the median. Here, the tails are the extreme regions on both sides of the peak, where relatively little of the data is concentrated.
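One common way to quantify tail weight is excess kurtosis, which is 0 for a normal distribution, negative for shorter tails and positive for longer tails. The sketch below is an illustration under assumptions: the uniform and Laplace distributions are chosen here as stand-ins for short-tailed and long-tailed data, and `excess_kurtosis` is a helper defined for this example.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(seed=0)   # arbitrary seed
n = 10_000

short_tailed = rng.uniform(-1, 1, size=n)   # tails stop abruptly at -1 and 1
normal = rng.normal(0, 1, size=n)           # reference shape
long_tailed = rng.laplace(0, 1, size=n)     # tails decay more slowly than normal

k_short = excess_kurtosis(short_tailed)
k_normal = excess_kurtosis(normal)
k_long = excess_kurtosis(long_tailed)

print(f"short-tailed: {k_short:.2f}, normal: {k_normal:.2f}, long-tailed: {k_long:.2f}")
```

Passing any of the three arrays to `sns.displot` shows the corresponding shapes.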

Bimodal Histogram

A mode of the data is a most common value, which appears as a peak in the histogram. A bimodal histogram has two peaks. The histogram can therefore be used as a visual check of the unimodality of the data. Bimodality (or, more generally, non-unimodality) often indicates that the data come from two different groups or processes, which may point to a problem in the process that generated them. A bimodal histogram may show one or both of two characteristics: it may be a mixture of two normal distributions, and it may be symmetric.
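A rough numerical check for two peaks is to count the local maxima of the bin counts. This is only a heuristic sketch: the result depends on the bin choice, and the fixed range of -30 to 130 with 8 bins is an assumption picked for this synthetic data (values outside that range would be ignored).

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed
# Mixture of two normal samples centred at 80 and 20
x = np.concatenate([rng.normal(80, 10, 400), rng.normal(20, 10, 400)])

# 8 bins of width 20 between -30 and 130
counts, edges = np.histogram(x, bins=8, range=(-30, 130))

# Pad with zeros so the first and last bins can also qualify as peaks
padded = np.concatenate([[0], counts, [0]])
# A bin is a peak if it is higher than its left neighbour
# and at least as high as its right neighbour
is_peak = (padded[1:-1] > padded[:-2]) & (padded[1:-1] >= padded[2:])

centers = (edges[:-1] + edges[1:]) / 2
peak_centers = centers[is_peak]
print("bin counts:", counts.tolist())
print("peaks near:", peak_centers.tolist())
```

With narrower bins, random noise can create extra local maxima, so a check like this is best used alongside the plot rather than instead of it.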

Python3




# Bimodal histogram (reuses np and sns imported in the first example)
N = 400
mu_1, sigma_1 = 80, 10
mu_2, sigma_2 = 20, 10
# Generate two normal samples with the given means and standard deviations,
# then concatenate them into one dataset
X_1 = np.random.normal(mu_1, sigma_1, N)
X_2 = np.random.normal(mu_2, sigma_2, N)
X = np.concatenate([X_1, X_2])
sns.displot(X, bins=10, kde=True, color='green')


Output:


Skewed Left/Right Histogram

A skewed histogram is one in which the tail on one side is clearly longer than the tail on the other side. In a right-skewed histogram, the tail to the right of the peak is more stretched than the tail to the left, and vice versa for a left-skewed histogram. In a left-skewed histogram, the mean is typically less than the median, while in a right-skewed histogram the mean is typically greater than the median.

Right-skewed Histogram

Python3




# Right-skewed Histogram: counts fall off slowly toward larger values
rdata = ([0] * 19 + [1] * 49 + [2] * 60 + [3] * 47 + [4] * 32
         + [5] * 18 + [6] * 3 + [7] * 3 + [8])
sns.displot(rdata, bins=8, kde=True, alpha=0.6, color='blue')


Output:


Right Skewed Histogram

Left-skewed Histogram

Python3




# Left-skewed Histogram: mirror image of the right-skewed data
ldata = ([0] * 19 + [-1] * 49 + [-2] * 60 + [-3] * 47 + [-4] * 32
         + [-5] * 18 + [-6] * 3 + [-7] * 3 + [-8])
sns.displot(ldata, kde=True, bins=8, alpha=0.6, color='red')


Output:


Left Skewed Histogram
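The relationship between mean and median can be verified on the two skewed datasets used above. The sketch rebuilds the same values with NumPy; `skewness` is a small helper written for this example and computes the third standardized moment, which is positive for right skew and negative for left skew.

```python
import numpy as np

# Same values as the right-skewed example
rdata = np.array([0] * 19 + [1] * 49 + [2] * 60 + [3] * 47 + [4] * 32
                 + [5] * 18 + [6] * 3 + [7] * 3 + [8])
# The left-skewed example is the mirror image of the right-skewed one
ldata = -rdata

def skewness(x):
    """Third standardized moment of the sample."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

for name, data in [("right-skewed", rdata), ("left-skewed", ldata)]:
    print(f"{name}: mean = {data.mean():.3f}, "
          f"median = {np.median(data):.1f}, skewness = {skewness(data):.3f}")
```

For these datasets the mean is pulled toward the long tail, so it lies above the median in the right-skewed case and below it in the left-skewed case.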

Uniform Histogram

In a uniform histogram, each bin contains approximately the same number of counts (frequency). An example is rolling a die n times (n >> 30) and recording the frequency of each outcome.
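The die example can be simulated directly. This is a minimal sketch assuming a fair six-sided die and 6000 rolls; both the number of rolls and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed
n_rolls = 6000

# high is exclusive, so the simulated faces are 1..6
rolls = rng.integers(low=1, high=7, size=n_rolls)

# np.bincount counts occurrences of 0..6; drop the unused index 0
counts = np.bincount(rolls, minlength=7)[1:]
expected = n_rolls / 6

for face, count in enumerate(counts.tolist(), start=1):
    print(f"face {face}: {count} rolls (expected about {expected:.0f})")
```

The counts are not exactly equal, but each stays close to the expected value, which is what the flat shape of a uniform histogram reflects.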

Python3




# Generate random data following a uniform distribution
data = np.random.uniform(low=0, high=1, size=600)
sns.histplot(data, kde=True, bins=10)
plt.show()


Output:


Uniform Distribution

Normal Distribution with an Outlier

This histogram is similar to the normal histogram, except that it contains outliers: values far from the bulk of the data that still occur with a noticeable count. Such values are often caused by errors in the underlying process, for example faults that lead to defective products.

Python3




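# Normal distribution with an outlier
N = 400  # sample size, same value as in the bimodal example
mu, sigma = 80, 10
X_1 = np.random.normal(mu, sigma, N)
# Append 30 copies of the value 200, far away from the bulk of the data
X_1 = np.concatenate([X_1, [200] * 30])
sns.displot(X_1, kde=True, bins=13)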


Output:


Normal Distribution with an Outlier
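Outliers that are visible in a histogram can also be flagged numerically. The sketch below uses the 1.5 × IQR rule, which is a common convention chosen here for illustration and not the only possible criterion; the data mimic the example above with an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed
# 400 normal values around 80 plus 30 copies of the value 200
data = np.concatenate([rng.normal(80, 10, 400), [200] * 30])

# Quartiles and interquartile range
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Values outside these fences are flagged as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

print(f"fences: [{lower:.1f}, {upper:.1f}]")
print(f"flagged values: {len(outliers)}, of which equal to 200: {int(np.sum(outliers == 200))}")
```

A few genuine values from the tails of the normal sample may also be flagged, so the rule marks candidates for inspection rather than proving that a value is erroneous.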

Histogram – Frequently Asked Questions (FAQs)

Q. How do you interpret the mode of a histogram?

The mode of a histogram is the bin with the highest frequency, that is, the tallest bar. It indicates the range of values that occurs most often in the data.

Q. How do you interpret the shape of a histogram?

The shape of a histogram provides insights into the distribution of the data. A symmetric, bell-shaped histogram is consistent with a normal distribution, while skewness indicates a longer tail toward higher or lower values. Bimodal or multimodal shapes show several distinct peaks within the data.

Q. How do you interpret the spread of a histogram?

The spread of a histogram indicates the variability or dispersion of the data. A wider spread suggests greater variability, while a narrower spread indicates less variability among the values.
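Spread can be summarized with a few numbers alongside the plot. The sketch below compares two synthetic normal samples with the same center but different standard deviations; the samples and the helper `spread_summary` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed
narrow = rng.normal(50, 5, 1000)      # small standard deviation
wide = rng.normal(50, 15, 1000)       # large standard deviation

def spread_summary(x):
    """Return the range, sample standard deviation and interquartile range."""
    q1, q3 = np.percentile(x, [25, 75])
    return {
        "range": float(x.max() - x.min()),
        "std": float(x.std(ddof=1)),
        "iqr": float(q3 - q1),
    }

narrow_stats = spread_summary(narrow)
wide_stats = spread_summary(wide)
print("narrow:", {k: round(v, 1) for k, v in narrow_stats.items()})
print("wide:  ", {k: round(v, 1) for k, v in wide_stats.items()})
```

When comparing the spread of two histograms visually, plot them with the same x-axis limits, otherwise the bar widths alone can be misleading.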

Q. How do you interpret a density histogram?

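In a density histogram, the bar heights are scaled so that the total area of all bars equals 1, which means the area of each bar is the proportion of data in that interval. Higher density indicates a greater concentration of data points within an interval, while lower density indicates fewer data points.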
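A short sketch with `np.histogram(..., density=True)` shows how density differs from a proportion. The data are synthetic uniform values between 0 and 0.5, chosen so that the bins are narrow.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed
data = rng.uniform(0.0, 0.5, size=1000)

# density=True rescales the counts so that the bar areas sum to 1
density, edges = np.histogram(data, bins=5, range=(0.0, 0.5), density=True)
widths = np.diff(edges)
area = float(np.sum(density * widths))

print("bar heights:", np.round(density, 2).tolist())
print(f"total area: {area:.3f}")
```

Because each bin is only 0.1 wide, the bar heights come out at around 2. A density value above 1 is therefore not an error: it is the area of the bar, not its height, that represents a proportion.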

Q. How do you interpret outliers in a histogram?

Outliers in a histogram represent data points that significantly deviate from the rest of the distribution. They may indicate rare occurrences, errors, or important insights about the data’s behavior.



Last Updated : 13 Feb, 2024