Interquartile Range and Quartile Deviation using NumPy and SciPy

Last Updated : 12 Feb, 2024

In statistical analysis, understanding the spread or variability of a dataset is crucial for gaining insights into its distribution and characteristics. Two common measures used for quantifying this variability are the interquartile range (IQR) and quartile deviation.

Quartiles

Quartiles are a kind of quantile that divides the number of data points into four parts, or quarters.

The first quartile (Q1) , is defined as the middle number between the smallest number and the median of the data set,
The second quartile (Q2) is the median of the given data set.
The third quartile (Q3) is the middle number between the median and the largest value of the data set.

Quartiles

Algorithm to find Quartiles

Here’s a step-by-step algorithm to find quartiles:

Sort the dataset in ascending order.
Calculate the total number of entries in the dataset.
If the number of entries is even:
- Calculate the median (Q2) by taking the average of the two middle values.
- Divide the dataset into two halves: the first half containing the smallest n entries and the second half containing the largest n entries, where n = total number of entries / 2.
- Calculate Q1 as the median of the first half.
- Calculate Q3 as the median of the second half.
If the number of entries is odd:
- Calculate the median (Q2) as the middle value.
- Divide the dataset into two halves: the first half containing the smallest n entries and the second half containing the largest n entries, where n = (total number of entries – 1) / 2.
- Calculate Q1 as the median of the first half.
- Calculate Q3 as the median of the second half.
The calculated values of Q1, Q2, and Q3 represent the first quartile, median (second quartile), and third quartile respectively.

Range:

It is the difference between the largest value and the smallest value in the given data set.

Interquartile Range

The interquartile range (IQR) is indeed defined as the difference between the third quartile (Q3) and the first quartile (Q1). It’s often used as a measure of statistical dispersion, specifically focusing on the middle 50% of the data. This range effectively covers the center of the distribution and contains 50% of the observations, making it useful for understanding the variability within a dataset while being less sensitive to outliers compared to the range.

Mathematically, it’s represented as:

IQR=Q3−Q1

Where , Q3 is the third quartile and Q1 is the first quartile.

Uses of IQR

The interquartile range has a breakdown point of 25% due to which it is often preferred over the total range.
The IQR is used to build box plots, simple graphical representations of a probability distribution.
The IQR can also be used to identify the outliers in the given data set.
The IQR gives the central tendency of the data.

Interpretation of IQR

The data set has a higher value of interquartile range (IQR) has more variability.
The data set having a lower value of interquartile range (IQR) is preferable.

Suppose, if we have two data sets and their interquartile ranges are IR1 and IR2, and if IR1 > IR2 then the data in IR1 is said to have more variability than the data in IR2 and data in IR2 is preferable.

Quartile Deviation

The quartile deviation is a measure of statistical dispersion or spread within a dataset. It’s defined as half of the difference between the third quartile (Q3) and the first quartile (Q1). Mathematically, it’s represented as:

Quartile Deviation= $\frac {Q3-Q1} 2$

Uses of Quartile Deviation

Quartile deviation quantifies spread within a dataset, computed as half the difference between the third and first quartiles.
It provides a robust measure of variability less sensitive to outliers compared to other measures like the range or standard deviation.
Used in descriptive statistics to complement measures of central tendency.
Helps assess skewness and identify potential outliers in distributions.
Facilitates comparison of variability between different datasets or subsets of data.

Interpretation of Quartile Deviation

Quartile deviation represents the average spread or variability within the middle 50% of a dataset.
It indicates how data points are distributed around the median, providing insights into the dispersion of values.
A larger quartile deviation suggests greater variability among the central 50% of data points.
Quartile deviation is less affected by extreme values or outliers compared to other measures of spread, making it robust in skewed distributions.
It aids in comparing the consistency or dispersion of data between different datasets or subsets.

Interquartile Range And Quartile Deviation of One Array using NumPy

We define a sample dataset named data.
We use NumPy’s percentile function to calculate the first quartile (Q1) and third quartile (Q3) of the dataset.
We then calculate the interquartile range (IQR) as the difference between Q3 and Q1.
Finally, we compute the quartile deviation by dividing the IQR by 2.

Python3

import numpy as np
 
# Sample dataset
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
 
# Calculate Interquartile Range (IQR) using numpy
q1 = np.percentile(data, 25)
q3 = np.percentile(data, 75)
iqr = q3 - q1
 
# Calculate Quartile Deviation
quartile_deviation = (q3 - q1) / 2
 
print("Interquartile Range (IQR):", iqr)
print("Quartile Deviation:", quartile_deviation)

Output:

Interquartile Range (IQR): 4.5
Quartile Deviation: 2.25

Interquartile Range And Quartile Deviation of One Array using SciPy

We import NumPy and SciPy libraries.
We define a sample dataset named data.
We use SciPy’s iqr function to directly calculate the interquartile range (IQR) of the dataset.
We then calculate the quartile deviation by dividing the IQR by 2.

Python3

import numpy as np
from scipy.stats import iqr
 
# Sample dataset
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
 
# Calculate Interquartile Range (IQR) using scipy
iqr_value = iqr(data)
 
# Calculate Quartile Deviation
quartile_deviation = iqr_value / 2
 
print("Interquartile Range (IQR):", iqr_value)
print("Quartile Deviation:", quartile_deviation)

Output:

Interquartile Range (IQR): 4.5
Quartile Deviation: 2.25

Suggest improvement

Measures of spread - Range, Variance, and Standard Deviation

Anova Formula

Share your thoughts in the comments

Introduction to Data Analysis

Data Analysis Libraries

Data Visulization Libraries

Exploratory Data Analysis (EDA)

Data Preprocessing

Data Transformation

Time Series Data Analysis

Case Studies and Projects

Introduction to Data Analysis

Data Analysis Libraries

Data Visulization Libraries

Exploratory Data Analysis (EDA)

Data Preprocessing

Data Transformation

Time Series Data Analysis

Case Studies and Projects

Interquartile Range and Quartile Deviation using NumPy and SciPy

Quartiles

Range:

Interquartile Range

Uses of IQR

Interpretation of IQR

Quartile Deviation

Uses of Quartile Deviation

Interpretation of Quartile Deviation

Interquartile Range And Quartile Deviation of One Array using NumPy

Python3

Interquartile Range And Quartile Deviation of One Array using SciPy

Python3

Please Login to comment...

Similar Reads

What kind of Experience do you want to share?