Measure of Dispersion
Data is now generated almost everywhere, and most systems are flooded with it. Many techniques are available to summarize and analyze data. The mean is one of the most important statistics for summarizing the center of the data, but on its own it does not describe the whole dataset: two datasets can share the same mean while one is tightly clustered and the other is widely scattered. Other measures, called measures of dispersion, are therefore used to quantify the scatter in the data. Let’s look at these measures in detail.
Measures of Dispersion
Measures of dispersion quantify the scatter of the data, that is, how far the values in the distribution lie from one another. They capture the variation between different values of the data. Intuitively, dispersion is the extent to which the points of the distribution differ from the average of the distribution. Measures of dispersion can be classified into the two categories shown below:
- Absolute Measures of Dispersion
- Relative Measures of Dispersion
Absolute Measures of Dispersion
These measures of dispersion are expressed in the same units as the data themselves, for example meters, dollars, or kilograms. Some absolute measures of dispersion are:
- Range: It is defined as the difference between the largest and the smallest value in the distribution.
- Mean Deviation: This is the arithmetic mean of the absolute differences between the values and their mean.
- Standard Deviation: This is the square root of the arithmetic mean of the squared deviations measured from the mean.
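As a rough illustration of the last definition, the sketch below computes the standard deviation step by step and compares it with `statistics.pstdev` from Python's standard library. The function name `standard_deviation` and the sample values are assumptions made for this example only.

```python
import math
import statistics

def standard_deviation(values):
    """Square root of the average squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)

data = [2, 4, 6, 8, 10]  # illustrative values
print(standard_deviation(data))  # roughly 2.83, i.e. sqrt(8)
print(statistics.pstdev(data))   # the library's population standard deviation
```

Because the definition divides by the number of values rather than by one less, it corresponds to the population standard deviation, which is why `pstdev` is used for the comparison and not `stdev`.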
Range
The range is the difference between the largest and the smallest values in the distribution. Thus, it can be written as R = L – S, where L stands for the largest value in the distribution and S stands for the smallest value. A higher range implies higher variation. One drawback of this measure is that it takes into account only the maximum and the minimum values, which may not be a good indicator of how the values of the distribution are scattered.
For example,
10, 20, 15, 0, 100
The smallest value S in the data = 0, the largest value L in the data = 100
R = 100 – 0 = 100
Note: Range cannot be calculated for open-ended frequency distributions. An open-ended frequency distribution is one in which either the lower limit of the lowest class or the upper limit of the highest class is not defined.
Range for ungrouped data:
Question 1: Find out the range for the following observations.
20, 24, 31, 17, 45, 39, 51, 61
Solution:
The largest value in the given observations is 61 and the smallest value is 17. The Range is 61 – 17 = 44
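The calculation R = L – S can be sketched in a few lines of Python. The helper name `value_range` is an assumption for this example; it uses only the built-in `max` and `min`.

```python
def value_range(values):
    """Return R = L - S: the largest value minus the smallest value."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)

print(value_range([10, 20, 15, 0, 100]))              # 100
print(value_range([20, 24, 31, 17, 45, 39, 51, 61]))  # 44
```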
Range for grouped data:
Question 2: Find out the range for the following frequency distribution table for the marks scored by class 10 students.
Marks Intervals | Number of Students |
0-10 | 5 |
10-20 | 8 |
20-30 | 15 |
30-40 | 9 |
Solution:
For the largest value, take the upper limit of the highest class = 40
For the smallest value, take the lower limit of the lowest class = 0
Range = 40 – 0
Range = 40
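For grouped data the same idea can be sketched as follows. Representing each class as a `(lower, upper, frequency)` tuple and marking a missing limit with `None` are assumptions of this example, as is the helper name `grouped_range`.

```python
def grouped_range(classes):
    """Range of a grouped distribution: highest upper limit minus lowest lower limit.

    Each class is a (lower, upper, frequency) tuple. A limit of None marks an
    open-ended class, for which the range cannot be calculated.
    """
    lowers = [lower for lower, _, _ in classes]
    uppers = [upper for _, upper, _ in classes]
    if None in lowers or None in uppers:
        raise ValueError("range is undefined for an open-ended distribution")
    return max(uppers) - min(lowers)

marks = [(0, 10, 5), (10, 20, 8), (20, 30, 15), (30, 40, 9)]
print(grouped_range(marks))  # 40
```

Note that the frequencies play no part in the result, which reflects the drawback mentioned earlier: the range looks only at the two extremes.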
Mean Deviation
Range as a measure of dispersion depends only on the highest and the lowest values in the data. Mean deviation, on the other hand, measures the deviation of the observations from the mean of the distribution. Since the mean is the central value of the data, some deviations are positive and some are negative. If they are added as they are, their sum reveals nothing, because the positive and negative deviations cancel each other out. For example,
Consider the data given below,
-5, 10, 25
The mean of this data = 10
Now the deviations from the mean for the different values are (-5 – 10), (10 – 10), (25 – 10), i.e. -15, 0, 15.
Adding these deviations gives zero, which wrongly suggests that there is no deviation from the mean. To counter this problem, only the absolute values of the differences are taken while calculating the mean deviation.
So, Mean Deviation (MD) = $\frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$, where $\bar{x}$ is the mean and $n$ is the number of values.
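The cancellation is easy to check numerically. The short sketch below uses the same three values; the variable names are chosen for this example only.

```python
data = [-5, 10, 25]
mean = sum(data) / len(data)

signed = [x - mean for x in data]      # deviations with their signs
absolute = [abs(d) for d in signed]    # deviations without their signs

print(sum(signed))                 # 0.0: positive and negative deviations cancel
print(sum(absolute) / len(data))   # 10.0: the mean deviation
```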
Mean deviation from the mean for Ungrouped data:
For calculating the mean deviation from the mean for ungrouped data, the following steps must be followed:
- Calculate the arithmetic mean of all the values of the dataset.
- Calculate the difference between each value of the dataset and the mean, and take its absolute value |d|.
- Calculate the arithmetic mean of these absolute deviations.
M.D = $\frac{\sum |d|}{n}$, where $|d| = |x_i - \bar{x}|$ and $n$ is the number of values.
Question 1: Calculate the mean deviation for the given ungrouped data:
2, 4, 6, 8, 10
Solution:
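Following the steps mentioned above,
Mean = $\frac{2 + 4 + 6 + 8 + 10}{5}$
⇒ Mean = $\frac{30}{5} = 6$
M.D = $\frac{\sum |x_i - \bar{x}|}{n}$
⇒ M.D = $\frac{|2 - 6| + |4 - 6| + |6 - 6| + |8 - 6| + |10 - 6|}{5}$
⇒ M.D = $\frac{4 + 2 + 0 + 2 + 4}{5}$
⇒ M.D = $\frac{12}{5}$
⇒ M.D = 2.4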
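The three steps can be sketched directly in Python. The function name `mean_deviation_about_mean` is an assumption for this example.

```python
def mean_deviation_about_mean(values):
    """Average absolute deviation of the values from their arithmetic mean."""
    n = len(values)
    mean = sum(values) / n                            # step 1: arithmetic mean
    abs_deviations = [abs(x - mean) for x in values]  # step 2: |d| for each value
    return sum(abs_deviations) / n                    # step 3: mean of the |d| values

print(mean_deviation_about_mean([2, 4, 6, 8, 10]))  # 2.4
```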
Mean Deviation from the median for Ungrouped Data:
For calculating the mean deviation from the median for ungrouped data, the following steps must be followed:
- Calculate the median of all the values of the dataset.
- Calculate the difference between each value of the dataset and the median, and take its absolute value |d|.
- Calculate the arithmetic mean of these absolute deviations.
M.D = $\frac{\sum |x_i - M|}{n}$, where $M$ is the median and $n$ is the number of values.
Question 2: Calculate the mean deviation from the median for the given ungrouped data:
2, 4, 6, 8, 10
Solution:
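Following the steps mentioned above,
The median of this data is also 6.
M.D = $\frac{\sum |x_i - M|}{n}$
⇒ M.D = $\frac{|2 - 6| + |4 - 6| + |6 - 6| + |8 - 6| + |10 - 6|}{5}$
⇒ M.D = $\frac{4 + 2 + 0 + 2 + 4}{5}$
⇒ M.D = $\frac{12}{5}$
⇒ M.D = 2.4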
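A minimal sketch of the same calculation about the median, using `statistics.median` from Python's standard library. The function name `mean_deviation_about_median` is an assumption for this example, and the second dataset is the one used earlier for the range.

```python
import statistics

def mean_deviation_about_median(values):
    """Average absolute deviation of the values from their median."""
    median = statistics.median(values)
    return sum(abs(x - median) for x in values) / len(values)

print(mean_deviation_about_median([2, 4, 6, 8, 10]))      # 2.4
print(mean_deviation_about_median([10, 20, 15, 0, 100]))  # 22.0 (median is 15)
```

For symmetric data such as 2, 4, 6, 8, 10 the mean and the median coincide, so both versions of the mean deviation agree. For skewed data such as the second list they generally differ.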
Mean deviation from mean for continuous frequency distribution:
For calculating the mean deviation from the mean for a continuous frequency distribution, the following steps must be followed:
- Calculate the arithmetic mean of the distribution, using the middle value of each class interval weighted by its frequency.
- Calculate the difference between the middle value of each class interval and the mean, and take its absolute value |d|.
- Multiply each |d| by the corresponding class frequency.
- Add these products and divide the sum by the total frequency.
M.D = $\frac{\sum f_i |x_i - \bar{x}|}{\sum f_i}$, where $x_i$ is the middle value and $f_i$ is the frequency of the $i$-th class.
Question 3: Calculate the mean deviation from the mean for the given data:
Class Interval | Frequency |
0-10 | 4 |
10-20 | 2 |
20-30 | 4 |
30-40 | 0 |
Solution:
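Following the steps mentioned above, with class middle values 5, 15, 25 and 35,
Mean = $\frac{4(5) + 2(15) + 4(25) + 0(35)}{4 + 2 + 4 + 0}$
⇒ Mean = $\frac{150}{10} = 15$
M.D = $\frac{\sum f_i |x_i - \bar{x}|}{\sum f_i}$
⇒ M.D = $\frac{4|5 - 15| + 2|15 - 15| + 4|25 - 15| + 0|35 - 15|}{10}$
⇒ M.D = $\frac{40 + 0 + 40 + 0}{10}$
⇒ M.D = $\frac{80}{10}$
⇒ M.D = 8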
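The grouped calculation can be sketched as below. Storing each class as a `(lower, upper, frequency)` tuple and the function name `grouped_mean_deviation` are assumptions of this example.

```python
def grouped_mean_deviation(classes):
    """Mean deviation about the mean for (lower, upper, frequency) classes."""
    midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
    freqs = [freq for _, _, freq in classes]
    total = sum(freqs)
    # Mean of the distribution: class midpoints weighted by frequency.
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / total
    # Frequency-weighted average of |midpoint - mean|.
    return sum(f * abs(x - mean) for f, x in zip(freqs, midpoints)) / total

table = [(0, 10, 4), (10, 20, 2), (20, 30, 4), (30, 40, 0)]
print(grouped_mean_deviation(table))  # 8.0
```

Using class midpoints means the result is an approximation of the mean deviation of the underlying raw values, which are not available once the data has been grouped.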
Mean deviation from the median for continuous frequency distribution:
For calculating the mean deviation from the median for a continuous frequency distribution, the following steps must be followed:
- Calculate the median of the distribution.
- Calculate the difference between the middle value of each class interval and the median, and take its absolute value |d|.
- Multiply each |d| by the corresponding class frequency.
- Add these products and divide the sum by the total frequency.
M.D = $\frac{\sum f_i |x_i - M|}{\sum f_i}$, where $M$ is the median, $x_i$ is the middle value and $f_i$ is the frequency of the $i$-th class.
Question 4: Calculate the mean deviation from the median for the given data:
Class Interval | Frequency |
0-10 | 7 |
10-20 | 1 |
20-30 | 3 |
30-40 | 0 |
Solution:
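Following the steps mentioned above,
The total frequency is 11 and the first class alone contains 7 observations, so the median lies in the interval (0-10). Let’s take its middle value, 5, as the median.
M.D = $\frac{\sum f_i |x_i - M|}{\sum f_i}$
⇒ M.D = $\frac{7|5 - 5| + 1|15 - 5| + 3|25 - 5| + 0|35 - 5|}{11}$
⇒ M.D = $\frac{0 + 10 + 60 + 0}{11}$
⇒ M.D = $\frac{70}{11}$
⇒ M.D ≈ 6.36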
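The sketch below follows the same simplification as the solution above: it locates the median class from the cumulative frequencies and takes the middle value of that class as the median, without interpolating inside the class. The `(lower, upper, frequency)` tuples and both function names are assumptions of this example.

```python
def median_class_midpoint(classes):
    """Middle value of the first class whose cumulative frequency reaches N/2."""
    total = sum(freq for _, _, freq in classes)
    if total == 0:
        raise ValueError("the distribution has no observations")
    cumulative = 0
    for lower, upper, freq in classes:
        cumulative += freq
        if cumulative >= total / 2:
            return (lower + upper) / 2

def grouped_mean_deviation_about_median(classes):
    """Frequency-weighted average of |class midpoint - approximate median|."""
    total = sum(freq for _, _, freq in classes)
    median = median_class_midpoint(classes)
    weighted = sum(f * abs((lo + hi) / 2 - median) for lo, hi, f in classes)
    return weighted / total

table = [(0, 10, 7), (10, 20, 1), (20, 30, 3), (30, 40, 0)]
print(median_class_midpoint(table))
print(grouped_mean_deviation_about_median(table))
```

A more precise median could be obtained by interpolating within the median class, which would change the resulting mean deviation slightly.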
Relative Measures of Dispersion
These measures of dispersion are expressed as ratios or percentages. For example, the standard deviation divided by the mean is a relative measure. These measures are always dimensionless and are also known as coefficients of dispersion. They come in handy when comparing the variation of two datasets that have different units. For example, consider two datasets of weights of students. In one dataset the weight is measured in kilograms, and in the other it is measured in grams. Both have equivalent variation in the values, but because the units differ, absolute measures of dispersion give a much larger value for the dataset with weights in grams. Since absolute measures of dispersion are not appropriate in such cases, relative measures of dispersion are used.
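The kilograms-versus-grams example can be checked with a short sketch. It uses the standard deviation divided by the mean as the relative measure; the weights are made-up values, and the function name `coefficient_of_variation` is an assumption for this example.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean: a dimensionless ratio."""
    return statistics.pstdev(values) / statistics.mean(values)

weights_kg = [48.0, 55.0, 61.0, 70.0]       # hypothetical student weights
weights_g = [w * 1000 for w in weights_kg]  # the same weights in grams

# The absolute measure changes with the unit...
print(statistics.pstdev(weights_kg), statistics.pstdev(weights_g))
# ...while the relative measure is the same up to rounding.
print(coefficient_of_variation(weights_kg), coefficient_of_variation(weights_g))
```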
Lorenz Curve
The Lorenz curve is an important tool in economics. It is a graphical representation of the distribution of wealth or income. It was developed by Max O. Lorenz to represent the inequality of wealth distribution. The figure below shows a typical Lorenz curve. The straight line represents a perfectly equal distribution, and the area enclosed between the straight line and the curved line, taken as a fraction of the total area under the straight line, is the Gini coefficient. The further the curved line is from the straight line, the greater the inequality in wealth.
This curve is also used in many other fields, such as ecology, studies of biodiversity, and business modeling.
Gini Coefficient: It is a scalar measure of inequality that ranges from 0 (perfect equality) to 1 (maximum inequality).
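As a rough sketch, the points of a Lorenz curve and the corresponding Gini coefficient can be computed from a list of incomes. This assumes the common definition of the Gini coefficient as the area between the line of equality and the Lorenz curve divided by the total area under the line of equality, non-negative incomes with a positive total, and the trapezoidal rule for the area. The function names and the income lists are hypothetical.

```python
def lorenz_points(incomes):
    """(share of population, share of total income) pairs, poorest first."""
    ordered = sorted(incomes)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, income in enumerate(ordered, start=1):
        running += income
        points.append((i / n, running / total))
    return points

def gini(incomes):
    """1 minus twice the area under the Lorenz curve (trapezoidal rule)."""
    points = lorenz_points(incomes)
    area = sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1 - 2 * area

print(gini([5, 5, 5, 5]))   # equal incomes: the curve lies on the straight line
print(gini([0, 0, 0, 10]))  # one person holds everything: close to the maximum
```

With only four people the second value cannot reach exactly 1, because the last person still makes up a quarter of the population.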
Sample Problems
Question 1: Find out the range for the following observations.
20, 42, 13, 71, 54, 93, 15, 16
Solution:
The largest value in the given observations is 93 and the smallest value is 13. The Range is 93 – 13 = 80
Question 2: Find out the range for the following frequency distribution table for the marks scored by class 10 students.
Marks Intervals | Number of Students |
10-20 | 8 |
20-30 | 25 |
30-40 | 9 |
Solution:
For the largest value, take the upper limit of the highest class = 40
For the smallest value, take the lower limit of the lowest class = 10
Range = 40 – 10
Range = 30
Question 3: Calculate the mean deviation for the given ungrouped data:
-5, -4, 0, 4, 5
Solution:
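Following the steps mentioned above,
Mean = $\frac{(-5) + (-4) + 0 + 4 + 5}{5}$
⇒ Mean = $\frac{0}{5} = 0$
M.D = $\frac{\sum |x_i - \bar{x}|}{n}$
⇒ M.D = $\frac{|-5 - 0| + |-4 - 0| + |0 - 0| + |4 - 0| + |5 - 0|}{5}$
⇒ M.D = $\frac{5 + 4 + 0 + 4 + 5}{5}$
⇒ M.D = $\frac{18}{5}$
⇒ M.D = 3.6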
Question 4: Calculate the mean deviation from the median for the given data:
Class Interval | Frequency |
0-10 | 1 |
10-20 | 1 |
20-30 | 8 |
30-40 | 0 |
Solution:
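Following the steps mentioned above,
The total frequency is 10 and the first two classes together contain only 2 observations, so the median lies in the interval (20-30). Let’s take its middle value, 25, as the median.
M.D = $\frac{\sum f_i |x_i - M|}{\sum f_i}$
⇒ M.D = $\frac{1|5 - 25| + 1|15 - 25| + 8|25 - 25| + 0|35 - 25|}{10}$
⇒ M.D = $\frac{20 + 10 + 0 + 0}{10}$
⇒ M.D = $\frac{30}{10}$
⇒ M.D = 3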