GRE Data Analysis | Numerical Methods for Describing Data

Data can be described numerically by various statistics, or statistical measures. These statistical measures are often grouped into three categories:

1. Measures of central tendency
2. Measures of position
3. Measures of dispersion 

Measures Of Central Tendency:

In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability distribution. It may also be called a center or location of the distribution.
Measures of central tendency indicate the “center” of the data along the number line and are usually reported as values that represent the data. There are three common measures of central tendency:

  • Arithmetic mean, usually called the average or simply the mean
  • Median
  • Mode

1. Arithmetic Mean:
The arithmetic mean is the best-known measure of central tendency. It is the average of a given set of data. To calculate the mean of n numbers, take the sum of the n numbers and divide it by n.



Mean for Ungrouped data can be defined as,

\overline{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n}

Mean for Grouped data:

\overline{X} = \frac{\sum f x}{n}

Where,
f is the frequency of each class,
x is the midpoint of each class,
n is the total number of scores (the sum of all the frequencies)
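The grouped-data formula can be sketched in Python as follows. The class intervals and frequencies are hypothetical, chosen only for illustration, and the function name grouped_mean is not from any library.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency) per class.
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]


def grouped_mean(classes):
    """Mean of grouped data: sum of f * x divided by n, with x the class midpoint."""
    n = sum(f for _, _, f in classes)                      # total number of scores
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # sum of f * midpoint
    return total / n


print(grouped_mean(classes))
```

Because every score in a class is represented by the class midpoint, the result is an estimate of the mean of the underlying raw data rather than its exact value.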

Example:
There are 16 numbers in a list. Find their mean (average).

 2, 4, 4, 5, 7, 7, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9

Explanation:
The list contains only 6 distinct values, several of which are repeated, so the mean can be computed as a weighted mean. Here 2 occurs 1 time, 4 occurs 2 times, 5 occurs 1 time, 7 occurs 6 times, 8 occurs 2 times and 9 occurs 4 times.

\overline{X} = \frac{2 \times 1 + 4 \times 2 + 5 \times 1 + 7 \times 6 + 8 \times 2 + 9 \times 4}{16} = \frac{109}{16}

\overline{X} = 6.8125  
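As a quick check, the same calculation can be done in Python both ways: directly, and as a weighted mean over the distinct values. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from collections import Counter

data = [2, 4, 4, 5, 7, 7, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9]

# Direct mean: sum of the values divided by how many there are.
direct_mean = sum(data) / len(data)

# Weighted mean: each distinct value multiplied by its frequency.
freq = Counter(data)
weighted_mean = sum(value * count for value, count in freq.items()) / sum(freq.values())

print(direct_mean, weighted_mean)
```

Both expressions give the same result, because the weighted form only regroups the terms of the same sum.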

2. Median:
The mean can be affected by just a few values that lie far above or below the rest of the data, because these values contribute directly to the sum of the data and therefore to the mean. By contrast, the median is a measure of central tendency that is fairly unaffected by unusually high or low values relative to the rest of the data.

The median is the middle value of a set of data. To calculate the median of n numbers, first order the numbers from least to greatest.

  • If n is odd, the median is the middle number
  • If n is even, the median is the average of the two middle numbers

Median for grouped data:

Median = L + \frac{(n/2) - B}{G} \times w

Where,
L is the lower class boundary of the group containing the median,
n is the total number of data values,
B is the cumulative frequency of the groups before the median group,
G is the frequency of the median group,
w is the group width
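A minimal Python sketch of the grouped-median formula is shown below. The classes are hypothetical and are assumed to be listed in increasing order; grouped_median is an illustrative name, not a library function.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency) per class.
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]


def grouped_median(classes):
    """Median = L + ((n/2) - B) / G * w, applied to the class containing the median."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the classes contain no observations")
    before = 0  # B: cumulative frequency of the groups before the current one
    for lower, upper, f in classes:
        if before + f >= n / 2:          # this class contains the median
            width = upper - lower        # w
            return lower + (n / 2 - before) * width / f
        before += f


print(grouped_median(classes))
```

For these classes n = 10, so the median falls in the class 10 to 20, with L = 10, B = 2, G = 5 and w = 10.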

Example:
Consider the 6 numbers below. Find their mean and median. Then replace one of the 8s with 38 and find the mean and median again.


 4, 4, 5, 7, 8, 8

Explanation:
Here n is even, so

Median = average of the values at positions n/2 and (n/2) + 1, i.e., the 3rd and 4th values
Median = \frac{5 + 7}{2} = 6

And,
Mean = \frac{4 + 4 + 5 + 7 + 8 + 8}{6} = 6 

Now, after replacing the last 8 with 38, the median remains the same, i.e., 6, but the mean is affected.

Mean = \frac{4 + 4 + 5 + 7 + 8 + 38}{6} = 11 
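The odd/even rule for the median, and the effect of the replacement on the mean, can be sketched in Python as follows. The helper names median and mean are defined here for illustration.

```python
def median(values):
    """Middle value for odd n, average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean(values):
    return sum(values) / len(values)


original = [4, 4, 5, 7, 8, 8]
with_outlier = [4, 4, 5, 7, 8, 38]  # one 8 replaced by 38

print(median(original), mean(original))
print(median(with_outlier), mean(with_outlier))
```

The median depends only on the values in the middle of the ordered list, so changing the largest value leaves it untouched, while the mean moves from 6 to 11.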

3. Mode:
The mode is the value that occurs most frequently in a set of observations. The mode of the 6 numbers in the list 1, 3, 6, 4, 3, 5 is 3, because 3 occurs twice while every other value occurs only once.

Example:
Find the mode of each list.

(a) 1, 2, 4, 7
(b) 1, 1, 2, 2, 3, 4 

Explanation:

(a) Every value occurs only once, so there is no mode (mode = none)
(b) Both 1 and 2 occur twice, so there are 2 modes in this case (mode = 1, 2)
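The following Python sketch finds the mode or modes of a list. It assumes the convention used in the example above, where a list in which no value repeats has no mode; the function name modes is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or an empty list if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:  # assumed convention: every value occurs once, so no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 4, 7]))
print(modes([1, 1, 2, 2, 3, 4]))
```

Note that other conventions exist: for example, Python's statistics.multimode returns every value when all of them occur equally often.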

Measures Of Position:

The three most basic positions, or locations, in a list of numerical data ordered from least to greatest are:

  • The beginning, or the least value L
  • The end, or the greatest value G
  • The middle, or median M

Apart from these, the most common measures of position are:

  • Quartiles
  • Percentiles

(a). Quartiles:
A quartile is a statistical term describing a division of observations into four defined intervals. After the data have been ordered from the least value L to the greatest value G, three quartile numbers, called the first quartile, the second quartile, and the third quartile, divide the data into four roughly equal groups.





The numbers  Q_1, Q_2, and Q_3 divide the data into 4 roughly equal groups as follows. After the data are listed in increasing order, the first group consists of the data from L to Q_1, the second group is from Q_1 to Q_2, the third group is from Q_2 to Q_3, and the fourth group is from Q_3 to G.


There are various rules to determine the exact values of Q_1, Q_2 and Q_3. In every case Q_2 is the median. For Q_1 and Q_3, arrange the data in increasing order; then:

  • Q_1 is the median of the first half of the data in the ordered list
  • Q_3 is the median of the second half of the data in the ordered list

Example:
Find the quartiles for the list of 16 numbers,

2, 4, 4, 5, 7, 7, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9

Explanation:

Median(Q_2) = \frac{7 + 7}{2}
Q_2 = 7 

For Q_1 and Q_3, divide the data into two smaller groups. The first group contains 2, 4, 4, 5, 7, 7, 7, 7 and the second group contains 7, 7, 8, 8, 9, 9, 9, 9. Now,

Q_1 = 6 (average of 5 and 7)
Q_3 = 8.5 (average of 8 and 9)

In this example, we can say that 4 is in the first quartile (first group), 8 is in the third quartile (third group) and 9 is in the fourth quartile (fourth group). The phrase “in a quartile” refers to being in one of the four groups determined by Q_1, Q_2, and Q_3.
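The median-of-halves rule can be sketched in Python as follows. For a list with an odd number of values, this sketch assumes that the middle value is left out of both halves, which is only one of several conventions; the function name quartiles is illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2:]  # skips the middle value when n is odd (assumption)
    return median(lower_half), median(ordered), median(upper_half)


data = [2, 4, 4, 5, 7, 7, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```

Statistical libraries often interpolate between data points instead, so their quartiles may differ slightly from the values obtained with this rule.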

(b). Percentiles:
Percentiles are mostly used for very large lists of numerical data ordered from least to greatest. Instead of four groups, they divide the data into 100 groups. The 99 percentiles P_1, P_2, P_3, P_4, ..., P_{99} divide the data into 100 roughly equal groups. Here,

P_{25} = Q_1
P_{50} = Q_2
P_{75} = Q_3

In competitive examinations, a candidate's percentile is calculated as,

Percentile = (number of people behind you / total number of people) x 100
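A small Python sketch of this formula is given below. It assumes that "people behind you" means candidates with a strictly lower score; the scores are hypothetical and the function name percentile_rank is illustrative.

```python
def percentile_rank(scores, your_score):
    """Percentage of candidates whose score is strictly below your_score."""
    behind = sum(1 for s in scores if s < your_score)  # people behind you (assumption: strictly lower)
    return behind * 100 / len(scores)


# Hypothetical exam scores for five candidates.
scores = [50, 60, 70, 80, 90]
print(percentile_rank(scores, 80))
```

Here three of the five candidates scored below 80, so a score of 80 corresponds to the 60th percentile.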

Measures Of Dispersion:

Measures of dispersion indicate the degree of spread of the data. The most common statistics used as measures of dispersion are,

  • The range
  • The interquartile range
  • The standard deviation

1. Range:
Range reflects the maximum spread of the data. The range of the numbers in a group of data is the difference between the greatest number G in the data and the least number L in the data; that is,

Range(R) = G-L

Sometimes a data value is unusually small or unusually large in comparison with the rest of the data. Such values are called outliers. An outlier is a data point that differs significantly from the other observations. Because the range depends only on the least and greatest values, it is directly affected by outliers.


Example:
Find the range of the following five numbers,

11, 10, 5, 13, 21

Explanation:

Greatest number (G) = 21
Least number (L) = 5
Range (R) = 21-5 = 16 
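In Python the range is a one-line calculation. The sketch below also shows how a single outlier changes it; the value 210 is a hypothetical replacement for 21, used only for illustration.

```python
data = [11, 10, 5, 13, 21]
data_range = max(data) - min(data)  # G - L

# Hypothetical outlier: 21 replaced by 210.
with_outlier = [11, 10, 5, 13, 210]
outlier_range = max(with_outlier) - min(with_outlier)

print(data_range, outlier_range)
```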

2. Interquartile Range:
The interquartile range is defined as the difference between the third quartile and the first quartile, that is, Q_3 - Q_1. It measures the spread of the middle half of the data and is not affected by outliers.
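As an illustration, the sketch below computes the interquartile range of the 16-number list from the quartile example, using the median-of-halves rule. The modified list, with the last 9 replaced by 90, is a hypothetical case added to show that an extreme value leaves the interquartile range unchanged.

```python
from statistics import median


def interquartile_range(values):
    """Q3 - Q1, with Q1 and Q3 taken as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2:])
    return q3 - q1


data = [2, 4, 4, 5, 7, 7, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9]
stretched = data[:-1] + [90]  # hypothetical outlier replacing the last 9

print(interquartile_range(data), interquartile_range(stretched))
```

For the original list this gives 8.5 - 6 = 2.5, and the same value for the modified list, because Q_1 and Q_3 do not depend on the extreme values.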

3. Standard Deviation:
Standard deviation is a measure of how spread out the numbers are around the mean. Its symbol is \sigma. The more the data are spread away from the mean, the greater the standard deviation; the more the data are clustered around the mean, the smaller the standard deviation.

The standard deviation of a group of numerical data can be computed as follows:

  1. Calculate the mean of the values.
  2. Find the difference between the mean and each of the values.
  3. Square each of the differences.
  4. Find the average of the squared differences.
  5. Take the nonnegative square root of the average of the squared differences.
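The five steps above can be sketched in Python as follows, using the five numbers from the range example. Because step 4 divides by the number of values, this is the population standard deviation, and the result is compared with statistics.pstdev from the standard library as a cross-check. The function name population_std is illustrative.

```python
import math
from statistics import pstdev


def population_std(values):
    n = len(values)
    mean = sum(values) / n                             # step 1
    differences = [mean - x for x in values]           # step 2
    squared = [d ** 2 for d in differences]            # step 3
    average_of_squares = sum(squared) / n              # step 4 (divides by n, not n - 1)
    return math.sqrt(average_of_squares)               # step 5


data = [11, 10, 5, 13, 21]
print(population_std(data), pstdev(data))
```

For this list the mean is 12 and the squared differences are 1, 4, 49, 1 and 81, so the average of the squared differences is 136 / 5 = 27.2 and the standard deviation is its square root, roughly 5.2. The function statistics.stdev divides by n - 1 instead and would return a slightly larger value.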



