What are some of the important formulae used in statistics?

Last Updated : 22 Feb, 2024

Statistics is the branch of science that deals with collecting, evaluating, and summarising data, and with expressing that data in a mathematical form. It is used mainly to understand a data set and to draw conclusions from it in a wide range of applications. Mathematical statistics applies mathematical techniques such as linear algebra, differential equations, mathematical analysis, and probability theory.

There are two methods of analyzing data in mathematical statistics that are used on a large scale:

Descriptive Statistics
Inferential Statistics

Some of the important formulae used in statistics

Mean

The mean, also known as the arithmetic mean, is the average of a given set of numbers: the sum of all the data values divided by the total number of data values in the set. It is calculated as follows:

$Mean = \frac{Sum\ of\ Observations}{Total\ number\ of\ observations}$
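As an illustration only, the formula above can be sketched in Python for ungrouped data. The function name `mean` is our own choice and is not part of the original article.

```python
def mean(values):
    """Arithmetic mean: sum of observations / total number of observations."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)

print(mean([2, 4, 6, 8]))  # 5.0
```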

Mean Formula

For grouped data, where each class has a frequency, the mean is given by the following formula:

$\bar x = \frac{\sum fx}{\sum f}$

where,

x̄ = the mean value of the set of given data.
f = frequency of each class
x = mid-interval value of each class
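A minimal sketch of the grouped-data formula, assuming the classes are given as a list of mid-points with a matching list of frequencies (the function name `grouped_mean` is hypothetical). The usage line reuses the class intervals and frequencies from Question 3 below, whose mid-points are 15, 25, 35 and 45.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(f * x) / sum(f), where x is each class mid-point."""
    if len(midpoints) != len(frequencies):
        raise ValueError("each class needs one mid-point and one frequency")
    total_f = sum(frequencies)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    return sum_fx / total_f

# Classes 10-20, 20-30, 30-40, 40-50 with frequencies 4, 8, 16, 12
print(grouped_mean([15, 25, 35, 45], [4, 8, 16, 12]))  # 34.0
```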

In short, the mean is the average of all the data points.

Median

The median is the middle-most observation of a data set once the data has been arranged in ascending order. It is a measure of the central tendency of the data and is therefore useful in data analysis. Also known as the place average, the median is easy to calculate.

Median Formula

To find the median of a data set, first arrange the numbers in ascending order. The middle value is then found with one of the following formulas, depending on whether the number of observations is odd or even.

Odd number of observations

If the total number of observations in the data set is odd, the median formula is as follows:

$Median = (\frac{n+1}{2})^{th}term$

where n is the number of observations

Even number of observations

If the total number of observations in the data set is even, the median is the average of the two middle terms:

$Median = \frac{[(\frac{n}{2})^{th} term + ((\frac{n}{2})+1)^{th}term]}{2}$

where n is the number of observations
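Both cases can be sketched in Python as below. The function name `median` is our own; note that Python lists are indexed from 0, so the k-th term is at index k - 1. The standard library's `statistics.median` follows the same odd/even rule.

```python
def median(values):
    """Median: middle term of the sorted data, or the average of the two middle terms."""
    if not values:
        raise ValueError("median requires at least one observation")
    data = sorted(values)  # arrange in ascending order first
    n = len(data)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th term
        return data[(n + 1) // 2 - 1]
    # Even n: average of the (n/2)th and (n/2 + 1)th terms
    return (data[n // 2 - 1] + data[n // 2]) / 2

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```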

Mode

In statistical data analysis, the mode of a data set is the value that occurs most often, that is, the value with the highest frequency.

For instance, in the set of numbers 8, 9, 10, 10, 5, 10, the mode is 10, since it occurs the maximum number of times (three times).

Mode formula for ungrouped data

To find the mode of ungrouped data, arrange the data values in ascending or descending order, note each repeated value together with its frequency, and pick the observation with the highest frequency. That observation is the modal value of the data.
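A minimal Python sketch of this procedure: instead of sorting, it counts each value with `collections.Counter` and keeps every value that reaches the highest count, since a data set can have more than one mode. The function name `modes` is hypothetical; Python 3.8+ also ships `statistics.multimode` for the same job.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency, in ascending order."""
    if not values:
        raise ValueError("mode requires at least one observation")
    counts = Counter(values)           # value -> frequency
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([8, 9, 10, 10, 5, 10]))  # [10]
```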

Mode formula for grouped data

$Mode = l_0 + \left(\frac{f_1-f_0}{2f_1-f_0-f_2}\right)h$

In this formula, we have,

l₀ is the lower limit of the modal class (the class with the highest frequency)
h is the size of the class interval
f₁ is the frequency of the modal class
f₀ is the frequency of the class preceding the modal class
f₂ is the frequency of the class succeeding the modal class
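The grouped-data formula can be sketched as follows, assuming equal class widths and classes listed in ascending order by lower limit. The function name `grouped_mode` is hypothetical, and treating a missing preceding or succeeding class as frequency 0 is a common convention we assume here, not something the article states.

```python
def grouped_mode(lower_limits, width, frequencies):
    """Mode of grouped data: l0 + (f1 - f0) / (2*f1 - f0 - f2) * h.

    The modal class is the first class with the highest frequency.
    """
    if not frequencies or len(lower_limits) != len(frequencies):
        raise ValueError("need one lower limit per frequency")
    k = frequencies.index(max(frequencies))                       # modal class index
    f1 = frequencies[k]
    f0 = frequencies[k - 1] if k > 0 else 0                        # preceding class, 0 if none (assumption)
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0     # succeeding class, 0 if none (assumption)
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("2*f1 - f0 - f2 is zero; the formula is undefined")
    return lower_limits[k] + (f1 - f0) / denominator * width

# Hypothetical table: classes 0-5, 5-10, 10-15, 15-20 with frequencies 3, 7, 3, 2
print(grouped_mode([0, 5, 10, 15], 5, [3, 7, 3, 2]))  # 7.5
```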

Standard deviation

Standard deviation is a measure of the degree of dispersion of the values in a data set, that is, of how widely the values are scattered around their mean.

It is used in descriptive statistics as an indicator of how much the data points vary from the mean. The standard deviation is computed as the square root of the variance (for a sample, the square root of the sample variance).

Standard Deviation Formula

Population standard deviation

$\displaystyle\sigma=\sqrt{\frac{1}{N}\sum^N_{i=1}(X_i-\mu)^2}$

In this formula, we have,

σ = Population standard deviation
N = Number of observations in population
$X_i$ = $i$th observation in the population
μ = Population mean

Sample standard deviation

$\displaystyle s=\sqrt{\frac{1}{n-1}\sum^n_{i=1}(x_i-\bar x)^2}$

In this formula, we have,

s = Sample standard deviation
n = Number of observations in sample
$x_i$ = $i$th observation in the sample
$\bar x$ = Sample mean
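Both formulas differ only in the divisor: N for a population and n − 1 for a sample. A minimal sketch (the function name `std_dev` and its `sample` flag are our own):

```python
import math

def std_dev(values, sample=False):
    """Population SD (divide by N) or sample SD (divide by n - 1)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1 if sample else n))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data))               # 2.0
print(std_dev(data, sample=True))  # slightly larger, because it divides by n - 1
```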

Variance

The variance of a data distribution measures how far the data points are spread out from their mean. It is the average of the squared deviations from the mean and is equal to the square of the standard deviation (equivalently, the standard deviation is the square root of the variance).

A larger variance means the data values are more spread out from the mean, and a smaller variance means they lie closer to the mean. Therefore, variance measures the scatter of the data around the mean of the data set.

Variance Formula

Population variance

$\displaystyle\sigma^2=\frac{1}{N}\sum^N_{i=1}(X_i-\mu)^2$

In this formula, we have,

σ² = Population variance
N = Number of observations in population
$X_i$ = $i$th observation in the population
μ = Population mean

Sample variance

$\displaystyle s^2=\frac{1}{n-1}\sum^n_{i=1}(x_i-\bar x)^2$

In this formula, we have,

s² = Sample variance
n = Number of observations in sample
$x_i$ = $i$th observation in the sample
$\bar x$ = Sample mean
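A short sketch of both variance formulas (the function name `variance` is our own), cross-checked against Python's standard-library `statistics.pvariance` and `statistics.variance`, and showing that the square root of the variance gives the standard deviation:

```python
import math
import statistics

def variance(values, sample=False):
    """Mean squared deviation from the mean; divisor N (population) or n - 1 (sample)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))                                                      # 4.0
print(math.isclose(variance(data), statistics.pvariance(data)))            # True
print(math.isclose(variance(data, sample=True), statistics.variance(data)))  # True
print(math.sqrt(variance(data)))                                           # 2.0, the population SD
```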

Sample Questions

Question 1. Find the mean of the following class-test marks (out of 100) of 10 students:

99, 95, 87, 55, 72, 86, 92, 89, 75, 88

Solution:

To find the mean of the data, we first need the sum of the marks of all the students.

Sum of observations = 99+95+87+55+72+86+92+89+75+88 = 838

Number of observations = 10

Therefore,

$Mean = \frac{Sum\ of\ Observations}{Total\ number\ of\ observations} = \frac{838}{10}$

= 83.8

Therefore,

The mean of the marks of the 10 students is 83.8.
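The arithmetic in this solution can be checked in a few lines of Python, both by hand and with the standard library's `statistics.mean`:

```python
import statistics

marks = [99, 95, 87, 55, 72, 86, 92, 89, 75, 88]
print(sum(marks))               # 838
print(sum(marks) / len(marks))  # 83.8
print(statistics.mean(marks))   # same result from the standard library
```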

Question 2. Find the median of the following data

2, 45, 15, 18, 11, 85, 19, 22, 7, 5, 13

Solution:

First arrange this data in ascending order

2, 5, 7, 11, 13, 15, 18, 19, 22, 45, 85

Here the number of observations is 11, which is odd.

So we apply the median formula for an odd number of observations.

$Median = (\frac{n+1}{2})^{th}term$

Here n is the number of observations, so n = 11.

$Median = (\frac{11+1}{2})^{th}term = (\frac{12}{2})^{th}term$

Median = 6th term

That is, the 6th term of the ordered data, which is 15.

Therefore,

The median of the data is 15.
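A quick Python check of this solution: sort the data, take the ((n + 1)/2)th term (index 5 when counting from 0), and compare with the standard library's `statistics.median`.

```python
import statistics

data = [2, 45, 15, 18, 11, 85, 19, 22, 7, 5, 13]
ordered = sorted(data)
n = len(ordered)                  # 11, which is odd
print(ordered[(n + 1) // 2 - 1])  # 15, the 6th term
print(statistics.median(data))    # 15
```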

Question 3. Find the mode of the following marks (out of 50) obtained by 40 students in a class test:

Marks obtained	Number of students
10-20	4
20-30	8
30-40	16
40-50	12

Solution:

To find the mode, use the mode formula for grouped data:

$Mode = l_0 + \left(\frac{f_1-f_0}{2f_1-f_0-f_2}\right)h$

Here we have,

f₁ = The maximum class frequency = 16

The modal class (the class with frequency f₁) = 30-40

l₀ = Lower limit of the modal class = 30

h = Size of the class interval = 10

f₀ = Frequency of the preceding class = 8

f₂ = Frequency of the succeeding class = 12

Now substitute these values into the mode formula for grouped data:

$Mode = l_0 + \left(\frac{f_1-f_0}{2f_1-f_0-f_2}\right)h = 30+\frac{16-8}{2\times16-8-12}\times10 = 30+\frac{8}{12}\times10 = 30+\frac{80}{12}$

= 30 + 6.67 (approximately)

= 36.67

Therefore,

Mode ≈ 36.67
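The substitution can be checked directly in Python; 80/12 is 6.666..., so the mode rounds to 36.67 at two decimal places.

```python
l0, h = 30, 10          # lower limit and width of the modal class 30-40
f1, f0, f2 = 16, 8, 12  # modal, preceding and succeeding class frequencies

mode = l0 + (f1 - f0) / (2 * f1 - f0 - f2) * h
print(round(mode, 2))   # 36.67
```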

Question 4. A class has 40 students. 5 students are selected at random, and their heights are measured as 167, 162, 160, 159 and 169. Calculate the standard deviation of their heights.

Solution:

Here, the 5 students are a sample drawn from the class, so we use the sample standard deviation (divisor n − 1).

n = 5

Mean $(\bar x)$ = $\frac{167+162+160+159+169}{5} = \frac{817}{5}$

$\bar x$ = 163.4

Standard Deviation (s) = $\sqrt{\frac{\sum(x_i-\bar x)^2}{n-1}}$

$= \sqrt{\frac{(167-163.4)^2+(162-163.4)^2+(160-163.4)^2+(159-163.4)^2+(169-163.4)^2}{5-1}}$

$= \sqrt{\frac{77.2}{4}} = \sqrt{19.3} \approx 4.393$

Standard Deviation ≈ 4.393
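A Python check of this solution, computing the sample standard deviation step by step and comparing it with the standard library's `statistics.stdev`, which also divides by n − 1:

```python
import math
import statistics

heights = [167, 162, 160, 159, 169]
n = len(heights)
x_bar = sum(heights) / n                     # 163.4
ss = sum((x - x_bar) ** 2 for x in heights)  # sum of squared deviations, about 77.2
s = math.sqrt(ss / (n - 1))                  # divide by n - 1 for a sample
print(round(s, 3))                           # 4.393
print(round(statistics.stdev(heights), 3))   # 4.393
```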