Statistics Cheat Sheet

Last Updated : 21 Feb, 2024

Statistics is like a toolkit we use to understand and make sense of information. It helps us collect, organize, analyze, and interpret data to find patterns, trends, and relationships in the world around us.

In this Statistics cheat sheet, you will find simplified complex statistical concepts, with clear explanations, practical examples, and essential formulas. This cheat sheet will make things easy when getting ready for an interview or just starting with data science. It explains stuff like mean, median, and hypothesis testing with examples, so you’ll get it in no time. With this cheat sheet, you’ll feel more sure about your stats skills and do great in interviews and real-life data jobs!

What is Statistics?

Statistics is the branch of mathematics that deals with collecting, analyzing, interpreting, presenting, and organizing data. It involves the study of methods for gathering, summarizing, and interpreting data to make informed decisions and draw meaningful conclusions.

Statistics is widely used in various fields such as science, economics, social sciences, business, and engineering to provide insights, make predictions, and guide decision-making processes. Statistics is like a tool that helps us see patterns, trends, and relationships in the world around us. Whether it’s counting how many people like pizza or figuring out the average score on a test, statistics helps us make decisions based on data. It is used in lots of different areas, like science, business, and even sports, to help us learn more about the world and make better choices.

Types of Statistics

There are commonly two types of statistics, which are discussed below:

Descriptive Statistics: Descriptive Statistics helps us simplify and organize big chunks of data. This makes large amounts of data easier to understand.
Inferential Statistics: Inferential Statistics is a little different. It uses smaller data to conclude a larger group. It helps us predict and draw conclusions about a population.

Table of Content

Basics of Statistics
Measure of Central Tendency
Measure of Dispersion
Measure of Shape
Measure of Relationship
Probability Theory
Probability Distributions Functions
Parameter estimation for Statistical Inference
Hypothesis Testing
Statistical Tests:
Non-Parametric Test
A/B Testing or Split Testing
Regression
Statistics Cheat Sheet – FAQs

Basics of Statistics

Basic formulas of statistics are,

Parameters	Definition	Formulas
Population Mean, (μ)	Entire group for which information is required.	∑x/N
Sample Mean	Subset of population as entire population is too large to handle.	∑x/n
Sample/Population Standard Deviation	Standard Deviation is a measure that shows how much variation from the mean exists.	[Tex]\\\sqrt\frac{{\sum(x-\overline{x})^2}}{n-1}[/Tex]
Sample/Population Variane	Variance is the measure of spread of data along its central values.	[Tex]Variance(Population)~=~\frac{{\sum(x-\overline{x})^2}}{n}\\ Variance(Sample)~=~\frac{{\sum(x-\overline{x})^2}}{n-1}[/Tex]
Class Interval(CI)	Class interval refers to the range of values assigned to a group of data points.	Class Interval = Upper Limit – Lower Limit
Frequency(f)	Number of time any particular value appears in a data set is called frequency of that value.	f is number of times any value comes in a article
Range, (R)	Range is the difference between the largest and smallest values of the data set	Range = (Largest Data Value – Smallest Data Value)

What is Data in Statistics?

Data is a collection of observations, it can be in the form of numbers, words, measurements, or statements.

Types of Data

Qualitative Data: This data is descriptive. For example – She is beautiful, He is tall, etc.
Quantitative Data: This is numerical information. For example- A horse has four legs.

Types of Quantitative Data

Discrete Data: It has a particular fixed value and can be counted.
Continuous Data: It is not fixed but has a range of data and can be measured.

Measure of Central Tendency

Mean: The mean can be calculated by summing all values present in the sample divided by total number of values present in the sample or population.
Formula:
[Tex]Mean (\mu) = \frac{Sum \, of \, Values}{Number \, of \, Values} [/Tex].
Median: The median is the middle of a dataset when arranged from lowest to highest or highest to lowest in order to find the median, the data must be sorted. For an odd number of data points the median is the middle value and for an even number of data points median is the average of the two middle values.
- For odd number of data points:
  [Tex]Median = (\frac{n+1}{2})^{th} [/Tex]
- For even number of data points:
  [Tex]Median = Average \, of \, (\frac{n}{2})^{th} value \, and \, its \, next \, value [/Tex]

Mode: The most frequently occurring value in the Sample or Population is called as Mode.

Measure of Dispersion

Range: Range is the difference between the maximum and minimum values of the Sample.
Variance (σ²): Variance is a measure of how spread-out values from the mean by measuring the dispersion around the Mean.
Formula:
[Tex]\sigma^2~=~\frac{\Sigma(X-\mu)^2}{n} [/Tex].
Standard Deviation (σ): Standard Deviation is the square root of variance. The measuring unit of S.D. is same as the Sample values’ unit. It indicates the average distance of data points from the mean and is widely used due to its intuitive interpretation.
Formula:
[Tex]\sigma=\sqrt(\sigma^2)=\sqrt(\frac{\Sigma(X-\mu)^2}{n}) [/Tex]
Interquartile Range (IQR): The range between the first quartile (Q1) and the third quartile (Q3). It is less sensitive to extreme values than the range.
Formula:
[Tex]IQR = Q_3 -Q_1 [/Tex]
To compute IQR, calculate the values of the first and third quartile by arranging the data in ascending order. Then, calculate the mean of each half of the dataset.
Quartiles: Quartiles divides the dataset into four equal parts:
- Q1 is the median of the lower 25%
- Q2 is the median (50%)
- Q3 is the median of the upper 25% of the dataset.
Mean Absolute Deviation: The average of the absolute differences between each data point and the mean. It provides a measure of the average deviation from the mean.
Formula:
[Tex]Mean \, Absolute \, Deviation = \frac{\sum_{i=1}^{n}{|X – \mu|}}{n} [/Tex]
Coefficient of Variation (CV):
CV is the ratio of the standard deviation to the mean, expressed as a percentage. It is useful for comparing the relative variability of different datasets.
[Tex]CV = (\frac{\sigma}{\mu}) * 100[/Tex]

Measure of Shape

Kurtosis

Kurtosis quantifies the degree to which a probability distribution deviates from the normal distribution. It assesses the “tailedness” of the distribution, indicating whether it has heavier or lighter tails than a normal distribution. High kurtosis implies more extreme values in the distribution, while low kurtosis indicates a flatter distribution.

Types of Kurtosis

Mesokurtic:
- A mesokurtic distribution has kurtosis equal to 3. This is considered the baseline or normal level of kurtosis. The distribution has tails and a peak similar to the normal distribution (bell curve).
Leptokurtic:
- A leptokurtic distribution has kurtosis greater than 3. This indicates that the distribution has fatter tails and a sharper peak compared to the normal distribution. It implies that the data has more extreme values or outliers.
Platykurtic:
- A platykurtic distribution has kurtosis less than 3. In this case, the distribution has thinner tails and a flatter peak compared to the normal distribution. It suggests that the data has fewer extreme values and is more dispersed.

Skewness

Skewness is the measure of asymmetry of probability distribution about its mean.

Right Skew:

Also known as positive skewness.
Characteristics:
- Longer or fatter tail on the right-hand side (upper tail).
- More extreme values on the right side.
- Mean > Median.
Indicates a distribution that is skewed towards the left.

Left Skew:

Also known as negative skewness.
Characteristics:
- Longer or fatter tail on the left-hand side (lower tail).
- More extreme values on the left side.
- Mean < Median.
Indicates a distribution that is skewed towards the right.

Zero Skew:

Also known as symmetrical distribution.
Characteristics:
- Symmetric distribution.
- Left and right sides are mirror images of each other.
- Mean = Median.
Indicates a distribution with no skewness.

Types of Skewed data

Measure of Relationship

Covariance: Covariance measures the degree to which two variables change together.
[Tex]Cov(x,y) = \frac{\sum(X_i-\overline{X})(Y_i – \overline{Y})}{n} [/Tex]
Correlation: Correlation measures the strength and direction of the linear relationship between two variables. It is represented by correlation coefficient which ranges from -1 to 1. A positive correlation indicates a direct relationship, while a negative correlation implies an inverse relationship. Pearson’s correlation coefficient is given by:
[Tex]\rho(X, Y) = \frac{cov(X,Y)}{\sigma_X \sigma_Y} [/Tex]

Probability Theory

Here are some basic concepts or terminologies used in probability:

Term	Definition
Sample Space	The set of all possible outcomes in a probability experiment. For instance, in a coin toss, it’s “head” and “tail”.
Sample Point	One of the possible results in an experiment. For example, in rolling a fair six-sided dice, sample points are 1 to 6.
Experiment	A process or trial with uncertain results. Examples include coin tossing, card selection, or rolling a die.
Event	A subset of the sample space representing certain outcomes. Example: getting “1” when rolling a die.
Favorable Outcome	An outcome that produces the desired or expected consequence.

Various other probability formulas are,

Joint Probability (Intersection of Event)	Probability of occurring events A and B	P(A and B) = P(A) × P(B)
Union of Events	Probability of occurring events A or B	P(A or B) = P(A) + P(B) – P(A and B)
Conditional Probability	Probability of occurring events A when event B has occurred	P(A \| B) = P(A and B)/P(B)

Bayes Theorem

Bayes’ Theorem is a fundamental concept in probability theory that relates conditional probabilities. It is named after the Reverend Thomas Bayes, who first introduced the theorem. Bayes’ Theorem is a mathematical formula that provides a way to update probabilities based on new evidence. The formula is as follows:

[Tex]P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} [/Tex]

where

P(A∣B): Probability of event A given that event B has occurred (posterior probability).
P(B∣A): Probability of event B given that event A has occurred (likelihood).
P(A): Probability of event A occurring (prior probability).
P(B): Probability of event B occurring.

Types of Probability Functions

Probability Mass Function(PMF)
Probability Mass Function is a concept in probability theory that describes the probability distribution of a discrete random variable. The PMF gives the probability of each possible outcome of a discrete random variable.
Probability Density Function (PDF)
Probability Density Function describes the likelihood of a continuous random variable falling within a particular range. It’s the derivative of the cumulative distribution function (CDF).
Cumulative Distribution Function (CDF)
Cumulative Distribution Function gives the probability that a random variable will take a value less than or equal to a given value. It’s the integral of the probability density function (PDF).
Empirical Distribution Function (EDF):
Empirical Distribution Function is a non-parametric estimator of the cumulative distribution function (CDF) based on observed data. For a given set of data points, the EDF represents the proportion of observations less than or equal to a specific value. It is constructed by sorting the data and assigning a cumulative probability to each data point.

Probability Distributions Functions

Normal or Gaussian Distribution

The normal distribution is a continuous probability distribution characterized by its bell-shaped curve and can be by described by mean (μ) and standard deviation (σ).

Formula: [Tex]f(X|\mu,\sigma)=\frac{\epsilon^{-0.5(\frac{X-\mu}{\sigma})^2}}{\sigma\sqrt(2\pi)} [/Tex]

There is a empirical rule in normal distribution, which states that:

Approximately 68% of the data falls within one standard deviation (σ) of the mean in both directions. This is often referred to as the 68-95-99.7 rule.
About 95% of the data falls within two standard deviations (2σ) of the mean.
Approximately 99.7% of the data falls within three standard deviations (3σ) of the mean.

These rule is used to detect outliers.

Normal-Distribution

Central Limit Theorem

The Central Limit Theorem (CLT) states that, regardless of the shape of the original population distribution, the sampling distribution of the sample mean will be approximately normally distributed if the sample size tends to infinity.

Student t-distribution

The t-distribution, also known as Student’s t-distribution, is a probability distribution that is used in statistics.

[Tex]f(t) =\frac{\Gamma\left(\frac{df+1}{2}\right)}{\sqrt{df\pi} \, \Gamma\left(\frac{df}{2}\right)} \left(1 + \frac{t^2}{df}\right)^{-\frac{df+1}{2}} [/Tex]

where,

Γ(.) is the gamma function
df = Degrees of freedom

Chi-square Distribution

The chi-squared distribution, denoted as [Tex]\chi ^2 [/Tex] is a probability distribution used in statistics it is related to the sum of squared standard normal deviates.[Tex]\chi^2 = \frac 1{2^{k/2}\Gamma {(k/2)}} x^{{\frac k 2}-1} e^{\frac {-x}2} [/Tex]

Binomial Distribution

The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials, where each trial has the same probability of success (p).

Formula: [Tex]P(X=k)=(^n_k)p^k(1-p)^{n-k} [/Tex]

Assuming each trial is an independent event with a success probability of p=0.5, and we are calculating the probability of getting 3 successes in 6 trials: [Tex]P(X=3)=(^6_3)(0.5)^3(1-0.5)^3=0.3125 [/Tex]

Poisson Distribution

The Poisson distribution models the number of events that occur in a fixed interval of time or space. It’s characterized by a single parameter (λ), the average rate of occurrence.

Formula: [Tex]P(X=k)=\frac{\epsilon^{-\lambda}\lambda^k}{k!} [/Tex]

For the previous dataset, assuming the average rate of waiting time is λ=10, and we are calculating the probability of waiting exactly 12 minutes: [Tex]P(X=12)=\frac{\varepsilon^{-10}.10^{12}}{12!}≈0.0948 [/Tex]

Uniform Distribution

The uniform distribution represents a constant probability for all outcomes in a given range.

Formula: [Tex]f(X)=\frac{1}{b-a} [/Tex]

For the same previous dataset, assuming the bus arrives uniformly between 5 and 18 minutes so the probability of waiting less than 15 minutes: [Tex]P(X<15)= ∫_5^{15}\frac{1}{18-5}dx=\frac{10}{13}=0.7692 [/Tex]

Parameter estimation for Statistical Inference

Population: Population is the group of individual, object or measurements about which you want to draw conclusion.
Sample: Sample is the subset of population; the group chosen from the larger population to gather information and make inference about entire population.
Expectation: Expectation, in statistics and probability theory, represents the anticipated or average value of a random variable. It is represented by E(x).
Parameter: A parameter is a numerical characteristic of a population that is of interest in statistical analysis.
Examples of parameters include the population mean (μ), population standard deviation (σ), or the success probability in a binomial distribution.
Statistic: A statistic is a numerical value or measure calculated from a sample of data. It is used to estimate or infer properties of the corresponding population.
Estimation: Estimation involves using sample data to make inferences or predictions about population parameters.
Estimator: An estimator is a statistic used to estimate an unknown parameter in a statistical model.
Bias: Bias in parameter estimation refers to the systematic error or deviation of the estimated value from the true value of the parameter.
[Tex]Bias(\widehat{\theta}) = E(\widehat{\theta}) – \theta [/Tex]
An estimator is considered unbiased if, on average, it produces parameter estimates that are equal to the true parameter value. Bias is measured as the difference between the expected value of the estimator and the true parameter value.
[Tex]E(\widehat{\theta}) = \theta [/Tex]

Hypothesis Testing

Hypothesis testing makes inferences about a population parameter based on sample statistic.

type-error

Null Hypothesis (H₀) and Alternative Hypothesis (H₁)

H₀: There is no significant difference or effect.
H₁: There is a significant effect i.e the given statement can be false.

Degrees of freedom

Degrees of freedom (df) in statistics represent the number of values or quantities in the final calculation of a statistic that are free to vary. It is mainly defined as sample size – one(n-1).

Level of Significance([Tex]\alpha [/Tex])

This is the threshold used to determine statistical significance. Common values are 0.05, 0.01, or 0.10.

p-value

The p-value, short for probability value, is a fundamental concept in statistics that quantifies the evidence against a null hypothesis.

If p-value ≤ α: Reject the null hypothesis.
If p-value > α: Fail to reject the null hypothesis (meaning there isn’t enough evidence to reject it).

Type I Error and Type II Error

Type I Error that occurs when the null hypothesis is true, but the statistical test incorrectly rejects it. It is often referred to as a “false positive” or “alpha error.”

Type II Error that occurs when the null hypothesis is false, but the statistical test fails to reject it. It is often referred to as a “false negative.”

Confidence Intervals

A confidence interval is a range of values that is used to estimate the true value of a population parameter with a certain level of confidence. It provides a measure of the uncertainty or margin of error associated with a sample statistic, such as the sample mean or proportion.

Example of Hypothesis testing:

Let us consider An e-commerce company wants to assess whether a recent website redesign has a significant impact on the average time users spend on their website.

The company collects the following data:

Data on user session durations before and after the redesign.
Before redesign: Sample mean ([Tex]\overline{\rm x} [/Tex]) = 3.5 minutes, Sample standard deviation (s) = 1.2 minutes, Sample size (n) = 50.
After redesign: Sample mean ([Tex]\overline{\rm x} [/Tex]= 4.2 minutes, Sample standard deviation (s) = 1.5 minutes, Sample size (n) = 60.

The Hypothesis are defined as:

Null Hypothesis (H0): The website redesign has no impact on the average user session duration [Tex]\mu_{after} -\mu_{before} = 0 [/Tex]
Alternative Hypothesis (Ha): The website redesign has a positive impact on the average user session duration [Tex]\mu_{after} -\mu_{before} > 0 [/Tex]

Significance Level:

Choose a significance level, α=0.05(commonly used)

Test Statistic and P-Value:

Conduct a test for the difference in means.
Calculate the test statistic and p-value.

Result:

If the p-value is less than the chosen significance level, reject the null hypothesis.
If the p-value is greater than or equal to the significance level, fail to reject the null hypothesis.

Interpretations:

Based on the analysis, the company draws conclusions about whether the website redesign has a statistically significant impact on user session duration.

Statistical Tests:

Parametric test are statistical methods that make assumption that the data follows normal distribution.

Z-test	t-test	F-test
Testing if the mean of a sample is significantly different from a known population mean	Comparing means of two independent samples or testing if the mean of a sample is significantly different from a known or hypothesized population mean	Comparing the variances of multiple groups to assess if they are significantly different
Used when the population standard deviation is known, and the sample size is sufficiently large.	Used when the population standard deviation is unknown or when dealing with small sample sizes	Used to compare variances between two or more groups.
One-Sample Test: Z = [Tex] \frac{\overline{X}-\mu}{\frac{\sigma}{\sqrt{n}}} [/Tex] Two-Sample Test: Z = [Tex]\frac{\overline{X_1} -\overline{X_2}}{\sqrt{\frac{\sigma_{1}^{2}}{n_1} + \frac{\sigma_{2}^{2}}{n_2}}} [/Tex]	One- sample: t = [Tex]\frac{\overline{X}- \mu}{\frac{s}{\sqrt{n}}} [/Tex] Two-Sample Test: [Tex]t= \frac{\overline{X_1} – \overline{X_2}}{\sqrt{\frac{s_{1}^{2}}{n_1} + \frac{s_{2}^{2}}{n_2}}} [/Tex] Paired t-Test: t=[Tex]\frac{\overline{d}}{\frac{s_d}{\sqrt{n}}} [/Tex] d= difference	[Tex]F = \frac{s_{1}^{2}}{s_{2}^{2}}[/Tex]

Z-test

t-test

F-test

Testing if the mean of a sample is significantly different from a known population mean

Comparing means of two independent samples or testing if the mean of a sample is significantly different from a known or hypothesized population mean

Comparing the variances of multiple groups to assess if they are significantly different

Used when the population standard deviation is known, and the sample size is sufficiently large.

Used when the population standard deviation is unknown or when dealing with small sample sizes

Used to compare variances between two or more groups.

One-Sample Test:
Z =

[Tex] \frac{\overline{X}-\mu}{\frac{\sigma}{\sqrt{n}}} [/Tex]

Two-Sample Test:

Z = [Tex]\frac{\overline{X_1} -\overline{X_2}}{\sqrt{\frac{\sigma_{1}^{2}}{n_1} + \frac{\sigma_{2}^{2}}{n_2}}} [/Tex]

One- sample:

t = [Tex]\frac{\overline{X}- \mu}{\frac{s}{\sqrt{n}}} [/Tex]

Two-Sample Test:

[Tex]t= \frac{\overline{X_1} – \overline{X_2}}{\sqrt{\frac{s_{1}^{2}}{n_1} + \frac{s_{2}^{2}}{n_2}}} [/Tex]

Paired t-Test:

t=[Tex]\frac{\overline{d}}{\frac{s_d}{\sqrt{n}}} [/Tex]
d= difference

[Tex]F = \frac{s_{1}^{2}}{s_{2}^{2}}[/Tex]

ANOVA (Analysis Of Variance)

Source of Variation	Sum of Squares	Degrees Of Freedom	Mean Squares	F-Value
Between Groups	SSB= [Tex]\Sigma n _1(\bar x_1 – \bar x)^2 [/Tex]	df₁=k-1	MSB= SSB/ (k-1)	f=MSB/MSE
Error	SSE=[Tex]\Sigma\Sigma (\bar x_1 – \bar x)^2 [/Tex]	df₂=N-1	MSE=SSE/(N-k)
Total	SST= SSE+SSE	df₃=N-1

There are mainly two types of ANOVA:

One-way Anova: Used to compare means of three or more groups to determine if there are statistically significant differences among them.
here,
- H₀: The means of all groups are equal.
- H₁: At least one group mean is different.
Two-way Anova: It assess the influence of two categorical independent variables on a dependent variable, examining the main effects of each variable and their interaction effect.

Chi-Squared Test

The chi-squared test is a statistical test used to determine if there is a significant association between two categorical variables. It compares the observed frequencies in a contingency table with the frequencies. Formula:
[Tex]X^2=\Sigma{\frac{(O_{ij}-E_{ij})^2}{E_{ij}}}[/Tex].

This test is also performed on big data with multiple number of observations.

Non-Parametric Test

Non-parametric test does not make assumptions about the distribution of the data. They are useful when data does not meet the assumptions required for parametric tests.

Mann-Whitney U Test: Mann-Whitney U Test is used to determine whether there is a difference between two independent groups when the dependent variable is ordinal or continuous. Applicable when assumptions for a t-test are not met. In it we rank all data points, combines the ranks, and calculates the test statistic.
Kruskal-Wallis Test: Kruskal-Wallis Test is used to determine whether there are differences among three or more independent groups when the dependent variable is ordinal or continuous. Non-parametric alternative to one-way ANOVA.

A/B Testing or Split Testing

A/B testing, also known as split testing, is a method used to compare two versions (A and B) of a webpage, app, or marketing asset to determine which one performs better.

Example : a product manager change a website’s “Shop Now” button color from green to blue to improve the click-through rate (CTR). Formulating null and alternative hypotheses, users are divided into A and B groups, and CTRs are recorded. Statistical tests like chi-square or t-test are applied with a 5% confidence interval. If the p-value is below 5%, the manager may conclude that changing the button color significantly affects CTR, informing decisions for permanent implementation.

Regression

Regression is a statistical technique used to model the relationship between a dependent variable and one or more independent variables.

The equation for regression:
[Tex]y=\alpha+ \beta x[/Tex]

Where,

y is the dependent variable,
x is the independent variable
[Tex]\alpha[/Tex] is the intercept
[Tex]\beta[/Tex] is the regression coefficient.

Regression coefficient is a measure of the strength and direction of the relationship between a predictor variable (independent variable) and the response variable (dependent variable).
[Tex]\beta = \frac{\sum(X_i-\overline{X})(Y_i – \overline{Y})}{\sum(X_i-\overline{X})^2}[/Tex]

Conclusion

In summary, statistics is a vital tool for understanding and utilizing data across various fields. Descriptive statistics simplify and organize data, while inferential statistics allow us to draw conclusions and make predictions based on samples. Measures like central tendency, dispersion, and shape offer insights into data characteristics. Hypothesis testing, confidence intervals, and probability distributions help make informed decisions and analyze relationships between variables. Whether you’re preparing for an interview, exploring data science, or making business choices, a solid grasp of statistics is essential for success in navigating and interpreting the complexities of data.