P-Value: Comprehensive Guide to Understand, Apply, and Interpret


A p-value is a statistical metric used to assess a hypothesis by comparing it with observed data.

This article delves into the concept of p-value, its calculation, interpretation, and significance. It also explores the factors that influence p-value and highlights its limitations.

What is the P-value?

The p-value, or probability value, is a statistical measure used in hypothesis testing to assess the strength of evidence against a null hypothesis. It represents the probability of obtaining results as extreme as, or more extreme than, the observed results under the assumption that the null hypothesis is true.

In simpler words, it is used to decide whether to reject the null hypothesis during hypothesis testing. In data science, it gives valuable insight into the statistical significance of an independent variable in predicting the dependent variable.
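In symbols, the definition above can be sketched as follows. The notation is an assumption introduced here for illustration: T is the test statistic viewed as a random variable, t_obs is its observed value, and H_0 is the null hypothesis. The two-sided form assumes a sampling distribution that is symmetric about zero.

```latex
\begin{aligned}
p_{\text{right-tailed}} &= P(T \ge t_{\text{obs}} \mid H_0) \\
p_{\text{left-tailed}}  &= P(T \le t_{\text{obs}} \mid H_0) \\
p_{\text{two-sided}}    &= P(|T| \ge |t_{\text{obs}}| \mid H_0)
\end{aligned}
```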

How is the P-value calculated?

Calculating the p-value typically involves the following steps:

  1. Formulate the Null Hypothesis (H0): Clearly state the null hypothesis, which typically states that there is no relationship or effect between the variables.
  2. Choose an Alternative Hypothesis (H1): Define the alternative hypothesis, which proposes that a relationship or effect between the variables exists.
  3. Determine the Test Statistic: Calculate the test statistic, which is a measure of the discrepancy between the observed data and the expected values under the null hypothesis. The choice of test statistic depends on the type of data and the specific research question.
  4. Identify the Distribution of the Test Statistic: Determine the appropriate sampling distribution for the test statistic under the null hypothesis. This distribution represents the expected values of the test statistic if the null hypothesis is true.
  5. Calculate the p-value: Based on the observed test statistic and the sampling distribution, find the probability of obtaining the observed test statistic or a more extreme one, assuming the null hypothesis is true. Equivalently, find the critical value of the test statistic at the chosen significance level.
  6. Interpret the results: Compare the p-value with the significance level (α), or the test statistic with the critical value. If p ≤ α (equivalently, if the absolute value of the test statistic exceeds the critical value), there is evidence to reject the null hypothesis; otherwise, we fail to reject it.
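The steps above can be sketched in a few lines of Python. This is a minimal illustration using a one-sample z-test; every number in it (the hypothesized mean, the known standard deviation, the sample size, and the sample mean) is a hypothetical value chosen for the example, not data from this article.

```python
import math
from scipy import stats

# Steps 1-2: H0 says the population mean equals mu_0; H1 says it differs.
# All inputs below are hypothetical illustration values.
mu_0 = 100.0         # population mean under H0
sigma = 15.0         # population standard deviation, assumed known
n = 36               # sample size
sample_mean = 104.0  # observed sample mean

# Step 3: test statistic (z-statistic)
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))

# Step 4: under H0, z follows the standard normal distribution.
# Step 5: two-sided p-value = probability of a value at least as extreme as |z|
p_value = 2 * stats.norm.sf(abs(z))

# Step 6: compare the p-value with the significance level
alpha = 0.05
reject_h0 = p_value <= alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With these inputs the p-value is above 0.05, so the sketch does not reject the null hypothesis.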

The interpretation of a p-value depends on the specific test and the context of the analysis. The table below lists several popular tests whose statistics are used in p-value calculations.

| Test | Scenario | Interpretation |
|---|---|---|
| Z-Test (Z-Statistic) | Used when dealing with large sample sizes or when the population standard deviation is known. | A small p-value (smaller than 0.05) indicates strong evidence against the null hypothesis, leading to its rejection. |
| T-Test (T-Statistic) | Appropriate for small sample sizes or when the population standard deviation is unknown. | Similar to the Z-test. |
| Chi-Square Test | Used for tests of independence or goodness-of-fit. | A small p-value indicates that there is a significant association between the categorical variables, leading to the rejection of the null hypothesis. |
| F-Test | Commonly used in Analysis of Variance (ANOVA), where a ratio of variances is used to compare group means. | A small p-value suggests that at least one group mean is different from the others, leading to the rejection of the null hypothesis. |
| Correlation Test | Tests the strength and direction of a linear relationship between two continuous variables. | A small p-value indicates that there is a significant linear relationship between the variables, leading to rejection of the null hypothesis that there is no correlation. |

In general, a small p-value indicates that data as extreme as those observed would be unlikely if the null hypothesis were true, which leads to the rejection of the null hypothesis. However, it’s crucial to choose the appropriate test based on the nature of the data and the research question, as well as to interpret the p-value in the context of the specific test being used.
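As a rough sketch of how a few of these tests are run in practice, the snippet below uses three SciPy functions: chi2_contingency, f_oneway, and pearsonr. The data are hypothetical and were made up so that each test yields a small p-value.

```python
from scipy import stats

# Hypothetical 2x2 contingency table (rows: group, columns: outcome)
table = [[30, 10],
         [10, 30]]
# Chi-square test of independence (SciPy applies Yates' correction for 2x2 tables)
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Hypothetical measurements for three groups
group_a = [5, 6, 7, 6, 5]
group_b = [8, 9, 10, 9, 8]
group_c = [12, 13, 14, 13, 12]
# One-way ANOVA F-test: are all group means equal?
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

# Hypothetical paired measurements with a strong linear trend
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
# Pearson correlation test: is the linear correlation zero?
r, p_corr = stats.pearsonr(x, y)

print(f"chi-square test: p = {p_chi2:.2e}")
print(f"ANOVA F-test:    p = {p_anova:.2e}")
print(f"correlation:     p = {p_corr:.2e}")
```

In each case the function returns the test statistic together with its p-value, so the interpretation in the table applies directly to the printed values.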

P-value in Hypothesis testing

The table below shows the possible outcomes of a hypothesis test, including the two kinds of errors that can occur when a decision is based on the p-value.

| Truth / Decision | Fail to reject H0 | Reject H0 |
|---|---|---|
| H0 is true | Correct decision based on the given p-value (1-α) | Type I error (α) |
| H0 is false | Type II error (β) | Correct decision based on the given p-value (1-β) |

Type I error: Incorrect rejection of a true null hypothesis. Its probability is denoted by α (the significance level).
Type II error: Failure to reject a false null hypothesis. Its probability is denoted by β; the power of the test is 1-β.
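The two error rates can be estimated by simulation. The sketch below is an illustration under assumed settings that do not come from the article: normally distributed data, samples of size 20, 2000 repeated experiments, and a true mean of 0.5 for the case where the null hypothesis is false.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_experiments, n = 2000, 20     # hypothetical simulation settings

# Case 1: H0 is true (the population mean really is 0).
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n))
p_null = stats.ttest_1samp(null_samples, popmean=0.0, axis=1).pvalue
type_i_rate = np.mean(p_null <= alpha)   # share of false rejections

# Case 2: H0 is false (the population mean is 0.5, but we still test against 0).
alt_samples = rng.normal(loc=0.5, scale=1.0, size=(n_experiments, n))
p_alt = stats.ttest_1samp(alt_samples, popmean=0.0, axis=1).pvalue
type_ii_rate = np.mean(p_alt > alpha)    # share of missed effects

print(f"Estimated Type I error rate:  {type_i_rate:.3f}")
print(f"Estimated Type II error rate: {type_ii_rate:.3f}")
print(f"Estimated power (1 - beta):   {1 - type_ii_rate:.3f}")
```

The estimated Type I error rate should land close to α, while the Type II error rate depends on the assumed effect size and sample size.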

Let’s consider an example to illustrate the process of calculating a p-value for Two Sample T-Test:

A researcher wants to investigate whether there is a significant difference in mean height between males and females in a population of university students.

Suppose we have the following data:

  • Group 1 (Males): n1 = 30, \overline{x_1} = 175 and s1 = 5
  • Group 2 (Females): n2 = 35, \overline{x_2} = 168 and s2 = 6

We now walk through the steps of the p-value calculation for this example.

Step 1: Formulate the Null Hypothesis (H0):

H0: There is no difference in mean height between males and females in the population (μ1 = μ2).

Step 2: Choose an Alternative Hypothesis (H1):

H1: There is a difference in mean height between males and females in the population (μ1 ≠ μ2).

Step 3: Determine the Test Statistic:

The appropriate test for this scenario is the two-sample t-test, which compares the means of two independent groups. Its test statistic is the t-statistic.

The t-statistic is a measure of the difference between the means of two groups relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of the difference. It is also known as the t-value or t-score.

t = \frac{\overline{x_1} - \overline{x_2}}{ \sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}}

Where,

  • \overline{x_1} is the mean of the first sample
  • \overline{x_2} is the mean of the second sample
  • s1 = First sample’s standard deviation
  • s2 = Second sample’s standard deviation
  • n1 = First sample’s sample size
  • n2 = Second sample’s sample size

Therefore,

\begin{aligned}t &= \frac{175 - 168}{\sqrt{\frac{5^2}{30} + \frac{6^2}{35}}}\\&= \frac{7}{\sqrt{0.8333 + 1.0286}}\\&= \frac{7}{\sqrt{1.8619}}\\& \approx  \frac{7}{1.3645}\\& \approx 5.13\end{aligned}

So, the calculated two-sample t-test statistic (t) is approximately 5.13.
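The arithmetic above can be checked with a short script. This is a minimal sketch that plugs the summary statistics from the example into the formula; the variable names are chosen here for illustration.

```python
import math

# Summary statistics from the example
n1, mean1, s1 = 30, 175.0, 5.0   # males
n2, mean2, s2 = 35, 168.0, 6.0   # females

# Standard error of the difference between the two sample means
standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)

# t-statistic: difference in means divided by its standard error
t_statistic = (mean1 - mean2) / standard_error
print(f"t = {t_statistic:.2f}")  # approximately 5.13
```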

Step 4: Identify the Distribution of the Test Statistic:

The t-distribution is used for the two-sample t-test. It is a probability distribution with tails that are thicker than those of the normal distribution. Its degrees of freedom are determined by the sample sizes of the two groups:

df = (n_1+n_2)-2

  • n1 is the number of observations in the first group.
  • n2 is the number of observations in the second group.

So, df = (30+35)-2 = 63

Strictly, this formula belongs to the pooled-variance form of the test. For these data, the unpooled (Welch) form used in Step 3 gives almost the same degrees of freedom.

The degrees of freedom (63) reflect the number of independent pieces of information available to estimate the population variance. In the two-sample t-test, more degrees of freedom give a more precise estimate of the variance, which makes the t-distribution narrower and closer to the normal distribution.
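The t-statistic in Step 3 uses an unpooled standard error, for which the degrees of freedom are usually approximated with the Welch-Satterthwaite formula rather than n1 + n2 - 2. That formula is not part of the original walkthrough; the sketch below adds it only to compare the two values for this example.

```python
# Summary statistics from the example
n1, s1 = 30, 5.0
n2, s2 = 35, 6.0

# Per-group variance of the sample mean
v1 = s1**2 / n1
v2 = s2**2 / n2

# Degrees of freedom used in the article (pooled-variance form)
df_pooled = n1 + n2 - 2

# Welch-Satterthwaite approximation (unpooled form)
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"pooled df = {df_pooled}, Welch df = {df_welch:.2f}")
```

The two values nearly coincide here, so using 63 degrees of freedom does not change the conclusion.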

Figure: t-distribution


The t-distribution is symmetric and bell-shaped, similar to the normal distribution. As the degrees of freedom increase, the t-distribution approaches the shape of the standard normal distribution. Practically, it affects the critical values used to determine statistical significance and confidence intervals.

Step 5: Calculate Critical Value.

To find the critical t-value at a significance level of α = 0.05 with 63 degrees of freedom, we can either consult a t-table or use statistical software.

We can use the scipy.stats module in Python to find the critical t-value using the code below.

Python3

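import scipy.stats as stats

t_statistic = 5.13          # observed t-statistic from Step 3
degrees_of_freedom = 63
alpha = 0.05                # significance level

# Two-sided test: alpha/2 of the probability lies in each tail
critical_t_value = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)
print(f"Critical t-value at alpha={alpha}, df={degrees_of_freedom}: {critical_t_value}")
print("Reject H0:", abs(t_statistic) > critical_t_value)

Output:

Critical t-value at alpha=0.05, df=63: 1.9983405417721956
Reject H0: True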

Comparing with the t-statistic:

Since 5.13 > 1.9983, the t-statistic exceeds the critical value. A difference this large between the sample means would be very unlikely if the null hypothesis were true. Therefore, we reject the null hypothesis.
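The same decision can be reached through the p-value itself. The sketch below computes a two-sided p-value from the t-statistic and degrees of freedom of the example, then cross-checks it with SciPy's ttest_ind_from_stats. Passing equal_var=False is a choice made here because it matches the unpooled standard error used in Step 3.

```python
from scipy import stats

t_statistic = 5.13
degrees_of_freedom = 63

# Two-sided p-value: probability of |T| >= 5.13 under the null hypothesis
p_value = 2 * stats.t.sf(abs(t_statistic), degrees_of_freedom)
print(f"p-value from t and df: {p_value:.2e}")

# Cross-check directly from the summary statistics (Welch's t-test)
result = stats.ttest_ind_from_stats(
    mean1=175, std1=5, nobs1=30,
    mean2=168, std2=6, nobs2=35,
    equal_var=False,
)
print(f"t = {result.statistic:.2f}, p-value = {result.pvalue:.2e}")
```

Both p-values are far below 0.05, which agrees with the critical-value comparison.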

How to interpret p-value?

To interpret the p-value, compare it to a chosen significance level (α). During hypothesis testing, we fix a significance level, generally 5% (α = 0.05), which is the probability of rejecting the null hypothesis when it is true. The lower the p-value, the stronger the evidence against the null hypothesis. When:

  • p ≤ α (here α = 0.05): Reject the null hypothesis. The observed effect or relationship is statistically significant, meaning that data this extreme would be unlikely if the null hypothesis were true.
  • p > α (here α = 0.05): Fail to reject the null hypothesis. The observed effect or relationship does not provide enough evidence to reject the null hypothesis. This does not necessarily mean there is no effect; it simply means the sample data does not provide strong enough evidence to rule out the possibility that the effect is due to chance.
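The decision rule above is simple enough to express as a small helper. The function name decide and its return strings are hypothetical, introduced only for this sketch.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when p <= alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.03))    # reject H0
print(decide(0.7059))  # fail to reject H0
```

Note that the second outcome is phrased as "fail to reject" rather than "accept", since a large p-value is not evidence that the null hypothesis is true.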

If the significance level is not specified, the following conventional thresholds can serve as a rough guide when interpreting your results.

  • If p > .10: not significant
  • If .05 < p ≤ .10: marginally significant
  • If .001 < p ≤ .05: significant
  • If p ≤ .001: highly significant

Graphically, the p-value is the area in the tail(s) of the sampling distribution beyond the observed test statistic (see Fig 1).

Fig 1: Graphical Representation 

What influences p-value?

The p-value in hypothesis testing is influenced by several factors:

  1. Sample Size: For a given effect size, larger sample sizes tend to yield smaller p-values, increasing the likelihood of detecting significant effects.
  2. Effect Size: A larger effect size results in smaller p-values, making it easier to detect a significant relationship.
  3. Variability in the Data: Greater variability often leads to larger p-values, making it harder to identify significant effects.
  4. Significance Level: The significance level does not change the p-value itself, but a lower chosen significance level sets a stricter threshold that the p-value must fall below to be considered significant.
  5. Choice of Test: Different statistical tests may yield different p-values for the same data.
  6. Assumptions of the Test: Violations of test assumptions can impact p-values.

Understanding these factors is crucial for interpreting p-values accurately and making informed decisions in hypothesis testing.
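The first three factors can be seen numerically. The sketch below is an illustration with hypothetical summary statistics (two groups with a baseline mean of 100 and equal standard deviations); it varies one factor at a time and reports the two-sample t-test p-value. The helper two_sample_p is a name made up for this example.

```python
from scipy import stats


def two_sample_p(n, sd, mean_diff=2.0):
    """Two-sided p-value of a two-sample t-test on hypothetical summary stats."""
    result = stats.ttest_ind_from_stats(
        mean1=100.0 + mean_diff, std1=sd, nobs1=n,
        mean2=100.0, std2=sd, nobs2=n,
    )
    return result.pvalue


# Sample size: same difference and spread, more observations per group
for n in (10, 30, 100):
    print(f"n = {n:3d}          p = {two_sample_p(n, sd=5.0):.4f}")

# Effect size: same sample size and spread, larger difference in means
for diff in (1.0, 2.0, 4.0):
    print(f"difference = {diff}  p = {two_sample_p(30, sd=5.0, mean_diff=diff):.4f}")

# Variability: same sample size and difference, more spread in the data
for sd in (2.5, 5.0, 10.0):
    print(f"sd = {sd:4.1f}        p = {two_sample_p(30, sd=sd):.4f}")
```

In this setup the p-value falls as the sample size or the effect size grows, and rises as the variability grows.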

Significance of P-value

  • Strength of evidence: The p-value provides a quantitative measure of the strength of the evidence against the null hypothesis.
  • Decision-making in hypothesis testing: The p-value serves as a guide for interpreting the results of a statistical test. A small p-value suggests that the observed effect or relationship is statistically significant, but it does not necessarily mean that it is practically or clinically meaningful.

Limitations of P-value

  • The p-value is not a direct measure of the effect size, which represents the magnitude of the observed relationship or difference between variables. A small p-value does not necessarily mean that the effect size is large or practically meaningful.
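  • Influenced by various factors: The p-value depends on sample size, effect size, and variability in the data, so the same underlying effect can produce very different p-values in different studies.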
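To illustrate that a small p-value need not mean a large effect, the sketch below uses hypothetical summary statistics: two very large groups whose means differ by only a tenth of a standard deviation. Cohen's d, a common effect-size measure that this article does not otherwise cover, is computed alongside the p-value.

```python
from scipy import stats

# Hypothetical summary statistics: tiny difference, very large samples
mean1, mean2 = 100.5, 100.0
sd = 5.0          # same standard deviation in both groups
n = 5000          # observations per group

result = stats.ttest_ind_from_stats(mean1, sd, n, mean2, sd, n)

# Cohen's d: difference in means in units of the (pooled) standard deviation.
# Both groups share the same sd here, so the pooled sd equals sd.
cohens_d = (mean1 - mean2) / sd

print(f"p-value   = {result.pvalue:.2e}")
print(f"Cohen's d = {cohens_d:.2f}")
```

The p-value is highly significant even though the effect size is small, which is why the two should be reported together.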

The p-value is a crucial concept in statistical hypothesis testing, serving as a guide for making decisions about the significance of the observed relationship or effect between variables.

Implementing P-value in Python

Let’s consider a scenario where a tutor believes that the average exam score of their students is equal to the national average (85). The tutor collects a sample of exam scores from their students and performs a one-sample t-test to compare it to the population mean (85).

  • The code performs a one-sample t-test to compare the mean of a sample data set to a hypothesized population mean.
  • It utilizes the scipy.stats library to calculate the t-statistic and p-value. SciPy is a Python library that provides efficient numerical routines for scientific computing.
  • The p-value is compared to a significance level (alpha) to determine whether to reject the null hypothesis.

Python3

import scipy.stats as stats
 
# exam scores
sample_data = [78, 82, 88, 95, 79, 92, 85, 88, 75, 80]
 
# Population mean
population_mean = 85
 
# One-sample t-test
t_stat, p_value = stats.ttest_1samp(sample_data, population_mean)
 
print("t-statistic:", t_stat)
print("p-value:", p_value)
 
# Conditions
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is enough evidence to suggest a significant difference.")
else:
    print("Fail to reject the null hypothesis. The difference is not statistically significant.")

                    

Output:

t-statistic: -0.3895364838967159
p-value: 0.7059365203154573
Fail to reject the null hypothesis. The difference is not statistically significant.

Since 0.7059 > 0.05, we fail to reject the null hypothesis. This means that, based on the sample data, there isn’t enough evidence to claim a significant difference between the exam scores of the tutor’s students and the national average. The tutor would not reject the null hypothesis: the data are statistically consistent with the students’ average exam score being equal to the national average.
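To see where these numbers come from, the same test can be reproduced by hand. This is a sketch of the standard one-sample t-test formula applied to the data above; it is meant as a check on the library call, not as a replacement for it.

```python
import math
import statistics
from scipy import stats

sample_data = [78, 82, 88, 95, 79, 92, 85, 88, 75, 80]
population_mean = 85

n = len(sample_data)
sample_mean = statistics.mean(sample_data)
sample_sd = statistics.stdev(sample_data)   # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(n)

# t-statistic: distance of the sample mean from 85, in standard errors
t_stat = (sample_mean - population_mean) / standard_error

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), n - 1)

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
```

The values agree with the output of stats.ttest_1samp shown above.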

Applications of p-value

  1. Feature selection (forward selection and backward elimination): When fitting a model (say, a multiple linear regression model), we use p-values to identify the variables that contribute significantly to predicting the output.
  2. Medical research: P-values are widely used in medical research to assess whether a drug has the desired effect on humans.

The p-value is a powerful statistical tool in hypothesis testing. It informs important decisions such as drawing a business intelligence inference or determining whether a drug should be used on humans.
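As a simplified illustration of the first application, the sketch below screens candidate predictors by the p-value of their slope. It fits one simple linear regression per predictor with scipy.stats.linregress, which is a simplification of p-value-based selection in a multiple regression model. The data and the names x_informative and x_noise are hypothetical.

```python
from scipy import stats

# Hypothetical response and candidate predictors
y = [2.3, 4.1, 6.2, 7.9, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]
candidates = {
    "x_informative": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "x_noise": [5, 3, 8, 1, 9, 2, 7, 4, 6, 5],
}

alpha = 0.05
selected = []
for name, values in candidates.items():
    # pvalue tests the null hypothesis that the slope is zero
    result = stats.linregress(values, y)
    print(f"{name}: slope = {result.slope:.3f}, p-value = {result.pvalue:.4f}")
    if result.pvalue <= alpha:
        selected.append(name)

print("Selected predictors:", selected)
```

Only the predictor with a clear linear relationship to the response passes the 0.05 threshold in this example.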

Conclusion

The p-value is a crucial concept in statistical hypothesis testing, providing a quantitative measure of the strength of evidence against the null hypothesis. It guides decision-making by comparing the p-value to a chosen significance level, typically 0.05. A small p-value indicates strong evidence against the null hypothesis, suggesting a statistically significant relationship or effect. However, the p-value is influenced by various factors and should be interpreted alongside other considerations, such as effect size and context.

Frequently Asked Questions (FAQs)

Can a p-value be greater than 1?

A p-value is a probability, and probabilities must be between 0 and 1. Therefore, a p-value greater than 1 is not possible; a reported value above 1 indicates a calculation error.

What does p = 0.01 mean?

It means that, if the null hypothesis is true, there is a 1% chance of observing a test statistic at least as extreme as the one obtained. Such a result is unlikely under the null hypothesis.

Is 0.9 a good p-value?

A p-value of 0.9 is far above common significance levels such as 0.05, so it provides no evidence against the null hypothesis. A p-value is usually considered statistically significant when it is less than or equal to 0.05.

What is p-value in a model?

It is a measure of the statistical significance of a parameter in the model. It represents the probability of obtaining an estimate at least as extreme as the observed one, assuming the null hypothesis (typically that the true parameter is zero) is true.

Why is p-value so low?

A low p-value means that the observed test statistic would be unlikely to occur if the null hypothesis were true. It suggests that the observed relationship or effect is statistically significant and is unlikely to be explained by random sampling variation alone.

How Can You Use P-value to Compare Two Different Results of a Hypothesis Test?

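Compare the p-values: a lower p-value indicates stronger evidence against the null hypothesis, so the result with the smaller p-value provides the stronger evidence. Such comparisons are most meaningful when the tests and sample sizes are similar.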



Last Updated : 31 Jan, 2024