How to Perform a Repeated Measures ANOVA in Python

Last Updated : 28 Feb, 2022

A repeated measures ANOVA is used to determine whether there is a statistically significant difference between the means of three or more groups in which the same subjects appear in each group.

Hypothesis:

A repeated measures ANOVA uses the following null and alternative hypotheses:

  • The null hypothesis (H0): µ1 = µ2 = ... = µk, where k is the number of groups (in other words, all population means are equal)
  • The alternative hypothesis (Ha): at least one population mean differs from the rest

Perform a repeated-measures ANOVA in Python:

Consider an example: researchers want to know whether four different engine oils lead to different car mileage. To test this, they measured the mileage of 5 cars using each of the four engine oils. Since every car's mileage is measured once with each of the four engine oils, we can use a repeated measures ANOVA to check whether the mean mileage differs between the engine oils.

Syntax to install the numpy, pandas and statsmodels libraries:

pip3 install numpy pandas statsmodels

Performing the repeated measures ANOVA in Python is a step-by-step process. These steps are explained below.

Step 1: Create the data 

Python3




# Import the library
import numpy as np
import pandas as pd
  
# Create the data
dataframe = pd.DataFrame({'Cars': np.repeat([1, 2, 3, 4, 5], 4),
                          'Engine Oil': np.tile([1, 2, 3, 4], 5),
                          'Mileage': [36, 38, 30, 29,
                                      34, 38, 30, 29,
                                      34, 28, 38, 32,
                                      38, 34, 20, 44,
                                      26, 28, 34, 50]})
  
# Print the dataframe
print(dataframe)


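Output: a dataframe with 20 rows and three columns (the car, the engine oil and the mileage), one row for each combination of car and engine oil.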
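Before fitting the model, it can help to view the same data with one row per car and one column per oil. The sketch below rebuilds the Step 1 dataframe and pivots it; the names `wide`, `oil_means` and `counts` are illustrative and not part of the original example. It also checks that every car has exactly one reading per oil, which matters because the statsmodels documentation describes AnovaRM as supporting only fully balanced within-subject designs.

```python
import numpy as np
import pandas as pd

# Same long-format data as in Step 1
dataframe = pd.DataFrame({'Cars': np.repeat([1, 2, 3, 4, 5], 4),
                          'Engine Oil': np.tile([1, 2, 3, 4], 5),
                          'Mileage': [36, 38, 30, 29,
                                      34, 38, 30, 29,
                                      34, 28, 38, 32,
                                      38, 34, 20, 44,
                                      26, 28, 34, 50]})

# Wide view: one row per car, one column per engine oil
wide = dataframe.pivot(index='Cars', columns='Engine Oil', values='Mileage')
print(wide)

# Mean mileage for each engine oil (these are the means the ANOVA compares)
oil_means = wide.mean()
print(oil_means)

# Balance check: each (car, oil) pair should occur exactly once
counts = dataframe.groupby(['Cars', 'Engine Oil']).size()
print((counts == 1).all())
```

The per-oil means give a first impression of how far apart the groups are; the ANOVA in the next step tests whether those differences are larger than the car-to-car noise would explain.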

Step 2: Conduct the repeated measures ANOVA.

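The statsmodels library provides the AnovaRM class for fitting a repeated measures ANOVA. It takes the dataframe, the name of the dependent variable (depvar), the column that identifies each subject (subject) and a list of within-subject factors (within).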

Example:

Python3




# Import library
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM
  
# Create the data
dataframe = pd.DataFrame({'Cars': np.repeat([1, 2, 3, 4, 5], 4),
                          'Oil': np.tile([1, 2, 3, 4], 5),
                          'Mileage': [36, 38, 30, 29,
                                      34, 38, 30, 29,
                                      34, 28, 38, 32,
                                      38, 34, 20, 44,
                                      26, 28, 34, 50]})
  
# Conduct the repeated measures ANOVA
print(AnovaRM(data=dataframe, depvar='Mileage',
              subject='Cars', within=['Oil']).fit())


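Output: an ANOVA table that reports the F value, the numerator and denominator degrees of freedom, and the p-value for the Oil factor.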
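Instead of only printing the summary, the individual numbers can be read from the fitted result. The following is a minimal sketch that assumes the `anova_table` attribute of the result object, a pandas dataframe with the columns `F Value`, `Num DF`, `Den DF` and `Pr > F`; the variable names `results` and `table` are illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Same data as in Step 2
dataframe = pd.DataFrame({'Cars': np.repeat([1, 2, 3, 4, 5], 4),
                          'Oil': np.tile([1, 2, 3, 4], 5),
                          'Mileage': [36, 38, 30, 29,
                                      34, 38, 30, 29,
                                      34, 28, 38, 32,
                                      38, 34, 20, 44,
                                      26, 28, 34, 50]})

# Fit the model once and keep the result object
results = AnovaRM(data=dataframe, depvar='Mileage',
                  subject='Cars', within=['Oil']).fit()

# The table has one row per within-subject factor (here only 'Oil')
table = results.anova_table
f_value = table.loc['Oil', 'F Value']
p_value = table.loc['Oil', 'Pr > F']
num_df = table.loc['Oil', 'Num DF']
den_df = table.loc['Oil', 'Den DF']

# Format the result in the usual reporting style
print(f"F({num_df:.0f}, {den_df:.0f}) = {f_value:.4f}, p = {p_value:.4f}")
```

Keeping the values in variables makes it easy to compare the p-value against a chosen significance level in code rather than by eye.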

Step 3: Analyse the results.

In this example, the F test statistic is 0.5679 and the corresponding p-value is 0.6466. Since this p-value is not less than 0.05, we fail to reject the null hypothesis: there is no statistically significant difference in mean mileage between the four engine oils.
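To see where the F statistic comes from, it can be reproduced by hand. The sketch below uses plain Python and the standard one-way repeated measures decomposition: the total sum of squares is split into a part due to the oils, a part due to the cars (subjects) and a residual error, and F is the oil mean square divided by the error mean square. The variable names are illustrative; this is a check on the reported value, not how statsmodels computes it internally.

```python
# Mileage for each car (rows) under each engine oil (columns), as in Step 1
mileage = [
    [36, 38, 30, 29],
    [34, 38, 30, 29],
    [34, 28, 38, 32],
    [38, 34, 20, 44],
    [26, 28, 34, 50],
]

n_cars = len(mileage)
n_oils = len(mileage[0])
grand_mean = sum(sum(row) for row in mileage) / (n_cars * n_oils)

# Mean mileage per oil (column means) and per car (row means)
oil_means = [sum(row[j] for row in mileage) / n_cars for j in range(n_oils)]
car_means = [sum(row) / n_oils for row in mileage]

# Sums of squares: between oils, between cars, total, and residual error
ss_oil = n_cars * sum((m - grand_mean) ** 2 for m in oil_means)
ss_car = n_oils * sum((m - grand_mean) ** 2 for m in car_means)
ss_total = sum((x - grand_mean) ** 2 for row in mileage for x in row)
ss_error = ss_total - ss_oil - ss_car

# Degrees of freedom for the oil effect and for the error term
df_oil = n_oils - 1
df_error = (n_oils - 1) * (n_cars - 1)

# F is the ratio of the two mean squares
f_stat = (ss_oil / df_oil) / (ss_error / df_error)
print(f"F({df_oil}, {df_error}) = {f_stat:.4f}")  # prints F(3, 12) = 0.5679
```

Removing the between-car variation from the error term is what distinguishes the repeated measures design from an ordinary one-way ANOVA on the same numbers.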

Step 4: Report the outcome.

Let us now report the result: A one-way repeated measures ANOVA was conducted on 5 cars to examine the effect of four different engine oils on mileage. The results showed that the type of engine oil did not lead to statistically significant differences in mileage (F(3, 12) = 0.5679, p = 0.6466).


