How to Perform an ANCOVA in Python

Last Updated : 22 Feb, 2022

ANCOVA (Analysis of Covariance) is used to determine whether there is a statistically significant difference between the means of two or more independent groups after controlling for one or more explanatory variables called covariates. Covariates are variables that influence the response variable but are not the main focus of the study.

The independent (predictor) variable that explains the variation in the response (output) variable is known as the explanatory variable.
The dependent (outcome) variable that responds to changes in the explanatory variable is known as the response variable.
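For one factor and one covariate, a standard textbook way to write the ANCOVA model is shown below. It is included as a reference sketch and assumes the covariate has the same linear slope in every group. Here y_ij is the response of observation j in group i, x_ij is its covariate value, x̄ is the overall covariate mean, τ_i is the effect of group i, and β is the common slope.

```latex
y_{ij} = \mu + \tau_i + \beta\,(x_{ij} - \bar{x}) + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

H_0:\ \tau_1 = \tau_2 = \dots = \tau_k = 0
```

Rejecting H_0 means that at least one group mean differs from the others once the covariate has been accounted for.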

Applying the ANCOVA function

Example: A tutor wants to know whether three different teaching and learning methodologies lead to different test scores, while also accounting for each student's current grade in the class. She'll run an ANCOVA with the following variables:

Learning methodology is the factor variable.
Current grade is the covariate.
Test score is the response variable.

Steps to perform ANCOVA

Step 1: Create a pandas DataFrame to hold the data for the ANCOVA.

Python

import numpy as np
import pandas as pd
 
# create data
data = pd.DataFrame({'methodology': np.repeat(['A', 'B', 'C'], 4),
                     'current_grade': [67, 88, 75, 85,
                                       92, 77, 74, 88,
                                       91, 88, 82, 80],
                     'test_score': [77, 89, 74, 69,
                                    88, 93, 94, 90,
                                    85, 81, 83, 79]})
# view data
print(data)
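Before running the ANCOVA it can help to look at the unadjusted group means, because ANCOVA compares these means after adjusting for the covariate. A quick sketch with pandas groupby on the same data (the name group_means is just for illustration):

```python
import numpy as np
import pandas as pd

# Same data as in Step 1
data = pd.DataFrame({'methodology': np.repeat(['A', 'B', 'C'], 4),
                     'current_grade': [67, 88, 75, 85,
                                       92, 77, 74, 88,
                                       91, 88, 82, 80],
                     'test_score': [77, 89, 74, 69,
                                    88, 93, 94, 90,
                                    85, 81, 83, 79]})

# Raw (unadjusted) mean of the covariate and the response in each group
group_means = data.groupby('methodology')[['current_grade', 'test_score']].mean()
print(group_means)
```

In this data the groups' average current grades are not identical (A is lowest, C highest), which is the kind of imbalance the covariate adjustment accounts for.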

Step 2: Perform the ANCOVA using the ancova() function from the pingouin library. Make sure pingouin is installed before calling ancova():

Installation:

pip install pingouin

This command installs pingouin together with the libraries it depends on.
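To confirm the installation worked, you can query the installed version with the standard library. This is a minimal sketch; installed_version is a hypothetical helper name, and it returns None instead of raising when the package is missing.

```python
from importlib import metadata


def installed_version(package: str = "pingouin"):
    """Hypothetical helper: return the installed version of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("pingouin"))
```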

ancova() function:

Syntax: pingouin.ancova(data=None, dv=None, between=None, covar=None, effsize='np2')

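Parameters:

data: pandas DataFrame containing the data to analyze.

dv: name of the column holding the dependent (response) variable.

between: name of the column holding the factor (grouping) variable.

covar: name of the column(s) holding the covariate(s), given as a single string or a list of strings.

effsize: type of effect size to report (default 'np2', partial eta-squared).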
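A common mistake is passing a column name to dv, between, or covar that does not exist in data. The sketch below checks the arguments before ancova() is called. check_ancova_columns is a hypothetical helper, not part of pingouin, and it assumes the response and covariate(s) should be numeric.

```python
import numpy as np
import pandas as pd


def check_ancova_columns(data, dv, between, covar):
    """Hypothetical helper: raise ValueError if the ancova() arguments don't fit `data`."""
    # covar may be a single column name or a list of names
    covars = [covar] if isinstance(covar, str) else list(covar)
    missing = [col for col in [dv, between, *covars] if col not in data.columns]
    if missing:
        raise ValueError(f"Columns not found in data: {missing}")
    # The response and the covariate(s) must be numeric
    for col in [dv, *covars]:
        if not pd.api.types.is_numeric_dtype(data[col]):
            raise ValueError(f"Column {col!r} must be numeric")


data = pd.DataFrame({'methodology': np.repeat(['A', 'B', 'C'], 4),
                     'current_grade': [67, 88, 75, 85, 92, 77, 74, 88, 91, 88, 82, 80],
                     'test_score': [77, 89, 74, 69, 88, 93, 94, 90, 85, 81, 83, 79]})

# Passes silently when every column exists and is numeric where required
check_ancova_columns(data, dv='test_score', between='methodology', covar='current_grade')
```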

Python

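import numpy as np
import pandas as pd
from pingouin import ancova

# same data as in Step 1
data = pd.DataFrame({'methodology': np.repeat(['A', 'B', 'C'], 4),
                     'current_grade': [67, 88, 75, 85,
                                       92, 77, 74, 88,
                                       91, 88, 82, 80],
                     'test_score': [77, 89, 74, 69,
                                    88, 93, 94, 90,
                                    85, 81, 83, 79]})

# test_score is the response, methodology the factor, current_grade the covariate
aov = ancova(data=data, dv='test_score', covar='current_grade', between='methodology')
print(aov)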

Output:

Step 3: Interpret the ANCOVA results.

If it runs successfully, ancova() returns a pandas DataFrame (aov) containing the ANCOVA summary, with the following columns:

'Source': names of the terms in the model (the factor, the covariate, and the residual)

'SS': sums of squares

'DF': degrees of freedom

'F': F-values

'p-unc': uncorrected p-values

'np2': partial eta-squared (effect size)
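The F and np2 columns are derived from the SS and DF columns. Using the standard definitions, stated here as a reference for the factor row of a model with one factor of k groups, one covariate, and n observations:

```latex
F = \frac{SS_{\text{effect}} / DF_{\text{effect}}}{SS_{\text{Residual}} / DF_{\text{Residual}}},
\qquad
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{Residual}}}

DF_{\text{effect}} = k - 1, \qquad DF_{\text{Residual}} = n - k - 1
```

For the tutor's data, k = 3 and n = 12, so the methodology row has 2 degrees of freedom and the residual row has 8.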

According to the ANCOVA table, the p-value (p-unc, the uncorrected p-value) for methodology is 0.025542. Because this value is less than the 0.05 significance level, we can reject the null hypothesis that all three methodologies produce the same mean test score, even after controlling for each student's current grade in the class.
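To see where the methodology F and p-value come from, here is a minimal NumPy/SciPy sketch that compares two nested least-squares models: one with only the covariate, and one with the covariate plus the methodology groups. The drop in residual sum of squares is the SS attributed to methodology. It assumes a single covariate, no group-by-covariate interaction, and that pingouin's factor SS corresponds to this nested-model comparison (Type II sums of squares). sse and ancova_one_covariate are hypothetical helpers, not part of pingouin.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Same data as in Step 1
data = pd.DataFrame({'methodology': np.repeat(['A', 'B', 'C'], 4),
                     'current_grade': [67, 88, 75, 85,
                                       92, 77, 74, 88,
                                       91, 88, 82, 80],
                     'test_score': [77, 89, 74, 69,
                                    88, 93, 94, 90,
                                    85, 81, 83, 79]})


def sse(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def ancova_one_covariate(df, dv, between, covar):
    """Hypothetical helper: F-test for `between`, adjusted for a single covariate."""
    y = df[dv].to_numpy(dtype=float)
    x = df[covar].to_numpy(dtype=float)
    ones = np.ones_like(y)
    # Dummy-code the groups; drop_first avoids collinearity with the intercept
    dummies = pd.get_dummies(df[between], drop_first=True, dtype=float).to_numpy()

    sse_reduced = sse(np.column_stack([ones, x]), y)        # y ~ covariate
    sse_full = sse(np.column_stack([ones, dummies, x]), y)  # y ~ group + covariate

    k, n = df[between].nunique(), len(y)
    df_effect, df_resid = k - 1, n - k - 1
    ss_effect = sse_reduced - sse_full  # extra variation explained by the groups
    f_value = (ss_effect / df_effect) / (sse_full / df_resid)
    return {'SS': ss_effect, 'DF': df_effect, 'F': f_value,
            'p-unc': float(stats.f.sf(f_value, df_effect, df_resid)),
            'np2': ss_effect / (ss_effect + sse_full),
            'SS_resid': sse_full, 'DF_resid': df_resid}


result = ancova_one_covariate(data, dv='test_score', between='methodology',
                              covar='current_grade')
print(result)
```

On this data the sketch gives F ≈ 6.01 with 2 and 8 degrees of freedom and a p-value of about 0.0255, consistent with the p-unc value for methodology quoted above.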
