How to Perform a Two-Way ANOVA in Python
Last Updated: 28 Feb, 2022
Two-Way ANOVA: Two-way ANOVA (analysis of variance) is a statistical test used to check whether there is a statistically significant difference between the mean values of three or more groups that have been split on two factors. In simple words, ANOVA compares the mean values of at least three groups to decide whether any of them differ. The main objective of a two-way ANOVA is to find out how two factors affect a response variable, and whether the two factors interact in their effect on the response variable.
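For reference, a two-way ANOVA with interaction is usually written as the model below. Here y_ijk is the k-th observation at level i of the first factor and level j of the second, mu is the overall mean, alpha_i and beta_j are the main effects, (alpha beta)_ij is the interaction effect and epsilon_ijk is random error. Each of the three null hypotheses is tested with an F statistic: the mean square of the term divided by the residual (error) mean square. This is the standard textbook formulation, not something specific to any Python library.

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)

H_0^{A}: \alpha_1 = \dots = \alpha_a = 0, \quad
H_0^{B}: \beta_1 = \dots = \beta_b = 0, \quad
H_0^{AB}: (\alpha\beta)_{ij} = 0 \ \text{for all } i, j

F_{\text{term}} = \frac{MS_{\text{term}}}{MS_{\text{error}}}
```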
Syntax to install the NumPy, pandas, and statsmodels libraries:
pip3 install numpy pandas statsmodels
Performing a Two-Way ANOVA in Python:
Let us consider an example in which scientists want to know whether plant growth is affected by fertilizing frequency and watering frequency. They planted 30 plants and let them grow for six months under different fertilizing and watering conditions. After six months, they recorded the height of each plant. Performing a two-way ANOVA in Python is a step-by-step process, and the steps are described below:
Step 1: Import libraries.
The very first step is to import the libraries installed above.
Python3
import numpy as np
import pandas as pd
|
Step 2: Enter the data.
Let us create a pandas DataFrame that consists of the following three variables:
- Fertilizer: how often each plant was fertilized (daily or weekly).
- Watering: how often each plant was watered (daily or weekly).
- height: the height of each plant (in inches) after six months.
Example:
Python3
import numpy as np
import pandas as pd
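# 30 plants: the first 15 were fertilized daily, the last 15 weekly.
# Watering alternates daily/weekly (np.tile) so that every
# Fertilizer/Watering combination occurs; a two-way ANOVA needs the
# two factors to be crossed rather than assigned identically.
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                          'Watering': np.tile(['daily', 'weekly'], 15),
                          'height': [14, 16, 15, 15, 16, 13, 12, 11, 14,
                                     15, 16, 16, 17, 18, 14, 13, 14, 14,
                                     14, 15, 16, 16, 17, 18, 14, 13, 14,
                                     14, 14, 15]})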
|
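A two-way ANOVA can only separate the effects of the two factors if every Fertilizer/Watering combination occurs in the data. As a quick check (a sketch, not one of the article's steps), pd.crosstab counts the plants in each cell. If Watering is assigned with np.repeat exactly like Fertilizer, every plant falls into only two of the four cells and the factors are confounded; assigning Watering with np.tile, an assumption made here for illustration, alternates the schedule and fills all four cells.

```python
import numpy as np
import pandas as pd

# Watering assigned with np.repeat, exactly like Fertilizer
confounded = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                           'Watering': np.repeat(['daily', 'weekly'], 15)})

# Hypothetical crossed assignment: Watering alternates daily/weekly
crossed = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                        'Watering': np.tile(['daily', 'weekly'], 15)})

# Count the plants in each Fertilizer/Watering cell
confounded_counts = pd.crosstab(confounded['Fertilizer'], confounded['Watering'])
crossed_counts = pd.crosstab(crossed['Fertilizer'], crossed['Watering'])

print(confounded_counts)  # zeros off the diagonal: the factors cannot be separated
print(crossed_counts)     # all four combinations are present
```

With 30 plants a 2 x 2 design cannot be perfectly balanced, so the crossed layout has 8 or 7 plants per cell; Type II sums of squares handle such unbalanced data.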
Step 3: Conduct the two-way ANOVA:
To perform the two-way ANOVA, the statsmodels library provides the anova_lm() function, which builds an ANOVA table from a fitted linear model (for example, one fitted with ols()). The syntax of the function is given below:
Syntax:
sm.stats.anova_lm(model, typ=2)
Parameters:
- model: the fitted linear model, for example the result of ols(...).fit()
- typ: the type of sums of squares used for the ANOVA test, one of 1, 2, 3 or 'I', 'II', 'III'
Python3
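import statsmodels.api as sm
from statsmodels.formula.api import ols

# Fit a linear model with both main effects and their interaction,
# using the DataFrame created in Step 2
model = ols('height ~ C(Fertilizer) + C(Watering) + C(Fertilizer):C(Watering)',
            data=dataframe).fit()

# ANOVA table with Type II sums of squares
sm.stats.anova_lm(model, typ=2)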
|
Step 4: Combining all the steps.
Example:
Python3
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Step 2: enter the data (Watering alternates so the two factors are crossed)
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                          'Watering': np.tile(['daily', 'weekly'], 15),
                          'height': [14, 16, 15, 15, 16, 13, 12, 11,
                                     14, 15, 16, 16, 17, 18, 14, 13,
                                     14, 14, 14, 15, 16, 16, 17, 18,
                                     14, 13, 14, 14, 14, 15]})

# Step 3: fit the model with both main effects and the interaction
model = ols('height ~ C(Fertilizer) + C(Watering) + C(Fertilizer):C(Watering)',
            data=dataframe).fit()

# Type II ANOVA table: sum_sq, df, F and PR(>F) for each term
result = sm.stats.anova_lm(model, typ=2)
print(result)
|
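A two-way ANOVA is ultimately a comparison of the mean response in each combination of factor levels, so it can help to look at those cell means alongside the ANOVA table. The sketch below, assuming the crossed (np.tile) watering schedule, uses DataFrame.pivot_table to show the mean height for every Fertilizer/Watering pair: a large difference between rows or columns points to a main effect, and rows that change in different directions across the columns point to an interaction.

```python
import numpy as np
import pandas as pd

# Article's heights; the crossed Watering column is an assumption (np.tile)
data = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                     'Watering': np.tile(['daily', 'weekly'], 15),
                     'height': [14, 16, 15, 15, 16, 13, 12, 11, 14, 15,
                                16, 16, 17, 18, 14, 13, 14, 14, 14, 15,
                                16, 16, 17, 18, 14, 13, 14, 14, 14, 15]})

# Mean height for every Fertilizer (rows) / Watering (columns) combination
cell_means = data.pivot_table(values='height', index='Fertilizer',
                              columns='Watering', aggfunc='mean')
print(cell_means.round(2))
```

Under this layout all four cell means lie between about 14.7 and 14.9 inches, which is consistent with finding neither main effects nor an interaction.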
Output:
Interpreting the result:
The PR(>F) column of the output gives the p-value for each term. For this example, the output reports the following p-values:
- Fertilizer: 0.913305
- Watering: 0.990865
- Fertilizer * Watering interaction: 0.904053
The p-values for fertilizer and watering are both greater than 0.05, which implies that neither factor has a statistically significant effect on mean plant height. The p-value for the interaction effect (0.904053) is also greater than 0.05, which indicates that there is no significant interaction effect between fertilizing frequency and watering frequency.