# ANOVA Test in R Programming

ANOVA, also known as analysis of variance, is used in R programming to investigate the relationship between a categorical variable and a continuous variable. It is a hypothesis test for comparing population means across groups. By comparing the variation between groups to the variation within groups, it lets us assess whether observed differences in means are statistically significant or plausibly the result of chance. The ANOVA test is frequently used in many disciplines, including business, social sciences, biology, and experimental research.

### R – ANOVA Test

ANOVA tests can be run in R with a number of functions and packages; this article uses the aov function.

An ANOVA test involves setting up:

• Null Hypothesis: The default assumption, or null hypothesis, is that there is no meaningful relationship or impact between the variables. It stands for the absence of a population-wide link, difference, or effect. The statement that two or more groups are equal or that the effect size is zero is sometimes expressed as the null hypothesis. The null hypothesis is commonly written as H0.

• Alternate Hypothesis: The opposite of the null hypothesis is the alternative hypothesis. It implies that there is a significant relationship, difference, or link among the population’s variables. Depending on the study question or the nature of the issue under investigation, it may take several forms. Alternative hypotheses are sometimes referred to as H1 or HA.

ANOVA tests are of two types:

• One-way ANOVA: A one-way ANOVA is used when there is a single categorical independent variable (also known as a factor) and a single continuous dependent variable. It tests whether the mean of the dependent variable differs across the levels of the independent variable.

• Two-way ANOVA: When there are two categorical independent variables (factors) and one continuous dependent variable, two-way ANOVA is used as an extension of one-way ANOVA. It lets you evaluate both the main effect of each independent variable and the effect of their interaction on the dependent variable.

### The Dataset

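The mtcars (Motor Trend Car Road Tests) dataset is used, which consists of 32 car models and 11 attributes. The dataset ships with base R in the datasets package, so it is available without installing anything.

The ANOVA itself needs no extra package, because the aov function is part of the stats package that is loaded with R. Only the histogram function used for the plots comes from another package, lattice.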
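As a quick orientation, the sketch below inspects the three columns used in the rest of the article. It relies only on base R; the names `cars`, `gear_counts`, and `gear_am_counts` are illustrative and not part of the original walkthrough.

```r
# mtcars can be used directly in a fresh R session
dim(mtcars)                                   # rows and columns

# Keep only the columns analysed below (illustrative helper name)
cars <- mtcars[, c("disp", "gear", "am")]
str(cars)                                     # all three are stored as numeric

# Number of cars per gear level, and per gear/transmission combination
gear_counts    <- table(cars$gear)
gear_am_counts <- table(gear = cars$gear, am = cars$am)
gear_counts
gear_am_counts
```

Because gear and am are stored as numbers, they are wrapped in factor() before being passed to aov so that they are treated as categories.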

### Performing One Way ANOVA test in R language

A one-way ANOVA test is performed on the mtcars dataset between disp, a continuous attribute, and gear, a categorical attribute. The steps are as follows.

• Set up the null hypothesis and the alternate hypothesis
• H0: mu1 = mu2 = mu3 (there is no difference in average displacement between the gear levels)
• H1: Not all means are equal.
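The group means that H0 refers to can be computed directly before running the test. This is a minimal sketch in base R; `group_means` and `grand_mean` are illustrative names.

```r
# Mean displacement for each gear level
group_means <- tapply(mtcars$disp, factor(mtcars$gear), mean)
group_means

# Overall mean for comparison
grand_mean <- mean(mtcars$disp)
grand_mean
```

ANOVA asks whether the gaps between these group means are larger than the spread within the groups would explain by chance.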

```r
# histogram() comes from the lattice package, which ships with standard R installations
library(lattice)

# Distribution of disp within each gear level, one panel per level
histogram(~ disp | factor(gear), data = mtcars,
          xlab = "disp")
```

Output: [Figure: histograms of disp for each gear level]

The histograms show how displacement is distributed within each gear level. Here the categorical variable is gear, to which the factor function is applied, and the continuous variable is disp.

Calculate the test statistics using the aov function.

```r
mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
summary(mtcars_aov)
```

Output:

```
                    Df Sum Sq Mean Sq F value   Pr(>F)
factor(mtcars$gear)  2 280221  140110   20.73 2.56e-06 ***
Residuals           29 195964    6757
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

• Df: The degrees of freedom of each row (number of groups minus one for the factor, number of observations minus number of groups for the residuals).
• Sum Sq: The sums of squares, which measure the variability attributed to each row.
• Mean Sq: The mean squares, i.e. each sum of squares divided by its degrees of freedom.
• F value: The ratio of the between-group mean square to the within-group (residual) mean square.
• Pr(>F): The p-value of the F statistic, which indicates the statistical significance of the factor.
• Residuals: The row for the within-group variation, i.e. the deviations of the observations from their group means that the factor does not explain.

Signif. codes: Asterisks (*) show the degree of significance; one, two, and three asterisks stand for p < 0.05, p < 0.01, and p < 0.001, respectively.
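To make the columns of the summary table concrete, the sketch below recomputes them by hand for disp and gear using the standard one-way ANOVA formulas. All variable names are illustrative; the results should agree with the table printed by summary above.

```r
# Recompute the one-way ANOVA table for disp by gear (illustrative sketch)
gear <- factor(mtcars$gear)
disp <- mtcars$disp

grand_mean  <- mean(disp)
group_means <- tapply(disp, gear, mean)     # one mean per gear level
group_sizes <- tapply(disp, gear, length)   # number of cars per gear level

# Sum Sq for the factor: variation of the group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Sum Sq for Residuals: variation of each car around its own group mean
ss_within <- sum((disp - ave(disp, gear))^2)

# Df column
df_between <- nlevels(gear) - 1
df_within  <- length(disp) - nlevels(gear)

# Mean Sq, F value and Pr(>F)
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_value    <- ms_between / ms_within
p_value    <- pf(f_value, df_between, df_within, lower.tail = FALSE)

c(F = f_value, p = p_value)
```

The p-value is the upper tail of the F distribution with the two Df values, which is why a large F value gives a small Pr(>F).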

### Performing Two Way ANOVA test in R

A two-way ANOVA test is performed on the mtcars dataset between disp, a continuous attribute, and two categorical attributes, gear and am (transmission type: 0 = automatic, 1 = manual).

• Set up the null hypothesis and the alternate hypothesis
• H0: Average displacement is the same across all gear levels and across both transmission types (am)
• H1: Not all means are equal

```r
# histogram() comes from the lattice package, which ships with standard R installations
library(lattice)

# Distribution of disp within each gear level, separately for each transmission type
histogram(~ disp | factor(gear), data = mtcars, subset = (am == 0),
          xlab = "disp", main = "Automatic")
histogram(~ disp | factor(gear), data = mtcars, subset = (am == 1),
          xlab = "disp", main = "Manual")
```

Output: [Figures: histograms of disp for each gear level, one plot for automatic cars and one for manual cars]

The histograms show how displacement is distributed within each gear level for each transmission type. Here the categorical variables are gear and am, to which the factor function is applied, and the continuous variable is disp.

Calculate the test statistics using the aov function.

```r
mtcars_aov2 <- aov(mtcars$disp ~ factor(mtcars$gear) *
                     factor(mtcars$am))
summary(mtcars_aov2)
```

Output:

```
                    Df Sum Sq Mean Sq F value   Pr(>F)
factor(mtcars$gear)  2 280221  140110  20.695 3.03e-06 ***
factor(mtcars$am)    1   6399    6399   0.945    0.339
Residuals           28 189565    6770
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

The summary shows that gear is highly significant for displacement (three asterisks): its p-value is well below 0.05, so we reject the null hypothesis that mean displacement is the same for every gear level. The p-value of am is greater than 0.05, so once gear is accounted for there is no significant evidence that displacement differs between transmission types.
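The same conclusions can be read off programmatically. The sketch below refits the model with a data argument (so the row labels differ slightly from the output above) and pulls out the p-values; `fit2`, `anova_table`, `p_values`, `significant`, and `cell_counts` are illustrative names. The cell counts also appear to explain why the table has no gear:am interaction row even though the formula uses `*`: some gear/transmission combinations contain no cars, so the interaction cannot be estimated separately from the main effects.

```r
# Same model as above, written with a data argument
fit2 <- aov(disp ~ factor(gear) * factor(am), data = mtcars)
anova_table <- summary(fit2)[[1]]

# One p-value per row of the table; the Residuals row has none (NA)
p_values <- anova_table[["Pr(>F)"]]
p_values

# TRUE where an effect is significant at the 5% level
significant <- p_values < 0.05
significant

# Cars per gear/transmission combination (am: 0 = automatic, 1 = manual)
cell_counts <- table(gear = mtcars$gear, am = mtcars$am)
cell_counts
```

Comparing against a fixed threshold such as 0.05 is a convention; the p-values themselves carry more information than the asterisks.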

### Results

We see significant results from the plots and summaries.

• One-way ANOVA: displacement is strongly related to gear, i.e. mean displacement differs across gears with p < 0.05.
• Two-way ANOVA: displacement is strongly related to gear (p < 0.05) but shows no significant relationship with transmission mode (p > 0.05 for am).
