In statistics, Bartlett’s test is used to test whether k samples come from populations with equal variances. Equal variance across populations is called homoscedasticity or homogeneity of variances. Some statistical tests, for example ANOVA, assume that variances are equal across groups or samples, and Bartlett’s test can be used to check that assumption for two or more samples. The test is appropriate for normally distributed data; it is sensitive to departures from normality, so a significant result on non-normal data may reflect the shape of the distribution rather than unequal variances. There are several tests for the equality (homogeneity) of variance across groups, including:
- Bartlett’s test
- Levene’s test
- Fligner-Killeen test
These tests are easy to perform in R. This article shows how to perform Bartlett’s test in R.
Statistical Hypotheses for Bartlett’s test
A hypothesis is a statement about a population parameter. Hypothesis testing is a statistical method for making a decision from experimental data: it evaluates two mutually exclusive statements about a population and determines which one is better supported by the sample data. To know more about statistical hypotheses, please refer to Understanding Hypothesis Testing. For Bartlett’s test the statistical hypotheses are:
- Null Hypothesis: all population variances are equal
- Alternative Hypothesis: at least two of the population variances differ
Implementation in R
R provides the function bartlett.test(), available in the stats package, to compute Bartlett’s test. The syntax for this function is given below:
bartlett.test(formula, dataset)
Parameters:
formula: a formula of the form values ~ groups
dataset: a matrix or data frame (the corresponding argument of bartlett.test() is named data)
Return value: a list of class htest whose components include:
statistic: Bartlett’s K-squared test statistic
parameter: the degrees of freedom of the approximate chi-squared distribution of the test statistic
p.value: the p-value of the test
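The components listed above can be read directly from the returned object. The following is a minimal sketch on simulated data; the data frame dataset, its columns values and groups, and the name result are illustrative placeholders rather than anything prescribed by R.

```r
# Hypothetical stacked data: three groups of six simulated observations each.
set.seed(42)  # fixed seed so the sketch is reproducible
dataset <- data.frame(
  values = c(rnorm(6, sd = 1), rnorm(6, sd = 1.5), rnorm(6, sd = 2)),
  groups = factor(rep(c("a", "b", "c"), each = 6))
)

result <- bartlett.test(values ~ groups, data = dataset)

class(result)      # "htest", the class R uses for hypothesis test results
result$statistic   # Bartlett's K-squared
result$parameter   # degrees of freedom: number of groups minus 1
result$p.value     # p-value from the chi-squared approximation
```

Storing the result in a variable makes it easy to reuse the p-value in later code instead of reading it from the printed output.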
The data can come in two formats, and the call to bartlett.test() differs between them.
If data is in the stacked form: stacked form means that the values for all samples are stored in one variable, with a second variable identifying the sample each value belongs to. In this case, use the following command:
bartlett.test(values ~ groups, dataset)
values: the name of the variable containing the data values
groups: the name of the variable that specifies which sample each value belongs to
If data is in the unstacked form: unstacked form means that each sample is stored in a separate variable. In this case, nest the variable names inside the list() function as shown below:
bartlett.test(list(dataset$sample1, dataset$sample2, dataset$sample3))
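To illustrate that the two forms describe the same test, the sketch below runs both on one small made-up data set. The numbers and the column names sample1, sample2 and sample3 are hypothetical; stack() from base R is used here only as a convenient way to convert the unstacked columns into stacked form.

```r
# Hypothetical unstacked data: one column per sample.
dataset <- data.frame(
  sample1 = c(4.2, 5.1, 4.8, 5.5, 4.9),
  sample2 = c(6.3, 5.2, 7.1, 4.4, 6.0),
  sample3 = c(5.0, 5.3, 4.7, 5.1, 5.2)
)

# Unstacked form: pass the samples as a list.
res_unstacked <- bartlett.test(
  list(dataset$sample1, dataset$sample2, dataset$sample3)
)

# Stacked form of the same data: stack() returns the columns `values` and `ind`.
stacked <- stack(dataset)
res_stacked <- bartlett.test(values ~ ind, data = stacked)

# Both calls describe the same samples, so the statistics should agree.
all.equal(unname(res_unstacked$statistic), unname(res_stacked$statistic))
```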
Examples for Bartlett’s test
Bartlett’s test with one independent variable:
Consider R’s built-in PlantGrowth data set, which gives the dried weight of three groups of ten batches of plants, where each group of ten batches received a different treatment. The weight variable gives the weight of the batch, and the group variable gives the treatment received: ctrl, trt1 or trt2. To view the data set, use the command below:
print(PlantGrowth)
   weight group
1    4.17  ctrl
2    5.58  ctrl
3    5.18  ctrl
4    6.11  ctrl
5    4.50  ctrl
6    4.61  ctrl
7    5.17  ctrl
8    4.53  ctrl
9    5.33  ctrl
10   5.14  ctrl
11   4.81  trt1
12   4.17  trt1
13   4.41  trt1
14   3.59  trt1
15   5.87  trt1
16   3.83  trt1
17   6.03  trt1
18   4.89  trt1
19   4.32  trt1
20   4.69  trt1
21   6.31  trt2
22   5.12  trt2
23   5.54  trt2
24   5.50  trt2
25   5.37  trt2
26   5.29  trt2
27   4.92  trt2
28   6.15  trt2
29   5.80  trt2
30   5.26  trt2
Suppose one wants to use Bartlett’s test to determine whether the variance in weight is the same for all treatment groups at a significance level of 0.05. There is only one independent variable here, group. To perform the test, use the command below:
bartlett.test(weight ~ group, data = PlantGrowth)
Bartlett test of homogeneity of variances
data:  weight by group
Bartlett's K-squared = 2.8786, df = 2, p-value = 0.2371
From the output, the p-value of 0.2371 is not less than the significance level of 0.05, so the null hypothesis that the variance is the same for all treatment groups cannot be rejected. There is therefore no evidence to suggest that the variance in plant growth differs among the three treatment groups.
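As a rough check on where these numbers come from, the sketch below recomputes the statistic by hand using the textbook form of Bartlett’s statistic (the pooled variance compared with the per-group variances on a log scale, divided by a correction factor) and then applies the 0.05 decision rule. All variable names are illustrative, and this is a sketch of the calculation, not necessarily the exact code that bartlett.test() runs internally.

```r
# Per-group sample sizes and sample variances of weight.
n_i <- tapply(PlantGrowth$weight, PlantGrowth$group, length)
v_i <- tapply(PlantGrowth$weight, PlantGrowth$group, var)

k <- length(n_i)   # number of groups
N <- sum(n_i)      # total number of observations

# Pooled variance: per-group variances weighted by their degrees of freedom.
v_pooled <- sum((n_i - 1) * v_i) / (N - k)

# Bartlett's K-squared: log-variance discrepancy divided by a correction factor.
numerator  <- (N - k) * log(v_pooled) - sum((n_i - 1) * log(v_i))
correction <- 1 + (sum(1 / (n_i - 1)) - 1 / (N - k)) / (3 * (k - 1))
k_squared  <- numerator / correction

# Upper-tail chi-squared probability with k - 1 degrees of freedom.
p_value <- pchisq(k_squared, df = k - 1, lower.tail = FALSE)

alpha <- 0.05
round(k_squared, 4)   # compare with the K-squared in the output above
round(p_value, 4)     # compare with the p-value in the output above
p_value < alpha       # TRUE would mean rejecting the null hypothesis
```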
Bartlett’s test with multiple independent variables:
To run the test with multiple independent variables, use the interaction() function to collapse the factors into a single variable containing all combinations of their levels. This example uses R’s built-in ToothGrowth data set, whose first ten rows are shown here.
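    len supp dose
1   4.2   VC  0.5
2  11.5   VC  0.5
3   7.3   VC  0.5
4   5.8   VC  0.5
5   6.4   VC  0.5
6  10.0   VC  0.5
7  11.2   VC  0.5
8  11.2   VC  0.5
9   5.2   VC  0.5
10  7.0   VC  0.5
To test whether the variance of len is the same across all combinations of supp and dose, use the command below:
bartlett.test(len ~ interaction(supp, dose), data = ToothGrowth)
Bartlett test of homogeneity of variances
data:  len by interaction(supp, dose)
Bartlett's K-squared = 6.9273, df = 5, p-value = 0.2261
The p-value of 0.2261 is not less than 0.05, so the null hypothesis of equal variances across the combinations of supp and dose cannot be rejected.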
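To see what interaction() builds, the sketch below inspects the combined grouping factor for ToothGrowth and relates the number of groups to the degrees of freedom reported by the test. The names combined and result are illustrative.

```r
# Combine the two variables into a single grouping factor.
combined <- interaction(ToothGrowth$supp, ToothGrowth$dose)

levels(combined)    # one level per supp/dose combination
nlevels(combined)   # number of groups compared by the test
table(combined)     # number of observations in each combination

# The formula interface accepts the same expression directly.
result <- bartlett.test(len ~ interaction(supp, dose), data = ToothGrowth)
unname(result$parameter)   # degrees of freedom = number of groups - 1
```

Because the test compares every combination as its own group, the degrees of freedom grow with the number of factor combinations, not with the number of factors.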