How to Perform Dunn’s Test in Python

Dunn's test is a non-parametric post hoc procedure used for pairwise multiple comparisons following a significant Kruskal-Wallis test. This article explains what it does, when to use it, and how to run it in Python.

Table of Contents

Dunn’s Test
What is the Kruskal-Wallis test?
Key points about Dunn's test
How to Perform Dunn’s Test with Python
Step-by-Step Guide to Perform Dunn’s Test in Python
Frequently Asked Questions on Dunn’s Test

Dunn’s Test

Dunn’s Test is used after the Kruskal-Wallis one-way analysis of variance by ranks to identify which groups differ from each other. It determines whether the difference between the medians of various groups is statistically significant. Dunn’s Test adjusts for multiple comparisons, making it suitable for analyzing data with several groups.

Dunn’s Test is a non-parametric statistical test used for comparing multiple groups to each other. It's particularly useful when analyzing data with unequal sample sizes or when the assumption of normality is violated.

What is the Kruskal-Wallis test?

The Kruskal-Wallis test is a non-parametric statistical test used to determine whether there are statistically significant differences between three or more independent groups. If the Kruskal-Wallis test indicates significant differences, Dunn's test can be applied post-hoc to identify which specific pairs of groups differ significantly from each other. Dunn's test is tailored for pairwise comparisons following a significant result in the Kruskal-Wallis test, providing insights into specific group differences.
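To make the omnibus step concrete, here is a minimal pure-Python sketch of the Kruskal-Wallis H statistic with the usual tie correction. The function names average_ranks and kruskal_wallis_h and the toy data are illustrative assumptions, not part of any library; in practice a statistics library would normally be used for this test.

```python
import math
from collections import Counter

def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    return h / correction

# Toy data (hypothetical): three clearly separated groups
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(groups)

# With exactly three groups (2 degrees of freedom) the chi-square
# survival function has the closed form exp(-H / 2).
p_value = math.exp(-h_stat / 2)
print(h_stat, p_value)
```

The closed-form p-value only holds for three groups; for any other number of groups a chi-square distribution with k - 1 degrees of freedom is needed.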

Key points about Dunn's test

Purpose: Dunn's test is used to identify which specific groups differ from each other when there are statistically significant differences detected between groups in the omnibus test.
Non-parametric: Like the Kruskal-Wallis and Friedman tests, Dunn's test is non-parametric, meaning it works on ranks and does not assume that the data follow a normal distribution.
Procedure: Dunn's test performs pairwise comparisons between all groups using a rank-based approach. It ranks all observations together, computes the difference in mean ranks for each pair of groups, and adjusts the resulting p-values for multiple comparisons using methods such as the Bonferroni correction.
Interpretation: If the adjusted p-value for a pairwise comparison is below a predetermined significance level (e.g., 0.05), it indicates that the difference between those two groups is statistically significant.

Overall, Dunn's test provides a valuable tool for identifying specific group differences in situations where traditional parametric tests are not appropriate or when dealing with ranked data. It helps researchers gain deeper insights into the relationships between multiple groups in their data.
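The rank-based procedure described in the key points can be sketched in pure Python as follows. This is a minimal illustration of the commonly stated form of Dunn's z statistic (difference in mean ranks divided by a tie-corrected standard error), not the scikit-posthocs source code; the function name dunn_unadjusted and the toy data are assumptions made for the example, and the p-values it returns are not yet adjusted for multiple comparisons.

```python
from collections import Counter
from itertools import combinations
from statistics import NormalDist

def dunn_unadjusted(groups):
    """Return {(i, j): (z, p)} for every pair of groups, labeled from 1."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)

    # Rank all observations together; tied values share their mean rank
    positions = {}
    for pos, value in enumerate(sorted(pooled), start=1):
        positions.setdefault(value, []).append(pos)
    rank_of = {v: sum(p) / len(p) for v, p in positions.items()}

    mean_ranks = [sum(rank_of[x] for x in g) / len(g) for g in groups]

    # Variance term with the correction for ties
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n_total * (n_total + 1) / 12 - ties / (12 * (n_total - 1))

    results = {}
    for i, j in combinations(range(len(groups)), 2):
        se = (variance * (1 / len(groups[i]) + 1 / len(groups[j]))) ** 0.5
        z = (mean_ranks[i] - mean_ranks[j]) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
        results[(i + 1, j + 1)] = (z, p)
    return results

# Toy data (hypothetical)
results = dunn_unadjusted([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
for pair, (z, p) in results.items():
    print(pair, round(z, 4), round(p, 4))
```

The groups that are furthest apart in mean rank get the largest absolute z value and therefore the smallest unadjusted p-value.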

How to Perform Dunn’s Test with Python

In Python, the scikit-posthocs library provides an efficient way to conduct Dunn’s Test. This article will guide you through the process of performing Dunn’s Test in Python, step by step.

Command to install the scikit-posthocs library (the leading ! is for notebook cells; omit it when running in a terminal):

! pip install scikit-posthocs

posthoc_dunn() Function:

Syntax:

scikit_posthocs.posthoc_dunn(a, val_col: str = None, group_col: str = None, p_adjust: str = None, sort: bool = True)

Parameters:
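a : an array-like object, a pandas DataFrame, or a Series containing the data.
val_col : name of the DataFrame column that holds the measured values (the dependent variable). Required only when a is a DataFrame.
group_col : name of the DataFrame column that holds the group labels (the predictor, or independent variable). Required only when a is a DataFrame.
p_adjust : method used to adjust the p-values for multiple comparisons. It is a string; possible values include:
'bonferroni'
'hommel'
'holm-sidak'
'holm'
'simes-hochberg' and more...
sort : whether to sort the DataFrame by group_col; defaults to True.
Returns: a DataFrame of pairwise p-values.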
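To show what a p_adjust method does to a set of raw p-values, here is a minimal pure-Python sketch of two of the adjustments named above, Bonferroni and Holm. The helper names bonferroni and holm and the raw p-values are illustrative assumptions; this is not the library's own code, only a sketch of the standard definitions.

```python
def bonferroni(p_values):
    """Multiply every p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def holm(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below
        # the adjusted value of a smaller raw p-value
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values from three pairwise comparisons
raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))
print(holm(raw))
```

Holm never produces larger adjusted p-values than Bonferroni, which is why it is often preferred when both are applicable.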

Hypotheses:

Each pairwise comparison is a hypothesis test with the following two hypotheses:

Null hypothesis: The two groups being compared have the same median.
Alternative hypothesis: The two groups being compared have different medians.

Step-by-Step Guide to Perform Dunn’s Test in Python

1. Import Necessary Libraries

Import the required libraries for data manipulation and Dunn’s Test:

Python3

# Importing necessary packages and modules
import pandas as pd
import scikit_posthocs as sp
from sklearn.datasets import load_iris

2. Load Your Dataset

Load your dataset into a pandas DataFrame. Ensure your data is structured appropriately for comparison:

Python3

# Load the dataset
iris_dataset = load_iris(as_frame=True)
dataset = iris_dataset.frame
print(dataset.head())

Output:

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   target  
0       0  
1       0  
2       0  
3       0  
4       0

3. Prepare Your Data

Extract the data you want to compare. In this example, let's compare sepal widths among different species:

Python3

# Data containing sepal width of the three species
data = [dataset[dataset['target'] == 0]['sepal width (cm)'],
        dataset[dataset['target'] == 1]['sepal width (cm)'],
        dataset[dataset['target'] == 2]['sepal width (cm)']]

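4. Perform Dunn’s Test

Pass the list of groups to posthoc_dunn() and choose a p-value adjustment method (Holm in this example):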

Python3

# Using the posthoc_dunn() function
p_values = sp.posthoc_dunn(data, p_adjust='holm')

print(p_values)

Output:

              1             2             3
1  1.000000e+00  2.047087e-14  1.536598e-07
2  2.047087e-14  1.000000e+00  1.580934e-02
3  1.536598e-07  1.580934e-02  1.000000e+00

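Because the data was passed as a list, the groups are labeled 1, 2 and 3 in list order, which corresponds to target values 0, 1 and 2.

For the difference between groups 1 and 2, the adjusted p-value is 2.047087e-14.
For the difference between groups 1 and 3, the adjusted p-value is 1.536598e-07.
For the difference between groups 2 and 3, the adjusted p-value is 1.580934e-02.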

5. Compare with significance level

Assume a significance level of 0.05. Any adjusted p-value below this threshold indicates a statistically significant difference between the corresponding pair of groups.

Python3

# True marks pairs whose adjusted p-value is below 0.05
print(p_values < 0.05)

Output:

       1      2      3
1  False   True   True
2   True  False   True
3   True   True  False

This indicates that:

Group 1 is significantly different from Group 2 and Group 3.
Group 2 is significantly different from Group 1 and Group 3.
Group 3 is significantly different from Group 1 and Group 2.
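Because the p-value matrix is symmetric, each comparison appears twice. As a small sketch, the snippet below lists every pair only once. It uses a plain nested dictionary holding the adjusted p-values copied from the output above, so it runs without any third-party library; the name significant_pairs is a hypothetical helper, not part of scikit-posthocs.

```python
from itertools import combinations

# Adjusted p-values copied from the Holm-adjusted output shown earlier
p_matrix = {
    1: {1: 1.0, 2: 2.047087e-14, 3: 1.536598e-07},
    2: {1: 2.047087e-14, 2: 1.0, 3: 1.580934e-02},
    3: {1: 1.536598e-07, 2: 1.580934e-02, 3: 1.0},
}

def significant_pairs(matrix, alpha=0.05):
    """Return (group_a, group_b, p) for each unique pair with p < alpha."""
    labels = sorted(matrix)
    return [
        (a, b, matrix[a][b])
        for a, b in combinations(labels, 2)
        if matrix[a][b] < alpha
    ]

for a, b, p in significant_pairs(p_matrix):
    print(f"Group {a} vs Group {b}: adjusted p = {p:.3g}")
```

Tightening the threshold, for example alpha=0.01, would drop the comparison between groups 2 and 3, whose adjusted p-value is about 0.0158.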

Conclusion

Performing Dunn’s Test in Python using the scikit-posthocs library is straightforward and efficient. By following the steps outlined in this article, you can assess the differences between multiple groups in your dataset. Dunn’s Test is a valuable tool for post hoc analysis, showing which specific pairs of groups differ after an omnibus test such as Kruskal-Wallis has detected a difference.

Frequently Asked Questions on Dunn’s Test

Q. When should we perform Dunn’s Test?

We should perform Dunn's Test when we have conducted a Kruskal-Wallis test and found a significant difference among group medians. Dunn's Test helps identify which specific pairs of groups have significantly different medians through pairwise comparisons.

Q. Difference between Dunn's test and Dunnett's test?

Dunn's test compares every pair of groups with each other, while Dunnett's test compares each group only with a single control group. Dunn's test is rank-based and used for non-parametric analysis, typically after a Kruskal-Wallis test, whereas Dunnett's test is parametric and typically follows an ANOVA.
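To make the contrast concrete: Dunn's test is typically run over every pair of groups, whereas Dunnett's test only compares each treatment with a designated control. The short sketch below simply enumerates the comparisons each design implies; the group names are hypothetical.

```python
from itertools import combinations

# Hypothetical group names: one control and three treatments
groups = ["control", "A", "B", "C"]

# Dunn-style design: every unordered pair of groups
all_pairs = list(combinations(groups, 2))

# Dunnett-style design: each treatment against the control only
vs_control = [("control", g) for g in groups[1:]]

print(len(all_pairs), all_pairs)
print(len(vs_control), vs_control)
```

With k groups, the all-pairs design has k(k - 1)/2 comparisons and the control-only design has k - 1, so the multiple-comparison adjustment is applied over a different number of tests in each case.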

Q. What is Dunn's multiple comparisons post hoc test?

Dunn's test is a non-parametric post-hoc test used after obtaining a significant result in the Kruskal-Wallis test. It performs pairwise comparisons between groups to identify which pairs have significantly different medians.

Q. What is the null hypothesis of the Dunn's test?

The null hypothesis of Dunn's test states that there are no significant differences between the medians of the compared groups. It assumes that the observed differences are due to random variation rather than true differences in group medians.
