
How to Calculate Cohen’s Kappa in R

Last Updated : 19 Mar, 2024

In this article, we will discuss what Cohen’s Kappa is and how to calculate it in the R Programming Language.

What is Cohen’s Kappa?

Cohen’s Kappa is a statistical measure used to assess inter-rater reliability or agreement between two raters when dealing with categorical data. It quantifies the level of agreement between the raters by taking into account the agreement that could be expected by chance alone. It’s particularly useful when assessing agreement on subjective judgments or classifications.

Why is Cohen’s Kappa Important?

Cohen’s Kappa is important because it helps ensure that different people making subjective judgments agree consistently. This is crucial in fields where opinions or classifications vary. By using Cohen’s Kappa, researchers or professionals can check if the agreement between raters is real or just by chance. It helps ensure that the data collected is reliable and trustworthy.

The Role of Categorical Agreement

Categorical agreement refers to the degree to which two raters assign the same category or label to a given set of data. In Cohen’s Kappa calculation, categorical agreement serves as the foundation for evaluating the level of agreement between the raters. The Kappa statistic compares the observed level of agreement with the level of agreement expected by chance alone.

The formula for Cohen’s Kappa is

k = (Po - Pe) / (1 - Pe)

Where:

Po is the observed proportion of agreement between the raters.

Pe is the expected proportion of agreement by chance.

Observed Agreement (Po)

Observed agreement refers to the proportion of cases in which two raters or methods agree on the categorization or classification of items. It represents the actual, observed instances where both raters provide the same classification.

Example: Suppose two medical professionals independently examine a set of X-ray images and categorize each image as either showing signs of a specific condition or not. If they both agree on the classification for 80 out of 100 X-ray images, then the observed agreement is 80%.

Expected Agreement (Pe)

Expected agreement is the level of agreement we would expect to occur by chance alone. It reflects how often the two raters would agree just by guessing, based on the overall probability of each category.

Example: In the X-ray example, if the prevalence of the condition in the dataset is 30%, and both raters are assigning categories randomly based on this prevalence, the expected agreement can be calculated. If 30 out of 100 X-ray images are expected to show signs of the condition, and both raters are randomly classifying them, the expected agreement for this category can be determined. This process is repeated for each category.
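
To make this concrete, here is a small sketch of that chance calculation in R, assuming for simplicity that both raters label 30% of the images as showing the condition (the figures are purely illustrative):

R
# Assumed marginal proportions: both raters call 30% of images "condition"
p_condition    <- 0.30
p_no_condition <- 0.70

# Chance agreement sums, over the categories, the product of the raters' marginals
pe <- p_condition^2 + p_no_condition^2
pe   # 0.09 + 0.49 = 0.58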

Cohen’s Kappa ranges from -1 to 1:

  • k=1: Perfect agreement beyond chance.
  • k=0: Agreement equal to that expected by chance alone.
  • k=−1: Perfect disagreement beyond chance.

By comparing observed and expected agreement, Cohen’s Kappa provides a normalized measure of agreement that accounts for the possibility of chance agreement. Categorical agreement is crucial in this context because it forms the basis for understanding the level of agreement between raters, which is then used to calculate Kappa. The Kappa coefficient helps researchers assess the reliability and validity of categorical assignments, taking into account what could be expected due to random chance.
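
This comparison maps directly to a few lines of code. The helper below is a minimal base-R sketch of the formula (a hypothetical function written for illustration, not taken from any package):

R
# Hypothetical helper: Cohen's Kappa from a square confusion matrix
# (rows = rater 1's categories, columns = rater 2's categories)
cohens_kappa <- function(tab) {
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                      # observed agreement
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

# A table with all counts on the diagonal gives perfect agreement (k = 1)
cohens_kappa(matrix(c(50, 0, 0, 50), nrow = 2))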

Interpretation of Cohen’s Kappa values

  • Almost Perfect Agreement (0.81 – 1.00):
    • Indicates very high agreement between the raters.
    • Almost all observed agreement is due to actual agreement, with minimal disagreement.
  • Substantial Agreement (0.61 – 0.80):
    • Represents a strong level of agreement between raters.
    • A significant portion of the observed agreement is beyond what would be expected by chance.
  • Moderate Agreement (0.41 – 0.60):
    • Suggests a moderate level of agreement.
    • There is agreement, but there is still a notable amount of variability that cannot be attributed to agreement alone.
  • Fair Agreement (0.21 – 0.40):
    • Indicates a fair level of agreement.
    • Some agreement is present, but it may not be strong, and a substantial amount of variability exists.
  • Slight Agreement (0.00 – 0.20):
    • Represents a slight level of agreement.
    • The observed agreement is minimal, and most of it could be due to chance.
  • Poor Agreement (< 0.00):
    • Signifies poor agreement, meaning the observed agreement is less than what would be expected by chance alone.

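Under these conventional bands, a computed kappa can be mapped to a verbal label. The snippet below is a hypothetical helper whose band boundaries follow the list above; it is written only for illustration and is not part of any package:

R
# Hypothetical helper: map a kappa value to the verbal bands listed above
interpret_kappa <- function(k) {
  cut(k,
      breaks = c(-Inf, 0, 0.20, 0.40, 0.60, 0.80, 1),
      labels = c("Poor", "Slight", "Fair", "Moderate",
                 "Substantial", "Almost perfect"))
}

interpret_kappa(0.375)   # "Fair"
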
Let’s consider a scenario where two doctors are assessing the presence or absence of a specific medical condition (Condition X) in a set of patients. Each patient is either diagnosed as having the condition (Positive) or not having the condition (Negative). The two doctors independently review a sample of 100 patients, and we want to assess the agreement between their diagnoses using Cohen’s Kappa.

The diagnoses can be summarized in a 2 x 2 contingency table:

                        Doctor 2: Positive    Doctor 2: Negative
Doctor 1: Positive            a = 60                b = 10
Doctor 1: Negative            c = 15                d = 15

Here,

  • a (60) represents the number of patients where both doctors agree on a positive diagnosis.
  • d (15) represents the number of patients where both doctors agree on a negative diagnosis.
  • b (10) represents the number of patients where Doctor 1 diagnoses as positive, but Doctor 2 diagnoses as negative.
  • c (15) represents the number of patients where Doctor 1 diagnoses as negative, but Doctor 2 diagnoses as positive.

Calculation of Cohen’s Kappa

1. Calculate Observed Agreement (Po):

Po = (a + d) / (a + b + c + d)

Po = (60 + 15) / (60 + 10 + 15 + 15)

Po = 75 / 100

Po = 0.75

2. Calculate Agreement Expected by Chance (Pe):

Pe = [(a+b)*(a+c) + (c+d)*(b+d)] / (a+b+c+d)^2

Pe = [(60+10)*(60+15) + (15+15)*(10+15)] / (60+10+15+15)^2

Pe = (70*75 + 30*25) / 100^2

Pe = (5250 + 750) / 10000

Pe = 6000 / 10000

Pe = 0.6

3. Calculate Cohen’s Kappa:

k = (Po - Pe) / (1 - Pe)

k = (0.75 - 0.6) / (1 - 0.6)

k = 0.15 / 0.4

k = 0.375

Therefore, Cohen’s Kappa for the two doctors’ diagnoses of Condition X is 0.375. The interpretation depends on the context, but on the conventional scale above this value falls in the fair agreement band (0.21 – 0.40), approaching moderate agreement: the doctors agree more often than chance alone would predict, but far from perfectly.
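
The same arithmetic can be checked quickly in base R, plugging in the cell counts a, b, c and d from the table above:

R
# Cell counts from the two doctors' 2 x 2 table
a <- 60; b <- 10; c <- 15; d <- 15
n <- a + b + c + d

po <- (a + d) / n                                     # observed agreement
pe <- ((a + b) * (a + c) + (c + d) * (b + d)) / n^2   # agreement expected by chance
(po - pe) / (1 - pe)                                  # 0.375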

Calculation of Cohen’s Kappa in R

We can calculate Cohen’s Kappa in R using functions from packages such as irr (inter-rater reliability), psych (psychological statistics), or vcd (Visualizing Categorical Data).

Calculate Cohen’s Kappa in R using ‘irr’ package

1. Install and Load the ‘irr’ package

R
install.packages("irr")
library(irr)

2. Create a matrix or data frame containing the ratings from each rater

R
ratings <- data.frame(
  Rater1 = c(1, 2, 3, 2, 1),
  Rater2 = c(1, 2, 3, 1, 1)
)

3. Calculate Cohen’s Kappa and Print the result

R
kappa_result <- kappa2(ratings, weight = "unweighted")
print(kappa_result)

Output:

Cohen's Kappa for 2 Raters (Weights: unweighted)

Subjects = 5
Raters = 2
Kappa = 0.688

z = 2.28
p-value = 0.0224

The code first installs and loads the irr package.

  • It then creates a data frame named ratings containing the ratings from two raters (replace this with your own data).
  • kappa2() from the irr package calculates Cohen’s Kappa; weight = “unweighted” treats every disagreement equally.
  • print() displays the result, which includes the Kappa value, z-value, and p-value.
  • Kappa = 0.688 is the Cohen’s Kappa value, which suggests substantial agreement.
  • z = 2.28 is the test statistic for the null hypothesis that the agreement is no better than chance.
  • p-value = 0.0224 is less than 0.05, so the observed agreement is statistically significant.
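
For ordinal ratings, kappa2() can also compute a weighted kappa, which penalizes large disagreements more heavily than small ones. A small variation on the call above (shown without output):

R
# Weighted Cohen's Kappa: "equal" applies linear weights, "squared" quadratic weights
kappa_weighted <- kappa2(ratings, weight = "squared")
print(kappa_weighted)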

Calculate Cohen’s Kappa in R using vcd package

1. Install and load the ‘vcd’ package, create the two doctors’ ratings, and build a contingency table:

R
install.packages("vcd")
library(vcd)

data <- data.frame(
  Doctor1 = c(1, 1, 0, 1, 0, 1, 0, 0, 1, 1),
  Doctor2 = c(1, 1, 1, 1, 0, 1, 0, 0, 1, 0)
)

table_data <- table(data$Doctor1, data$Doctor2)
print(table_data)

Output:

    0 1
  0 3 1
  1 1 5

2. Use the Kappa() function from the vcd package to calculate Cohen’s Kappa from the contingency table

R
# Calculate Cohen's Kappa from the contingency table using vcd's Kappa()
kappa_result <- Kappa(table_data)
print(kappa_result)

Output:

            value    ASE     z Pr(>|z|)
Unweighted 0.5833 0.2624 2.223   0.0262
Weighted   0.5833 0.2624 2.223   0.0262

The contingency table displays counts of observations for each combination of ratings.

  • Example: 3 cases rated 0 by both doctors, 5 cases rated 1 by both doctors, and 1 case in each of the two disagreement cells.
  • Kappa value: about 0.58, indicating moderate agreement between the two doctors.
  • ASE is the asymptotic standard error of the estimate, and z is the estimate divided by that standard error.
  • p-value: about 0.026, below 0.05, so the observed agreement is unlikely to be due to chance alone.
  • For a 2 x 2 table the weighted and unweighted estimates coincide, because there is only one kind of disagreement.
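
Because Kappa() returns a “Kappa” object, recent versions of vcd should also let you extract asymptotic confidence intervals for the unweighted and weighted estimates via the confint() method (a quick follow-up, assuming that method is available in your installed version):

R
# Asymptotic confidence intervals for the unweighted and weighted estimates
confint(kappa_result)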

Calculate Cohen’s Kappa in R using psych package

R
# Install and load the 'psych' package
install.packages("psych")
library(psych)

# Create a matrix or data frame containing the ratings from each rater
ratings <- matrix(c(1, 2, 3, 2, 1, 
                    1, 1, 2, 3, 3), 
                  ncol = 2, byrow = TRUE)

# Calculate Cohen's Kappa
kappa_result <- cohen.kappa(ratings)

# Print the result
print(kappa_result)

Output:

Call: cohen.kappa1(x = x, w = w, n.obs = n.obs, alpha = alpha, levels = levels)

Cohen Kappa and Weighted Kappa correlation coefficients and confidence boundaries
                 lower estimate upper
unweighted kappa -0.089     0.25  0.59
weighted kappa    0.128     0.57  1.00

Number of subjects = 5

Install and load the “psych” package in R.

  • Generate or input data representing ratings from two raters into a matrix or data frame.
  • Use the cohen.kappa() function to calculate Cohen’s Kappa.
  • Interpret the output, which includes estimates and confidence intervals for unweighted and weighted kappa.
  • Assess the level of agreement based on the estimated kappa values.

For unweighted kappa:

  • The lower boundary is -0.089.
  • The estimate of kappa is 0.25.
  • The upper boundary is 0.59.

For weighted kappa:

  • The lower boundary is 0.128.
  • The estimate of kappa is 0.57.
  • The upper boundary is 1.00.

For unweighted kappa, the estimate is 0.25, which falls in the fair agreement band; however, the wide confidence interval (-0.089 to 0.59) shows that with only 5 subjects anything from poor to moderate agreement is plausible.

For weighted kappa, the estimate is 0.57, indicating moderate agreement, again with a wide confidence interval.
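
To see where the unweighted estimate of 0.25 comes from, the observed and chance agreement can be reproduced from the same ratings matrix with base R; this is simply the kappa formula written out, not part of the psych package:

R
# Cross-tabulate the two columns (raters) over the full set of categories
tab <- table(factor(ratings[, 1], levels = 1:3),
             factor(ratings[, 2], levels = 1:3))

n  <- sum(tab)
po <- sum(diag(tab)) / n                      # observed agreement
pe <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
(po - pe) / (1 - pe)                          # 0.25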

Challenges in interpreting Cohen’s Kappa

  1. Subjectivity: Categorical judgments can vary among raters due to subjectivity.
  2. Small Sample Size: Cohen’s Kappa may produce unreliable estimates with small sample sizes, as it relies on observed and expected agreement frequencies.
  3. Unequal Marginal Distributions: Disproportionate category distributions can skew Kappa estimates.
  4. Ordinal Categories: Weighted kappa assumes a particular spacing (such as equal intervals) between categories, which may not always hold true.
  5. Interpretation: Interpreting Kappa values can be tricky, and what’s considered acceptable agreement might change based on the situation.
  6. Sensitivity to Category Definitions: Small changes in category definitions or thresholds can significantly impact Kappa values.
  7. Dependence on Rater Expertise: Kappa values may vary based on the expertise or training of the raters involved in the assessment.

Conclusion

Cohen’s Kappa is an important tool for assessing inter-rater agreement in various fields. By accounting for chance agreement, it provides a more accurate measure of reliability than simple agreement percentages. Applying Cohen’s Kappa can enhance the quality and validity of research findings by ensuring consistency in categorical judgments.


