
How to Calculate Matthews Correlation Coefficient in R

The correlation coefficient is a statistical measure used to quantify the relationship between two variables. It indicates the strength and direction of the linear association between them. The range of coefficient values is from -1 to 1.

It is denoted by 𝑟.

The most commonly used correlation coefficient is the Pearson correlation coefficient, which is calculated using the following formula:

[Tex]r = \frac{\sum (x_i - \bar{x}) (y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} [/Tex]

Where:

  1. x_i and y_i are the individual values of the two variables.
  2. x̄ and ȳ are the means of the x values and the y values.

This formula computes the correlation coefficient 𝑟 between two variables 𝓍 and 𝓎. A correlation of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship.

Student    Height (X)    Weight (Y)
1          63            127
2          65            140
3          67            155
4          69            160
5          71            170

Now calculate the correlation coefficient between height and weight for these students.

Step 1: First, calculate the means (x̄ and ȳ):

x̄ = (63+65+67+69+71)/5 = 67 inches

ȳ = (127+140+155+160+170)/5 = 150.4 pounds

Step 2: Next, substitute the values into the formula:

[Tex]r = \frac{(63-67)(127-150.4)+(65-67)(140-150.4)+(67-67)(155-150.4)+(69-67)(160-150.4)+(71-67)(170-150.4)}{\sqrt{(63-67)^2+(65-67)^2+(67-67)^2+(69-67)^2+(71-67)^2} \, \sqrt{(127-150.4)^2+(140-150.4)^2+(155-150.4)^2+(160-150.4)^2+(170-150.4)^2}} [/Tex]

𝑟 = (93.6+20.8+0+19.2+78.4)/(√40 × √1153.2)

𝑟 ≈ 212/214.774

𝑟 ≈ 0.987

So, the correlation coefficient is approximately 0.987, which indicates a strong positive linear relationship between height and weight.

In R, you can calculate the Pearson correlation coefficient using the cor() function:

# Sample data for heights and weights of students
height <- c(63, 65, 67, 69, 71)
weight <- c(127, 140, 155, 160, 170)

# Calculate correlation coefficient using cor() function
correlation_coefficient <- cor(height, weight)

# Print the correlation coefficient
print(correlation_coefficient)

Output:

[1] 0.9870827

Here, height and weight are two vectors holding the heights and weights of the five students, and cor() returns their Pearson correlation coefficient (its default method).
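To connect the hand calculation with cor(), the formula can also be evaluated term by term in base R. The following is a minimal sketch; the variable names (dx, dy, numerator, denominator, r_manual) are illustrative and not part of the original example.

```r
# Heights (inches) and weights (pounds) of the five students
height <- c(63, 65, 67, 69, 71)
weight <- c(127, 140, 155, 160, 170)

# Deviations of each value from its mean
dx <- height - mean(height)
dy <- weight - mean(weight)

# Numerator: sum of the products of the deviations
numerator <- sum(dx * dy)

# Denominator: square root of the product of the sums of squared deviations
denominator <- sqrt(sum(dx^2) * sum(dy^2))

r_manual <- numerator / denominator
print(r_manual)

# Check agreement with the built-in function (up to floating-point error)
print(all.equal(r_manual, cor(height, weight)))
```

Writing the formula out this way makes each intermediate quantity (the sums in the numerator and denominator) available for inspection, which is useful when checking a hand calculation.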

What is Matthews Correlation Coefficient (MCC)?

The Matthews correlation coefficient (MCC) is a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives, which makes it a balanced measure that remains informative even when the classes are of very different sizes (imbalanced data).

MCC values range from -1 to 1, where:

  1. 1 indicates a perfect prediction.
  2. 0 indicates a prediction no better than random guessing.
  3. -1 indicates total disagreement between the predictions and the actual values.

The MCC is calculated using the following formula:

[Tex]MCC = \frac{(TP \times TN - FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} [/Tex]

Where:

  1. TP (True Positives) is the number of correctly predicted positive examples.
  2. TN (True Negatives) is the number of correctly predicted negative examples.
  3. FP (False Positives) is the number of incorrectly predicted positive examples.
  4. FN (False Negatives) is the number of incorrectly predicted negative examples.
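The four counts defined above can be obtained directly from vectors of labels. The following is a minimal base R sketch; the example vectors are hypothetical, and it assumes that 1 marks the positive class and 0 the negative class.

```r
# Hypothetical example labels: 1 = positive class, 0 = negative class
actual    <- c(1, 0, 1, 0, 1, 1, 0, 0)
predicted <- c(1, 0, 0, 1, 1, 1, 0, 1)

# Count each outcome by combining logical conditions
TP <- sum(predicted == 1 & actual == 1)  # Correctly predicted positives
TN <- sum(predicted == 0 & actual == 0)  # Correctly predicted negatives
FP <- sum(predicted == 1 & actual == 0)  # Negatives predicted as positive
FN <- sum(predicted == 0 & actual == 1)  # Positives predicted as negative

print(c(TP = TP, TN = TN, FP = FP, FN = FN))
```

The four counts always add up to the number of observations, which is a quick sanity check before applying the MCC formula.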


                   Predicted Negative    Predicted Positive
Actual Negative    50                    10
Actual Positive    5                     135

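In this confusion matrix:

  1. TP = 135 (actual positive, predicted positive)
  2. TN = 50 (actual negative, predicted negative)
  3. FP = 10 (actual negative, predicted positive)
  4. FN = 5 (actual positive, predicted negative)

To calculate the MCC, we can use the formula:

MCC = (TP×TN - FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))

MCC = (135×50 - 10×5) / √((135+10)(135+5)(50+10)(50+5))

MCC = (6750 - 50) / √(145 × 140 × 60 × 55)

MCC = 6700 / 8184.74

MCC ≈ 0.8186

So, the Matthews correlation coefficient (MCC) for this example is approximately 0.8186.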
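The same arithmetic can be wrapped in a small base R function. This is a minimal sketch rather than a library implementation: mcc_from_counts() is a hypothetical helper name, and returning 0 when the denominator is zero is an assumed convention, not something the formula itself defines.

```r
# Hypothetical helper: MCC from the four confusion-matrix counts
mcc_from_counts <- function(tp, tn, fp, fn) {
  # Convert to double so that large counts do not overflow integer arithmetic
  tp <- as.numeric(tp); tn <- as.numeric(tn)
  fp <- as.numeric(fp); fn <- as.numeric(fn)

  numerator   <- tp * tn - fp * fn
  denominator <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

  # Assumption: return 0 when the denominator is 0 (a common convention)
  if (denominator == 0) {
    return(0)
  }
  numerator / denominator
}

# Counts from the confusion matrix in the worked example
mcc_example <- mcc_from_counts(tp = 135, tn = 50, fp = 10, fn = 5)
print(round(mcc_example, 4))
```

The denominator is zero whenever an entire row or column of the confusion matrix is empty, for example when a model predicts only one class, so some explicit handling of that case is needed.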

Calculating the Matthews Correlation Coefficient in R

To calculate the Matthews correlation coefficient (MCC) in the R programming language, the sections below use the mcc() function with the 'mltools' package, the 'pracma' package, and a manual calculation based on a confusion matrix from the 'caret' package.

Calculate Matthews Correlation Coefficient in R Using 'mltools' package

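# Install (if needed) and load the mltools package
install.packages("mltools")
library(mltools)

# 400 actual labels: 20 positives followed by 380 negatives
actual <- rep(c(1, 0), times = c(20, 380))

# 400 predictions aligned with actual: 15 TP, 5 FN, 5 FP, 375 TN
preds <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 375))

# Calculate MCC
mcc(preds, actual)

Output:

[1] 0.7368421

install.packages("mltools"): This command installs the mltools package from CRAN. It only needs to be run once.

actual and preds: These vectors hold the actual and predicted labels. They must have the same length (here, 400) so that each prediction lines up with its actual value.

mcc(preds, actual): This calculates the Matthews correlation coefficient (MCC) between the preds and actual vectors using the mcc() function provided by the 'mltools' package. Here, (15×375 - 5×5) / √(20×20×380×380) = 5600/7600 ≈ 0.7368.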

Calculate Matthews Correlation Coefficient in R Using 'pracma' package

# Install and load required packages
install.packages("pracma")
library(pracma)

# Create some example data
actual <- c(1, 0, 1, 0, 1)
predicted <- c(1, 0, 0, 1, 1)

# Calculate MCC
mcc_value <- mcc(as.logical(actual), as.logical(predicted))

# Print MCC
print(mcc_value)

Output:

[1] 0.1666667

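install.packages("pracma"): This line installs the pracma package from CRAN. It only needs to be run once.

mcc_value <- mcc(as.logical(actual), as.logical(predicted)): This line calculates the Matthews correlation coefficient (MCC) between the actual and predicted vectors by calling mcc(). If your installed version of 'pracma' does not export an mcc() function, this call fails unless another package that provides mcc() (such as 'mltools') is loaded; in that case, call mltools::mcc() explicitly or use the manual formula shown in the next section.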

Calculate Matthews Correlation Coefficient in R Using 'caret' package

# Install (if needed) and load the required package
install.packages("caret")
library(caret)

# Example data
actual <- c(1, 0, 1, 0, 1)     # Actual labels
predicted <- c(1, 0, 0, 1, 1)  # Predicted labels

# Create confusion matrix (rows = predictions, columns = actual values)
conf_matrix <- confusionMatrix(as.factor(predicted), as.factor(actual))

# Extract counts from the confusion matrix, treating 1 as the positive class
TP <- conf_matrix$table[2, 2]  # True Positives: predicted 1, actual 1
TN <- conf_matrix$table[1, 1]  # True Negatives: predicted 0, actual 0
FP <- conf_matrix$table[2, 1]  # False Positives: predicted 1, actual 0
FN <- conf_matrix$table[1, 2]  # False Negatives: predicted 0, actual 1

# Calculate the MCC; stored as mcc_value to avoid clashing with mcc() functions
mcc_value <- (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

# Print MCC
print(mcc_value)

Output:

[1] 0.1666667

install.packages("caret"): This line installs the caret package from CRAN. It only needs to be run once.

confusionMatrix(): This function builds the confusion matrix, with predictions in the rows and actual values in the columns.

The final expression calculates the Matthews correlation coefficient (MCC) using the formula given earlier in this article. It uses the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) extracted from the confusion matrix.
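As a cross-check that needs no extra packages: for two binary 0/1 vectors, the MCC is numerically equal to the Pearson correlation coefficient between them (it is also known as the phi coefficient), so base R's cor() should reproduce the value above. A minimal sketch using the same example vectors, with mcc_via_cor as an illustrative variable name:

```r
# Same example labels as in the sections above
actual    <- c(1, 0, 1, 0, 1)
predicted <- c(1, 0, 0, 1, 1)

# Pearson correlation of two 0/1 vectors equals their MCC
mcc_via_cor <- cor(actual, predicted)
print(mcc_via_cor)
```

Note that cor() returns NA with a warning when either vector is constant, which corresponds to the case where the denominator of the MCC formula is zero.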

Conclusion

In conclusion, the Matthews correlation coefficient (MCC) is a robust metric for evaluating binary classification models. It accounts for all four cells of the confusion matrix (true and false positives and negatives), so it remains informative even with imbalanced data. It is a reliable tool for assessing model performance and comparing different algorithms or experiments.
