
How to Calculate Matthews Correlation Coefficient in R

Last Updated : 15 Mar, 2024

The correlation coefficient is a statistical measure used to quantify the relationship between two variables. It indicates the strength and direction of the linear association between them. The range of coefficient values is from -1 to 1.

It is denoted by 𝑟:

  • 𝑟 = 1 indicates a perfect positive linear relationship.
  • 𝑟 = −1 indicates a perfect negative linear relationship.
  • 𝑟 = 0 indicates no linear relationship.

The most commonly used correlation coefficient is the Pearson correlation coefficient.

It is calculated using the following formula:

[Tex]r = \frac{\sum (x_i - \bar{x}) (y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} [/Tex]

Where:

  • 𝑟 is the Pearson correlation coefficient.
  • 𝓍𝒾 and 𝓎𝒾 are the individual data points.
  • x̄ and ȳ are the means of the variables 𝓍 and 𝓎, respectively.

This formula computes the correlation coefficient 𝑟 between two variables 𝓍 and 𝓎, with values ranging from -1 to 1. A correlation of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship.

Example: heights (in inches) and weights (in pounds) of five students.

Student | Height (X) | Weight (Y)
1       | 63         | 127
2       | 65         | 140
3       | 67         | 155
4       | 69         | 160
5       | 71         | 170

Now calculate the correlation coefficient between height and weight for these students.

Step 1: First, we need to calculate the means (x̄ and ȳ):

x̄ = (63+65+67+69+71)/5 = 67 inches

ȳ = (127+140+155+160+170)/5 = 150.4 pounds

Step 2: Next, we calculate the correlation coefficient using the formula:

Numerator = (63-67)(127-150.4) + (65-67)(140-150.4) + (67-67)(155-150.4) + (69-67)(160-150.4) + (71-67)(170-150.4)

= 93.6 + 20.8 + 0 + 19.2 + 78.4 = 212

Denominator = √[(63-67)² + (65-67)² + (67-67)² + (69-67)² + (71-67)²] × √[(127-150.4)² + (140-150.4)² + (155-150.4)² + (160-150.4)² + (170-150.4)²]

= √40 × √1153.2 ≈ 214.774

𝑟 = 212 / 214.774

𝑟 ≈ 0.987

So, the correlation coefficient is approximately 0.987, which indicates a strong positive linear relationship between height and weight.
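The same arithmetic can be sketched in base R with plain vector operations. This is only an illustration of the formula; the variable names (dx, dy, numerator, denominator, r_manual) are chosen here and are not part of any package.

```r
# Heights (inches) and weights (pounds) of the five students
height <- c(63, 65, 67, 69, 71)
weight <- c(127, 140, 155, 160, 170)

# Step 1: deviations of each value from its mean
dx <- height - mean(height)   # mean(height) is 67
dy <- weight - mean(weight)   # mean(weight) is 150.4

# Step 2: numerator and denominator of the Pearson formula
numerator   <- sum(dx * dy)                  # sum of cross-products, 212
denominator <- sqrt(sum(dx^2) * sum(dy^2))   # sqrt(40 * 1153.2)

r_manual <- numerator / denominator
print(round(r_manual, 3))   # 0.987
```

Working with the deviation vectors dx and dy keeps each term of the formula visible, which makes a slip in a sign or a square easy to spot.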

In R, you can calculate the Pearson correlation coefficient using the cor() function:

R

# Sample data for heights and weights of students
height <- c(63, 65, 67, 69, 71)
weight <- c(127, 140, 155, 160, 170)

# Calculate correlation coefficient using cor() function
correlation_coefficient <- cor(height, weight)

# Print the correlation coefficient
print(correlation_coefficient)

Output:

[1] 0.9870827

Here, the two vectors height and weight hold the heights and weights of the five students.

  • We use the cor() function to calculate the correlation coefficient between the two variables.
  • The result is stored in the variable correlation_coefficient.
  • Finally, print the correlation coefficient using print().

What is the Matthews Correlation Coefficient (MCC)?

The Matthews correlation coefficient (MCC) is a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is regarded as a balanced measure even if the classes are of very different sizes. MCC is particularly useful when classes are imbalanced.
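A small sketch with made-up labels (hypothetical data, not from this article) can illustrate the point about imbalance. It relies on the fact that, for labels coded as 0 and 1, the MCC coincides with the Pearson correlation of the two vectors (the phi coefficient), so the cor() function shown earlier is enough here.

```r
# Hypothetical, heavily imbalanced labels: 5 positives, 95 negatives
actual <- c(rep(1, 5), rep(0, 95))

# Hypothetical classifier that detects only one of the five positives
predicted <- c(1, rep(0, 4), rep(0, 95))

# Accuracy: share of observations predicted correctly
accuracy <- mean(actual == predicted)

# For 0/1 vectors, the MCC equals the Pearson (phi) correlation
mcc_value <- cor(actual, predicted)

print(accuracy)    # 0.96
print(mcc_value)   # well below the accuracy
```

Accuracy looks excellent because the negatives dominate, while the MCC stays modest because four of the five positives were missed.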

MCC values range from -1 to 1, where:

  • 1 indicates a perfect prediction.
  • 0 indicates a prediction no better than random guessing.
  • -1 indicates total disagreement between prediction and observation.

The MCC is calculated using the following formula:

[Tex]MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} [/Tex]

Where:

  1. TP (True Positives) is the number of positive examples correctly predicted as positive.
  2. TN (True Negatives) is the number of negative examples correctly predicted as negative.
  3. FP (False Positives) is the number of negative examples incorrectly predicted as positive.
  4. FN (False Negatives) is the number of positive examples incorrectly predicted as negative.
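The formula translates into a few lines of R. The helper below, mcc_from_counts, is a hypothetical name used only for this sketch. Returning 0 when the denominator is zero is an assumed convention, not something the formula itself defines. The three calls use made-up counts to show the values 1, -1, and 0.

```r
# Hypothetical helper: MCC from the four confusion-matrix counts
mcc_from_counts <- function(tp, tn, fp, fn) {
  # Convert to double so the products cannot overflow integer range
  tp <- as.numeric(tp); tn <- as.numeric(tn)
  fp <- as.numeric(fp); fn <- as.numeric(fn)

  denominator <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

  # Assumed convention: return 0 when a row or column of the matrix is empty
  if (denominator == 0) return(0)

  (tp * tn - fp * fn) / denominator
}

mcc_from_counts(tp = 40, tn = 60, fp = 0,  fn = 0)    # perfect prediction: 1
mcc_from_counts(tp = 0,  tn = 0,  fp = 60, fn = 40)   # total disagreement: -1
mcc_from_counts(tp = 25, tn = 25, fp = 25, fn = 25)   # chance level: 0
```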


Example confusion matrix:

                | Predicted Negative | Predicted Positive
Actual Negative | 50                 | 10
Actual Positive | 5                  | 135

In this confusion matrix:

  • True Negatives (TN) = 50
  • False Positives (FP) = 10
  • False Negatives (FN) = 5
  • True Positives (TP) = 135

To calculate the MCC, substitute these values into the formula:

MCC = (TP*TN - FP*FN) / √[(TP+FP)(TP+FN)(TN+FP)(TN+FN)]

MCC = (135*50 - 10*5) / √[(135+10)(135+5)(50+10)(50+5)]

MCC = (6750 - 50) / √(145 * 140 * 60 * 55)

MCC = 6700 / 8184.74

MCC ≈ 0.8186

So, the Matthews correlation coefficient (MCC) for this example is approximately 0.8186.
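The worked example can be checked in base R by storing the confusion matrix as a matrix object and reading the four cells by name. This is a minimal sketch; the object name conf and the dimension labels are choices made here for readability.

```r
# Confusion matrix from the example: rows are actual, columns are predicted
conf <- matrix(c(50, 10,
                 5, 135),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual    = c("negative", "positive"),
                               predicted = c("negative", "positive")))

# Read the four cells by name rather than by position
TN <- conf["negative", "negative"]
FP <- conf["negative", "positive"]
FN <- conf["positive", "negative"]
TP <- conf["positive", "positive"]

mcc_value <- (TP * TN - FP * FN) /
  sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

print(round(mcc_value, 4))   # 0.8186
```

Indexing by row and column names avoids mixing up FP and FN, which is an easy mistake when cells are addressed by position.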

Calculating the Matthews Correlation Coefficient in R

To calculate the Matthews correlation coefficient (MCC) in the R programming language, we can use the mcc() function from the ‘mltools’ package, the mcc() function from the ‘pracma’ package, or compute it from a confusion matrix built with the ‘caret’ package.

Calculate Matthews Correlation Coefficient in R Using ‘mltools’ package

R

install.packages("mltools")
library(mltools)

# 20 actual positives followed by 380 actual negatives (400 observations)
actual <- rep(c(1, 0), times = c(20, 380))

# Predictions for the same 400 observations, in the same order:
# 15 true positives, 5 false negatives, 5 false positives, 375 true negatives
preds <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 375))

mcc(preds, actual)

Output:

[1] 0.7368421

  • install.packages("mltools") installs the mltools package from CRAN. It only needs to be run once.
  • library(mltools) loads the mltools package into the R session, making its functions available for use.
  • actual <- rep(c(1, 0), times = c(20, 380)) creates a vector actual of 20 instances of class 1 followed by 380 instances of class 0, using the rep() function to repeat values.
  • preds <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 375)) creates the vector of predicted labels. The first 20 predictions (15 ones, then 5 zeros) line up with the actual positives, and the remaining 380 (5 ones, then 375 zeros) line up with the actual negatives. The two vectors must have the same length, because R silently recycles a shorter vector.
  • mcc(preds, actual) calculates the Matthews correlation coefficient (MCC) between the preds and actual vectors using the mcc() function provided by the ‘mltools’ package. Here MCC = (15*375 - 5*5) / √(20*20*380*380) = 5600/7600 ≈ 0.7368.
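Element-wise operations in R silently recycle a shorter vector whenever the longer length is a multiple of the shorter one, so a length mismatch between predictions and labels can yield a plausible-looking but meaningless score without any warning. A defensive sketch, using small hypothetical vectors, checks the lengths and tabulates the counts before any metric is computed:

```r
# Hypothetical toy labels; both vectors must describe the same observations
actual <- rep(c(1, 0), times = c(4, 6))
preds  <- rep(c(1, 0, 1, 0), times = c(3, 1, 1, 5))

# Guard against silent recycling of a shorter vector
stopifnot(length(preds) == length(actual))

# Cross-tabulate actual against predicted labels to inspect TP, FP, FN, TN;
# fixing the factor levels keeps both classes in the table even if one is absent
counts <- table(actual = factor(actual, levels = c(0, 1)),
                preds  = factor(preds,  levels = c(0, 1)))
print(counts)
```

Inspecting the table first makes it clear which counts feed the score, and the stopifnot() call halts the script as soon as the two vectors differ in length.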

Calculate Matthews Correlation Coefficient in R Using ‘pracma’ package

R

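# Install and load required packages
# (this assumes the installed pracma version provides mcc(); if it does not,
# use mltools::mcc() or the confusion-matrix formula instead)
install.packages("pracma")
library(pracma)

# Create some example data
actual <- c(1, 0, 1, 0, 1)
predicted <- c(1, 0, 0, 1, 1)

# Calculate MCC
mcc_value <- mcc(as.logical(actual), as.logical(predicted))

# Print MCC
print(mcc_value)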

Output:

[1] 0.1666667

  • install.packages("pracma") installs the pracma package from CRAN. It only needs to be run once.
  • library(pracma) loads the pracma package into the R session, making its functions available for use.
  • actual <- c(1, 0, 1, 0, 1) creates a vector actual containing the actual labels, where 1 represents one class and 0 represents the other.
  • predicted <- c(1, 0, 0, 1, 1) creates a vector predicted containing the predicted labels corresponding to the actual labels.
  • mcc_value <- mcc(as.logical(actual), as.logical(predicted)) calculates the Matthews correlation coefficient (MCC) between the actual and predicted vectors using the mcc() function from the ‘pracma’ package.

Calculate Matthews Correlation Coefficient in R Using ‘caret’ package

R

# Load required package
install.packages("caret")
library(caret)

# Generate example data
actual <- c(1, 0, 1, 0, 1)      # Actual labels
predicted <- c(1, 0, 0, 1, 1)   # Predicted labels

# Create confusion matrix (rows: predicted, columns: actual)
conf_matrix <- confusionMatrix(as.factor(predicted), as.factor(actual))

# Extract values from confusion matrix
TP <- conf_matrix$table[2, 2]   # True Positives
TN <- conf_matrix$table[1, 1]   # True Negatives
FP <- conf_matrix$table[2, 1]   # False Positives
FN <- conf_matrix$table[1, 2]   # False Negatives

# Calculate Matthews correlation coefficient (MCC)
mcc_value <- (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

# Print MCC
print(mcc_value)

Output:

[1] 0.1666667

  • install.packages("caret") installs the caret package from CRAN. It only needs to be run once.
  • library(caret) loads the caret package into the R session, making its functions available for use.
  • actual <- c(1, 0, 1, 0, 1) creates a vector actual containing the actual labels, where 1 represents one class and 0 represents the other.
  • predicted <- c(1, 0, 0, 1, 1) creates a vector predicted containing the predicted labels corresponding to the actual labels.
  • conf_matrix <- confusionMatrix(as.factor(predicted), as.factor(actual)) creates a confusion matrix using the confusionMatrix() function from the caret package. The predicted labels are passed first and the actual (reference) labels second, so the rows of the resulting table are predictions and the columns are actual labels.
  • TP <- conf_matrix$table[2, 2] extracts the true positives (predicted 1, actual 1).
  • TN <- conf_matrix$table[1, 1] extracts the true negatives (predicted 0, actual 0).
  • FP <- conf_matrix$table[2, 1] extracts the false positives (predicted 1, actual 0).
  • FN <- conf_matrix$table[1, 2] extracts the false negatives (predicted 0, actual 1).

The final calculation applies the MCC formula given earlier in this article to the extracted values of TP, TN, FP, and FN. With TP = 2, TN = 1, FP = 1, and FN = 1, it gives (2*1 - 1*1) / √(3*3*2*2) = 1/6 ≈ 0.1667.
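As a package-free cross-check, the sketch below counts the four outcomes directly from the label vectors and compares the result with cor(). For labels coded as 0 and 1, the MCC equals the Pearson correlation of the two vectors (the phi coefficient), so both routes should agree. The helper name mcc_from_labels is hypothetical and assumes 0/1 coding.

```r
# Same example labels as in the package-based snippets
actual    <- c(1, 0, 1, 0, 1)
predicted <- c(1, 0, 0, 1, 1)

# Hypothetical helper: count the four outcomes directly from 0/1 vectors
mcc_from_labels <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  # as.numeric() keeps the products below from overflowing integer range
  tp <- as.numeric(sum(actual == 1 & predicted == 1))
  tn <- as.numeric(sum(actual == 0 & predicted == 0))
  fp <- as.numeric(sum(actual == 0 & predicted == 1))
  fn <- as.numeric(sum(actual == 1 & predicted == 0))
  (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
}

by_formula <- mcc_from_labels(actual, predicted)
by_cor     <- cor(actual, predicted)   # phi coefficient of two 0/1 vectors

print(by_formula)                      # 0.1666667
print(all.equal(by_formula, by_cor))   # TRUE
```

Agreement between the two values is a quick sanity check that the cells of a confusion matrix were extracted in the right order.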

Conclusion

In conclusion, the Matthews correlation coefficient (MCC) is a robust metric for evaluating binary classification models in R. It accounts for true and false positives and negatives together, which keeps it informative even with imbalanced data. This makes it a reliable tool for assessing model performance and comparing different algorithms or experiments.


