Open In App

Receiver Operating Characteristic (ROC) with Cross Validation in Scikit Learn

Improve
Improve
Like Article
Like
Save
Share
Report

In this article, we will implement ROC with Cross-Validation in Scikit Learn. Before we jump into the code, let’s first understand why we need ROC curve and Cross-Validation in Machine Learning model predictions. 

Receiver Operating Characteristic Curve (ROC Curve)

To understand the ROC curve one must be familiar with terminologies such as True Positive, False Positive, True Negative, and False Negative. ROC curve is a pictorial or graphical plot that indicates a False Positive vs True Positive relation, where False Positive is on the X axis and True Positive is on the Y axis. In this context, the False Positive rate is denoted as Specificity and the True Positive rate is denoted as Sensitivity. 

Sensitivity = TP/(TP+FN)

Specificity = TN/(TN+FP)

The top left corner of the ROC curve denotes the ideal point, where the False Positive Rate is 0 and the True Positive Rate is 1. You don’t usually get 1, but a score close to 1 is considered to be a good score. 

 

ROC curve can be used as evaluation metrics for the Classification based model. It works well when the target classification is Binary. 

Cross Validation 

In Machine Learning splitting the dataset into training and testing might be troublesome sometimes. Cross Validation is a technique using which we select the batches of the different training sets and fit them into the model. This in return helps in generalizing the model and is less prone to overfitting. The commonly used Cross Validation methods are KFold, StratifiedKFold, RepeatedKFold, LeaveOneGroupOut, and GroupKFold. 

We shall now implement the cross-validation technique to understand the ROC curve on different samples of the dataset. 

Receiver Operating Characteristic (ROC) with Cross-Validation in Scikit Learn

Before we proceed to implement the code, make sure you have downloaded the sklearn Python module.

pip install -U scikit-learn

Import the required libraries

Here we will import some useful Python libraries like NumPy, Matplotlib, SKlearn for performing complex computational tasks in a few lines of code.

Python3




import numpy as np
import matplotlib.pyplot as plt
 
from sklearn import datasets
from sklearn.metrics import roc_curve, auc,roc_auc_score
from sklearn.metrics import RocCurveDisplay
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


Read the Data

SKlearn provides various toy datasets from which we are loading breast_cancer dataset for our article.

Python3




data = datasets.load_breast_cancer()
X = data.data
y = data.target
 
print(X.shape)
print(y.shape)


Output:

(569, 30)
(569,)

Define The Cross Validation and Model

In our case, we shall use KFold cross-validation and Logistic Regression since the data end target is Binary Classification. 

Python3




cross_val = KFold(n_splits=6, random_state=42, shuffle=True)
model = LogisticRegression()


Initialize True Positive Rate and Area Under Curve

Since we are using Cross Validation, we will have different samples of training sets. So we will define the mean False Positive rate, True Positive Rate, and Area under Curve as a list or array.

Python3




tprs, aucs = [], []
mean_fpr = np.linspace(0, 1, 100)


Plot ROC Curve for every Cross Validation Split

Sklearn provides ROC Curve display metrics that take in the model and testing data as the argument to calculate the ROC curve on the given dataset. True positive and Area Under curve is updated on each split. 

Python3




fig, ax = plt.subplots()
for index, (train, test) in enumerate(cross_val.split(X, y)):
    model.fit(X[train], y[train])
    plot = RocCurveDisplay.from_estimator(
        model, X[test], y[test],
        name="ROC fold {}".format(index),
        ax=ax,
    )
    interp_tpr = np.interp(mean_fpr, plot.fpr, plot.tpr)
    interp_tpr[0] = 0.0
    tprs.append(interp_tpr)
    aucs.append(plot.roc_auc)
 
ax.set(
    xlim=[-0.05, 1.05],
    ylim=[-0.05, 1.05],
    title="Receiver operating characteristic with CV",
)
plt.savefig("roc_cv.jpeg")


Output:

 



Last Updated : 03 Mar, 2023
Like Article
Save Article
Previous
Next
Share your thoughts in the comments
Similar Reads