Probability Calibration for 3-class Classification in Scikit Learn

Last Updated : 08 Jun, 2023

Probability calibration is a technique for adjusting the predicted probabilities of a model so that they match the frequencies with which the outcomes actually occur. The probabilities predicted by classification algorithms such as SVM or Random Forest, and in some settings even Logistic Regression, may not be well calibrated, meaning they may not accurately reflect the true probabilities of the target classes. This can lead to incorrect conclusions when the predicted probabilities are used for decision-making.

Probability Calibration

Probability calibration refers to the process of adjusting the predicted probabilities of a classification model to better match the true probabilities of the target variable. The goal of calibration is to ensure that the probabilities assigned by the model to different classes are accurate and can be used to make reliable predictions.

In other words, a well-calibrated model is one that produces predicted probabilities that are close to the actual probabilities of the target variable. For example, if a model predicts that there is a 70% chance of an event occurring, we would expect that the event actually occurs about 70% of the time.
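
The 70% example can be checked numerically. The following is a minimal sketch on simulated data rather than on a real model; the names `predicted` and `outcomes` and the sample size are assumptions made for illustration. The outcomes are drawn so that the forecasts are perfectly calibrated by construction, and the observed frequency among forecasts near 0.70 is then compared with 0.70.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forecasts: 100,000 predicted probabilities in [0, 1]
predicted = rng.uniform(0.0, 1.0, size=100_000)

# Simulate outcomes so that each event occurs with exactly its predicted
# probability, i.e. the forecasts are perfectly calibrated by construction
outcomes = rng.uniform(0.0, 1.0, size=predicted.size) < predicted

# Among forecasts close to 70%, the event should occur about 70% of the time
near_70 = (predicted >= 0.65) & (predicted < 0.75)
observed_frequency = outcomes[near_70].mean()

print(f"Forecasts near 0.70: {near_70.sum()}")
print(f"Observed frequency of the event: {observed_frequency:.3f}")
```

For a poorly calibrated model, the observed frequency in such a bin would differ noticeably from the predicted probability.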

Calibration is particularly important when the predicted probabilities are used to make decisions, such as in medical diagnoses or financial risk assessments. If the model is poorly calibrated, it can lead to incorrect decisions and potentially harmful outcomes.

Multiclass classification: It is a classification task where the goal is to assign each input to one of three or more classes.
Platt Scaling: It is a popular parametric method for probability calibration that fits a sigmoid (logistic) function to the scores or predicted probabilities of a model.
Isotonic Regression: It is a non-parametric method for probability calibration that fits a monotonic (non-decreasing) step function to the predicted probabilities of a model. It is more flexible than Platt Scaling but can overfit when little calibration data is available.
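
To make the two methods concrete, here is a rough sketch on a binary problem with simulated scores. The data (`scores`, `labels`) is hypothetical, and a one-feature `LogisticRegression` is used as a stand-in for Platt's sigmoid fit, so this illustrates the idea rather than the exact procedure Scikit Learn applies internally.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical uncalibrated scores and binary labels (assumed for illustration):
# the true probability of the positive class is sigmoid(2 * score)
scores = rng.normal(size=2000)
true_prob = 1.0 / (1.0 + np.exp(-2.0 * scores))
labels = (rng.uniform(size=scores.size) < true_prob).astype(int)

# Platt Scaling idea: fit a sigmoid 1 / (1 + exp(-(a * score + b))) to the scores.
# A one-feature logistic regression serves as a stand-in for that fit here.
platt = LogisticRegression()
platt.fit(scores.reshape(-1, 1), labels)
platt_prob = platt.predict_proba(scores.reshape(-1, 1))[:, 1]

# Isotonic Regression: fit a non-decreasing step function from score to probability
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso_prob = iso.fit_transform(scores, labels)

print("Fitted sigmoid slope a:", platt.coef_[0, 0])
print("Fitted sigmoid intercept b:", platt.intercept_[0])
print("Distinct isotonic output levels:", np.unique(iso_prob).size)
```

The sigmoid has only two parameters, while the isotonic fit can use as many steps as the data supports, which is why it needs more calibration data to be reliable.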

Example 1:

Steps:

Load the necessary libraries
Generate a synthetic 3-class classification dataset
Split the data into training and testing sets
Train a Logistic Regression model on the training set
Find the predicted probabilities on the testing set
Compute the calibration curve for each class (one-vs-rest)
Plot the calibration curves

Python3

# Load the necessary libraries 
from sklearn.datasets import make_classification 
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LogisticRegression 
from sklearn.calibration import calibration_curve 
import matplotlib.pyplot as plt 
  
# Generate a synthetic 3-class classification dataset 
X, y = make_classification(n_samples=1000, 
                           n_classes=3,  
                           n_features=10,  
                           n_informative=5,  
                           random_state=42) 
  
# Split the data into training and testing sets 
X_train, X_test, y_train, y_test = train_test_split(X, y,  
                                                    test_size=0.2,  
                                                    random_state=42) 
  
# Train a Logistic Regression 
clf = LogisticRegression(max_iter=12000) 
clf.fit(X_train, y_train) 
  
# Find the predicted probabilities on the testing set 
probabilities = clf.predict_proba(X_test) 
  
  
# Compute the calibration curve for each class 
calibration_curve_values = [] 
for i in range(3): 
    curve = calibration_curve(y_test == i,  
                              probabilities[:, i],  
                              n_bins=20,  
                              pos_label=True) 
    calibration_curve_values.append(curve) 
  
# Plot the calibration curves 
fig, axs = plt.subplots(1, 3, figsize=(17,5)) 
for i in range(3): 
    axs[i].plot(calibration_curve_values[i][1],  
                calibration_curve_values[i][0],  
                marker='o') 
    axs[i].plot([0, 1], [0, 1], linestyle='--') 
    axs[i].set_xlim([0, 1]) 
    axs[i].set_ylim([0, 1]) 
    axs[i].set_title(f"Class {i}", fontsize = 17) 
    axs[i].set_xlabel("Predicted probability", fontsize = 15) 
    axs[i].set_ylabel("True probability", fontsize = 15) 
plt.tight_layout() 
plt.show()

Output:

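Figure: Calibration curves for Class 0, Class 1, and Class 2 (x-axis: predicted probability, y-axis: true probability), each shown against the dashed diagonal of perfect calibration.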
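
Calibration curves are read by eye, so a single number per class can be a useful complement. As a sketch, the one-vs-rest Brier score (the mean squared difference between the predicted probability and the 0/1 outcome) can be computed for the same data and model as in Example 1. Using the Brier score here is an addition for illustration and not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Same synthetic data and model as in Example 1
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=42)
clf = LogisticRegression(max_iter=12000)
clf.fit(X_train, y_train)
probabilities = clf.predict_proba(X_test)

# One-vs-rest Brier score for each class (lower is better, 0 is perfect)
brier_scores = []
for i in range(3):
    is_class_i = (y_test == i).astype(int)
    score = brier_score_loss(is_class_i, probabilities[:, i])
    brier_scores.append(score)
    print(f"Class {i}: Brier score = {score:.4f}")
```

Note that the Brier score mixes calibration with how well the model separates the classes, so it should be read together with the curves rather than instead of them.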

Example 2:

Steps :

Load the dataset: Load the dataset you want to use for classification.
Split the data: Split the data into training and testing sets.
Train a classification model: Train a classification model on the training set.
Predict probabilities: Predict probabilities on the testing set using the trained model.
Calibrate probabilities: Calibrate the predicted probabilities using either Platt Scaling or Isotonic Regression, ideally fitting the calibrator on data that is not used for the final evaluation.
Evaluate the model: Evaluate the calibrated model using a metric such as log loss.

Here is an example of probability calibration for 3-class classification in Scikit Learn using Platt Scaling:

Python3

# Load the necessary libraries 
from sklearn.datasets import load_iris 
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LogisticRegression 
from sklearn.calibration import CalibratedClassifierCV 
from sklearn.metrics import log_loss 
import matplotlib.pyplot as plt 
  
# Load the iris dataset 
data = load_iris() 
  
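# Split the data into training, calibration, and testing sets
X_train, X_rest, y_train, y_rest = train_test_split(data.data,
                                                    data.target,
                                                    test_size=0.4,
                                                    stratify=data.target)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest,
                                                    y_rest,
                                                    test_size=0.5,
                                                    stratify=y_rest)

# Train a logistic regression model
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Predict probabilities on the testing set
probabilities = clf.predict_proba(X_test)
uncalibrated_log_loss = log_loss(y_test, probabilities)
print('Uncalibrated log loss:', uncalibrated_log_loss)

# Calibrate the predicted probabilities using Platt Scaling.
# The calibrator is fitted on the held-out calibration set, not on the
# test set, so the evaluation below is not biased by data leakage.
# Note: cv='prefit' is deprecated in recent scikit-learn releases (1.6+),
# which recommend wrapping the fitted model in FrozenEstimator instead.
calibrated_classifier = CalibratedClassifierCV(clf,
                                               cv='prefit',
                                               method='sigmoid')
calibrated_classifier.fit(X_calib, y_calib)

# Evaluate the calibrated model using log loss
calibrated_probabilities = calibrated_classifier.predict_proba(X_test)
calibrated_log_loss = log_loss(y_test, calibrated_probabilities)
print('Calibrated log loss:', calibrated_log_loss)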
  
# Plot the calibrated probabilities 
plt.figure(figsize=(8, 6)) 
plt.hist(probabilities[:, 0], bins=20, alpha=0.5, label='Uncalibrated') 
plt.hist(calibrated_probabilities[:, 0], 
         bins=20,  
         alpha=0.5,  
         label='Calibrated') 
  
plt.legend(loc='upper center') 
plt.title('Histogram of Predicted Probabilities') 
plt.xlabel('Predicted Probability') 
plt.ylabel('Frequency') 
plt.show()

Output:

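Uncalibrated log loss: 0.22417906432427373
Calibrated log loss: 0.4612096780052399

The exact values vary between runs because the split is random. In this run the calibrated log loss is higher than the uncalibrated one. Calibration does not always help: logistic regression already minimizes log loss during training, and with only 150 samples in the iris dataset the sigmoid calibrator is fitted on very little data.

Figure: Histogram of Predicted Probabilities (class 0, uncalibrated vs. calibrated)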
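
The steps above name Isotonic Regression as an alternative to Platt Scaling, but the example only uses the latter. The sketch below compares both methods through the `method` parameter of `CalibratedClassifierCV`. The choice of `GaussianNB` as the base model, the synthetic dataset, and the use of `cv=5` (internal cross-validation instead of a prefit model) are assumptions made for illustration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic 3-class dataset (assumed for illustration)
X, y = make_classification(n_samples=3000, n_classes=3, n_features=10,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

results = {}

# Baseline: the uncalibrated model
base = GaussianNB().fit(X_train, y_train)
results["uncalibrated"] = log_loss(y_test, base.predict_proba(X_test))

# Calibrated variants: with cv=5 the training data is split internally,
# so each calibrator is fitted on folds the base model did not train on
for method in ("sigmoid", "isotonic"):
    calibrated = CalibratedClassifierCV(GaussianNB(), method=method, cv=5)
    calibrated.fit(X_train, y_train)
    results[method] = log_loss(y_test, calibrated.predict_proba(X_test))

for name, value in results.items():
    print(f"{name}: log loss = {value:.4f}")
```

Which method gives the lower log loss depends on the base model and on how much data is available, so it is worth comparing both on a held-out set.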

In summary, probability calibration is a technique used to map the predicted probabilities of a model to their true probabilities. In this tutorial, we discussed probability calibration for 3-class classification using Scikit Learn: loading the dataset, splitting the data, training a classification model, predicting probabilities, inspecting the calibration curve of each class, calibrating the probabilities with Platt Scaling (Isotonic Regression being the alternative), and evaluating the result using log loss.
