
Probability Calibration for 3-class Classification in Scikit Learn

Last Updated : 08 Jun, 2023

Probability calibration is a technique for mapping the probabilities predicted by a model to the true probabilities of the outcomes. The probabilities predicted by classifiers such as SVM or Random Forest, and in some cases even Logistic Regression, may not be well calibrated: they can be systematically over- or under-confident, so they do not accurately reflect how likely each target class really is. This can lead to incorrect conclusions when the predicted probabilities are used for decision-making.

Probability Calibration

Probability calibration refers to the process of adjusting the predicted probabilities of a classification model to better match the true probabilities of the target variable. The goal of calibration is to ensure that the probabilities assigned by the model to different classes are accurate and can be used to make reliable predictions.

In other words, a well-calibrated model produces predicted probabilities that match the observed frequencies of the outcomes. For example, among all the samples for which the model predicts a 70% chance of an event, we would expect the event to actually occur in about 70% of them.
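To make this definition concrete, here is a minimal NumPy sketch of the binning behind a calibration curve: samples are grouped by predicted probability, and the average prediction in each bin is compared with the observed fraction of positives. The data is synthetic and calibrated by construction (each label is drawn with exactly its predicted probability), and `reliability_table` is a hypothetical helper written for illustration; scikit-learn's `calibration_curve`, used in Example 1, performs a similar computation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, perfectly calibrated "model": each label is 1 with exactly the
# probability that was predicted for it.
y_prob = rng.uniform(0.0, 1.0, size=10_000)
y_true = (rng.uniform(0.0, 1.0, size=y_prob.size) < y_prob).astype(int)


def reliability_table(y_true, y_prob, n_bins=10):
    """Hypothetical helper: (mean predicted prob, observed positive rate, count) per non-empty bin."""
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    # Bin index 0 .. n_bins-1; a probability of exactly 1.0 falls in the last bin.
    bin_ids = np.digitize(y_prob, inner_edges)
    rows = []
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            rows.append((y_prob[in_bin].mean(), y_true[in_bin].mean(), int(in_bin.sum())))
    return rows


rows = reliability_table(y_true, y_prob)
for mean_pred, frac_pos, count in rows:
    print(f"mean predicted {mean_pred:.2f} | observed frequency {frac_pos:.2f} | n={count}")
```

Because the labels were generated from the predictions themselves, the two columns should agree closely in every bin; for a poorly calibrated model they would drift apart in some bins.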

Calibration is particularly important when the predicted probabilities are used to make decisions, such as in medical diagnoses or financial risk assessments. If the model is poorly calibrated, it can lead to incorrect decisions and potentially harmful outcomes.

  • Multiclass classification: A classification task in which each input is assigned to one of three or more classes.
  • Platt Scaling: A popular parametric calibration method that fits a sigmoid (logistic) function mapping the model's scores or predicted probabilities to calibrated probabilities.
  • Isotonic Regression: A non-parametric calibration method that fits a non-decreasing (monotonic) step function to the model's scores or predicted probabilities. It is more flexible than Platt Scaling but needs more data to avoid overfitting.
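Both methods can be sketched by hand for the 3-class case. scikit-learn's `CalibratedClassifierCV` calibrates multiclass problems one-vs-rest and then normalizes the per-class probabilities so that they sum to 1; the sketch below mimics that idea under simplifying assumptions. The base model (`GaussianNB`), the synthetic data, the train/calibration/test proportions and the helper `fit_ovr_calibrator` are illustrative choices, and the sigmoid branch uses a default (L2-regularized) `LogisticRegression` on each score column as an approximation of Platt Scaling, not scikit-learn's internal sigmoid fit.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import log_loss

# Synthetic 3-class data split into train / calibration / test parts
X, y = make_classification(n_samples=3000, n_classes=3, n_features=10,
                           n_informative=5, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Base model whose probabilities will be calibrated (an illustrative choice)
base = GaussianNB().fit(X_train, y_train)
cal_scores = base.predict_proba(X_cal)    # shape (n_cal, 3)
test_scores = base.predict_proba(X_test)  # shape (n_test, 3)


def fit_ovr_calibrator(scores, y, method):
    """Hypothetical helper: fit one 1-D calibrator per class (one-vs-rest)."""
    calibrators = []
    for k in range(scores.shape[1]):
        target = (y == k).astype(int)
        if method == "sigmoid":
            # Platt-style: a logistic (sigmoid) curve on the single score column
            cal = LogisticRegression().fit(scores[:, [k]], target)
        else:
            # Isotonic: a non-decreasing step function; out-of-range scores are clipped
            cal = IsotonicRegression(out_of_bounds="clip").fit(scores[:, k], target)
        calibrators.append(cal)

    def transform(new_scores):
        columns = []
        for k, cal in enumerate(calibrators):
            if method == "sigmoid":
                columns.append(cal.predict_proba(new_scores[:, [k]])[:, 1])
            else:
                columns.append(cal.predict(new_scores[:, k]))
        probs = np.column_stack(columns)
        # Per-class calibration does not keep rows summing to 1, so renormalize;
        # a row of all zeros falls back to a uniform distribution.
        sums = probs.sum(axis=1, keepdims=True)
        uniform = np.full_like(probs, 1.0 / probs.shape[1])
        return np.divide(probs, sums, out=uniform, where=sums > 0)

    return transform


results = {"uncalibrated": test_scores}
for method in ("sigmoid", "isotonic"):
    results[method] = fit_ovr_calibrator(cal_scores, y_cal, method)(test_scores)

for name, probs in results.items():
    print(f"{name:>12}: test log loss = {log_loss(y_test, probs):.4f}")
```

The calibrators are fitted on the calibration split and evaluated on a separate test split, so the reported log loss is not measured on the same samples the calibrators were trained on.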

Example 1:

Steps:

  • Load the necessary libraries
  • Generate a synthetic 3-class classification dataset
  • Split the data into training and testing sets
  • Train a Logistic Regression model
  • Find the predicted probabilities on the testing set
  • Compute the calibration curve for each class (one-vs-rest)
  • Plot the calibration curves

Python3




# Load the necessary libraries
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt
  
# Generate a synthetic 3-class classification dataset
X, y = make_classification(n_samples=1000,
                           n_classes=3,
                           n_features=10,
                           n_informative=5,
                           random_state=42)
  
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=42)
  
# Train a Logistic Regression
clf = LogisticRegression(max_iter=12000)
clf.fit(X_train, y_train)
  
# Find the predicted probabilities on the testing set
probabilities = clf.predict_proba(X_test)
  
  
# Compute the calibration curve for each class
calibration_curve_values = []
for i in range(3):
    # One-vs-rest: class i is the positive class, all others are negative
    curve = calibration_curve(y_test == i,
                              probabilities[:, i],
                              n_bins=20,
                              pos_label=True)
    calibration_curve_values.append(curve)
  
# Plot the calibration curves
fig, axs = plt.subplots(1, 3, figsize=(17,5))
for i in range(3):
    axs[i].plot(calibration_curve_values[i][1], 
                calibration_curve_values[i][0], 
                marker='o')
    axs[i].plot([0, 1], [0, 1], linestyle='--')
    axs[i].set_xlim([0, 1])
    axs[i].set_ylim([0, 1])
    axs[i].set_title(f"Class {i}", fontsize = 17)
    axs[i].set_xlabel("Predicted probability", fontsize = 15)
    axs[i].set_ylabel("True probability", fontsize = 15)
plt.tight_layout()
plt.show()


Output:

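Figure: Probability Calibration for 3-class Classification (calibration curves for Class 0, Class 1 and Class 2; predicted probability on the x-axis, true probability on the y-axis, dashed diagonal marking perfect calibration)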
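The calibration curves can also be summarized with one number per class. A common choice is the Brier score: the mean squared difference between the predicted probability and the 0/1 outcome (lower is better). It reflects both calibration and how well the classes are separated, so it complements the curves rather than replacing them. The sketch below is an addition to Example 1, not part of it: it rebuilds the same data and model and computes a one-vs-rest Brier score for each class with `sklearn.metrics.brier_score_loss`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

# Same synthetic data, split and model as in Example 1
X, y = make_classification(n_samples=1000, n_classes=3, n_features=10,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
clf = LogisticRegression(max_iter=12000).fit(X_train, y_train)
probabilities = clf.predict_proba(X_test)

brier = {}
for i in range(3):
    # One-vs-rest view of class i: 1 if the sample belongs to class i, else 0
    y_binary = (y_test == i).astype(int)
    brier[i] = brier_score_loss(y_binary, probabilities[:, i])
    print(f"Class {i}: Brier score = {brier[i]:.4f}")
```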

Example 2:

Steps:

  1. Load the dataset: Load the dataset you want to use for classification.
  2. Split the data: Split the data into training and testing sets, and hold out part of the training set for calibration.
  3. Train a classification model: Train a classification model on the training set.
  4. Predict probabilities: Predict probabilities on the testing set using the trained model.
  5. Calibrate probabilities: Calibrate the predicted probabilities using either Platt Scaling or Isotonic Regression, fitting the calibrator on the held-out calibration data rather than on the testing set.
  6. Evaluate the model: Evaluate the calibrated model on the testing set using metrics such as log loss.

The following code performs probability calibration for 3-class classification in scikit-learn using Platt Scaling:

Python3




# Load the necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
import matplotlib.pyplot as plt

# Load the iris dataset
data = load_iris()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.data,
                                                    data.target,
                                                    test_size=0.2)

# Hold out part of the training data for calibration, so the calibrator
# is never fitted on the samples used to evaluate it
X_fit, X_cal, y_fit, y_cal = train_test_split(X_train,
                                              y_train,
                                              test_size=0.25,
                                              stratify=y_train)

# Train a logistic regression model on the remaining training data
clf = LogisticRegression(max_iter=1000)
clf.fit(X_fit, y_fit)

# Predict probabilities on the testing set
probabilities = clf.predict_proba(X_test)
uncalibrated_log_loss = log_loss(y_test, probabilities)
print('Uncalibrated log loss:', uncalibrated_log_loss)

# Calibrate the already-fitted model with Platt Scaling (method='sigmoid').
# scikit-learn >= 1.6 deprecates cv='prefit' in favour of FrozenEstimator.
try:
    from sklearn.frozen import FrozenEstimator
    calibrated_classifier = CalibratedClassifierCV(FrozenEstimator(clf),
                                                   method='sigmoid')
except ImportError:
    calibrated_classifier = CalibratedClassifierCV(clf,
                                                   cv='prefit',
                                                   method='sigmoid')
calibrated_classifier.fit(X_cal, y_cal)

# Evaluate the calibrated model on the untouched testing set
calibrated_probabilities = calibrated_classifier.predict_proba(X_test)
calibrated_log_loss = log_loss(y_test, calibrated_probabilities)
print('Calibrated log loss:', calibrated_log_loss)

# Compare the distribution of predicted probabilities for class 0
plt.figure(figsize=(8, 6))
plt.hist(probabilities[:, 0], bins=20, alpha=0.5, label='Uncalibrated')
plt.hist(calibrated_probabilities[:, 0],
         bins=20,
         alpha=0.5,
         label='Calibrated')

plt.legend(loc='upper center')
plt.title('Histogram of Predicted Probabilities')
plt.xlabel('Predicted Probability')
plt.ylabel('Frequency')
plt.show()


Output (from one run; the values vary because the data split is random):

Uncalibrated log loss: 0.22417906432427373
Calibrated log loss: 0.4612096780052399
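Figure: Histogram of Predicted Probabilities (class 0 probabilities before and after calibration)

In this run, the calibrated log loss is higher than the uncalibrated one. Calibration does not always reduce log loss: Logistic Regression directly minimizes log loss during training, so its probabilities are often already reasonably calibrated, and a sigmoid fitted on only a few dozen samples can make them worse.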
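Step 5 mentions Isotonic Regression as an alternative to Platt Scaling. The sketch below compares both (selected through the `method` parameter of `CalibratedClassifierCV`) against the uncalibrated model on a held-out test set. Here `cv=5` lets `CalibratedClassifierCV` fit each copy of the classifier and its calibrator on different folds of the training data, so no separate calibration split is needed. The `random_state`, `stratify` and `max_iter` values are illustrative choices. Isotonic Regression is more flexible than a sigmoid and tends to overfit on small datasets, so on iris its result can be noisy.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "uncalibrated": LogisticRegression(max_iter=1000),
    # cv=5: each fold fits a classifier on 4/5 of the training data and a
    # calibrator on the remaining 1/5; predictions average over the folds.
    "sigmoid": CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                                      method="sigmoid", cv=5),
    "isotonic": CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                                       method="isotonic", cv=5),
}

losses = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    losses[name] = log_loss(y_test, model.predict_proba(X_test))
    print(f"{name:>12}: test log loss = {losses[name]:.4f}")
```

Which method gives the lowest loss depends on the data and the split; with much more data, Isotonic Regression usually has more room to help.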

In summary, probability calibration maps the predicted probabilities of a model to their true probabilities. In this tutorial, we discussed probability calibration for 3-class classification using scikit-learn: loading the dataset, splitting the data, training a classification model, predicting probabilities, calibrating them with Platt Scaling (Isotonic Regression can be used instead by passing method='isotonic' to CalibratedClassifierCV), and evaluating the model with calibration curves and log loss.


