Quick Start to Gaussian Process Regression

Gaussian Processes, often abbreviated as GPs, are powerful and flexible machine-learning techniques primarily used for regression and probabilistic modelling. They excel at modelling intricate relationships between input variables and their corresponding output values. GPs offer methods to estimate both the mean and uncertainty (variance) of predictions, making them particularly valuable for uncertainty quantification.

In this article, we’ll walk through how Gaussian Process Regression works in scikit-learn, comparing the noise-free and noisy cases.



Gaussian Process Regression in scikit-learn

Gaussian Process Regression in scikit-learn, provided by the `GaussianProcessRegressor` class, excels at modelling complex relationships between input variables and outputs. Using kernels such as the Radial Basis Function (RBF), it estimates both the predictive mean and its uncertainty, which is crucial for uncertainty quantification. It also supports sampling from the prior, automatic hyperparameter selection, and handling of observation noise, making GPR a powerful and flexible tool for regression tasks. A typical workflow is: prepare the input features and output values, select a kernel, initialize the model, train it on the prepared data, make predictions with mean and uncertainty, and visualize the results for comprehensive insights.
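As a quick orientation before the step-by-step walkthrough, here is a minimal end-to-end sketch of that workflow. The variable names and toy data here are illustrative only, not part of the example that follows:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy 1-D dataset: five points on a sine curve
X_demo = np.array([[0.5], [1.5], [2.5], [3.5], [4.5]])
y_demo = np.sin(X_demo).ravel()

# Kernel -> model -> fit -> predict with uncertainty
gp_demo = GaussianProcessRegressor(kernel=1.0 * RBF(length_scale=1.0))
gp_demo.fit(X_demo, y_demo)
mean, std = gp_demo.predict(np.array([[2.0]]), return_std=True)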

Example:

Let’s generate synthetic data with both noise-free and noisy versions, fit Gaussian Process models to both datasets, and visualize the results to showcase the predictions along with the associated uncertainty for each case. Comparing the noise-free and noisy Gaussian Process Regression (GPR) fits highlights the impact of observation noise on the fitted model and on the width of its uncertainty estimates.



Now, let’s delve deeper and explore the steps required to perform Gaussian Process Regression in scikit-learn. We will provide code examples and explanations to ensure a clear understanding of the process.

Step 1: Importing Required Libraries

To perform Gaussian Process Regression, the first step is to import the necessary libraries. Besides scikit-learn itself, we need two more libraries: NumPy and Matplotlib. They handle data manipulation, mathematical operations, and visualization of the GPR results.




import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
import matplotlib.pyplot as plt

Step 2: Data Preparation

As we already know, the initial step involves preparing our data to ensure it’s in the right format for our model. This entails organizing the input features and their corresponding output values appropriately. To illustrate this process, let’s generate synthetic data.




# Reproducible random number generator
rng = np.random.default_rng(seed=42)

# 20 sorted inputs in [0, 5], shaped (n_samples, 1) for scikit-learn
X = np.sort(rng.uniform(0, 5, 20))[:, np.newaxis]

# Noise-free targets, plus a noisy version (Gaussian noise, std 0.1)
y_noise_free = np.sin(X).ravel()
y_noisy = y_noise_free + rng.normal(0, 0.1, len(X))
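A quick optional sanity check: scikit-learn expects X as a 2-D array of shape (n_samples, n_features) and y as a 1-D array, which is why the code above reshapes with [:, np.newaxis] and flattens with ravel().

print(X.shape)        # (20, 1): 20 samples, 1 feature
print(y_noisy.shape)  # (20,): one target value per sample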

Step 3: Choosing a Kernel

In this step, we need to select an appropriate kernel function that accurately models the relationship between the input features and output values. To begin, we’ll define the kernel function, specifically the Radial Basis Function (RBF). The choice of kernel function is critical, as it determines how the Gaussian Process Regression model captures the underlying patterns in the data. In both cases, the kernel’s hyperparameters are estimated during fitting by maximizing the log marginal likelihood.




# RBF kernel scaled by a constant (signal-variance) factor
kernel = 1.0 * RBF(length_scale=1.0)
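If you expect noisy observations, one common variant (not used in the example below) is to add a WhiteKernel term so that an observation-noise level is learned alongside the RBF parameters; kernel_with_noise is an illustrative name:

from sklearn.gaussian_process.kernels import WhiteKernel

# Optional alternative: learn an observation-noise level as well
kernel_with_noise = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)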

Step 4: Creating the GP Model

Now, let’s proceed to initialize a GaussianProcessRegressor with the previously selected kernel and any relevant hyperparameters for our model. This step is crucial in setting up the GPR model with the chosen kernel and configuring it for the specific regression task.




# Same kernel for both models; hyperparameters are re-optimized per fit,
# with 10 optimizer restarts to avoid poor local optima
gp_noise_free = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)
gp_noisy = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)
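Note that both models use scikit-learn's default alpha (a tiny jitter), so the noisy model will try to interpolate the noisy points. Since the noise standard deviation is known here (0.1 from Step 2), an alternative sketch is to pass its variance via alpha; gp_noisy_alpha is an illustrative name and not used below:

# Alternative for noisy data: supply the known observation-noise
# variance (0.1 ** 2 matches the noise added in Step 2)
gp_noisy_alpha = GaussianProcessRegressor(kernel=kernel, alpha=0.1 ** 2,
                                          n_restarts_optimizer=10)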

Step 5: Training the GP Model

In this phase, we will train our Gaussian Process models using the prepared data. This involves fitting each model to its training data, allowing it to learn the underlying patterns and relationships between the input features and output values. Training is a pivotal step in building an accurate GP model. Let’s first train on the noise-free data.




gp_noise_free.fit(X, y_noise_free)

Output:

GaussianProcessRegressor(kernel=1**2 * RBF(length_scale=1),
                         n_restarts_optimizer=10)

Now let’s train on the noisy data.




gp_noisy.fit(X, y_noisy)

Output:

GaussianProcessRegressor(kernel=1**2 * RBF(length_scale=1),
                         n_restarts_optimizer=10)
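After fitting, you can optionally inspect the hyperparameters that maximized the log marginal likelihood via the fitted model's kernel_ and log_marginal_likelihood_value_ attributes:

# Kernel with optimized hyperparameters, and the likelihood it achieved
print(gp_noisy.kernel_)
print(gp_noisy.log_marginal_likelihood_value_)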

Step 6: Making Predictions

With the models fully trained, we can now leverage them to make predictions on new data points. The process begins by generating test data for evaluation. Each model then provides predictions that include both the mean and standard deviation, allowing us to assess not only the expected values but also the associated uncertainty in those predictions. This dual provision of mean and variance is a distinctive feature of Gaussian Process Regression.




# Step 6: Making Predictions
# Generate test data for evaluation
X_pred = np.linspace(0, 5, 1000)[:, np.newaxis]
 
# Predictions for noise-free model
y_pred_noise_free, sigma_noise_free = gp_noise_free.predict(X_pred, return_std=True)
 
# Predictions for noisy model
y_pred_noisy, sigma_noisy = gp_noisy.predict(X_pred, return_std=True)
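Beyond the mean and standard deviation, a fitted GP defines a full posterior distribution over functions. As a small optional illustration, sample_y draws sample functions from that posterior:

# Optional: draw a few posterior sample functions from the noisy model
samples = gp_noisy.sample_y(X_pred, n_samples=3, random_state=0)
print(samples.shape)  # (1000, 3): one column per sampled function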

Step 7: Visualizing Regression Results

In the final step, we will visualize our regression results. Through visualization, we will be able to observe both the predicted mean and the associated uncertainty. This graphical representation is essential for gaining insights into the model’s performance and understanding the reliability of our predictions.




# Step 7: Visualizing Regression Results
# Plotting noise-free results
plt.figure(figsize=(12, 6))
 
plt.subplot(1, 2, 1)
plt.scatter(X, y_noise_free, c='r', marker='.', label='Observations (noise-free)')
plt.plot(X_pred, y_pred_noise_free, 'b', label='Prediction')
plt.fill_between(X_pred.flatten(), y_pred_noise_free - 1.96 * sigma_noise_free, y_pred_noise_free + 1.96 * sigma_noise_free, alpha=0.2, color='blue', label='95% Confidence Interval')
plt.title('Gaussian Process Regression (Noise-Free)')
plt.xlabel('Input')
plt.ylabel('Output')
plt.legend()
 
# Plotting noisy results
plt.subplot(1, 2, 2)
plt.scatter(X, y_noisy, c='r', marker='.', label='Observations (Noisy)')
plt.plot(X_pred, y_pred_noisy, 'b', label='Prediction')
plt.fill_between(X_pred.flatten(), y_pred_noisy - 1.96 * sigma_noisy, y_pred_noisy + 1.96 * sigma_noisy, alpha=0.2, color='blue', label='95% Confidence Interval')
plt.title('Gaussian Process Regression (Noisy)')
plt.xlabel('Input')
plt.ylabel('Output')
plt.legend()
 
plt.tight_layout()
plt.show()

Output:

[Plot: side-by-side GPR predictions with 95% confidence bands for the noise-free (left) and noisy (right) datasets]

The plots showcase the predictions along with the associated uncertainty for each case. The noise in the training data is evident in the wider confidence intervals in the second subplot, where the model is trained on noisy data.
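Because the synthetic data comes from a known sine function, one optional way to quantify the effect of noise is to compare each model's predictions against the true function on the test grid; a minimal sketch:

from sklearn.metrics import mean_squared_error

# Compare predictions against the true underlying function sin(x)
y_true = np.sin(X_pred).ravel()
print(mean_squared_error(y_true, y_pred_noise_free))
print(mean_squared_error(y_true, y_pred_noisy))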

Conclusion

In summary, we explored Gaussian Process Regression and saw that it is a powerful tool for modelling nonlinear relationships between variables while also quantifying the uncertainty of its predictions.

