
Principal Component Regression (PCR)

Principal Component Regression (PCR) is a statistical technique for regression analysis that reduces the dimensionality of a dataset by projecting it onto a lower-dimensional subspace. This is done by finding a set of orthogonal (i.e., uncorrelated) linear combinations of the original variables, called principal components, that capture the most variance in the data. These principal components are then used as predictors in the regression model in place of the original variables.

PCR is often used as an alternative to multiple linear regression, especially when the number of variables is large or when the variables are correlated. By using PCR, we can reduce the number of variables in the model and improve the interpretability and stability of the regression results.



Features of the Principal Component Regression (PCR)

Here are some key features of Principal Component Regression (PCR):

- Dimensionality reduction: the original predictors are projected onto a smaller set of principal components.
- Orthogonal predictors: the components are uncorrelated with one another, which removes multicollinearity among the regressors.
- Variance-based ordering: components are ranked by how much of the variance in the predictors they capture, so the leading components summarize most of the data.
- Improved stability: regressing on a few components rather than many correlated variables tends to give more stable, less overfit coefficient estimates.

Breaking down the Math behind Principal Component Regression (PCR)

Here is a brief overview of the mathematical steps underlying Principal Component Regression (PCR):

1. Center (and usually scale) the predictor matrix X so that each column has mean zero.
2. Compute the principal components, either from the eigendecomposition of the covariance matrix of X or, equivalently, from the singular value decomposition X = U S V^T.
3. Keep the first k components (the columns of V with the largest singular values) and form the score matrix Z = X V_k.
4. Regress y on Z by ordinary least squares; the fitted coefficients can be mapped back to the original variables via beta = V_k gamma.



Overall, PCR uses mathematical concepts from linear algebra and statistics to reduce the dimensionality of a dataset and improve the interpretability and stability of regression results.
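The core computation (centering, SVD, regression on the leading component scores) can be sketched directly in NumPy. This is a minimal illustration on synthetic data, not scikit-learn's implementation; the variable names (`Z`, `gamma`, `beta`) and the toy dataset are chosen here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples, 4 features, with one near-duplicate column
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=100)  # strong collinearity
y = X @ np.array([1.0, 2.0, 0.0, 0.5]) + rng.normal(size=100)

# 1. Center the predictors
Xc = X - X.mean(axis=0)

# 2. SVD of the centered matrix gives the principal directions (rows of Vt)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# 3. Keep the first k components; the scores Z are the new predictors
k = 2
Z = Xc @ Vt[:k].T

# 4. Ordinary least squares of (centered) y on the scores
gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)

# Map the component coefficients back to the original variables
beta = Vt[:k].T @ gamma
print(beta.round(2))
```

Because the score columns of `Z` are orthogonal, the least-squares fit on them is numerically well behaved even though two of the original columns are almost identical.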

Limitations of Principal Component Regression (PCR)

While Principal Component Regression (PCR) has many advantages, it also has some limitations that should be considered when deciding whether to use it for a particular regression analysis:

- Components are chosen by the variance of the predictors alone, so the components that best summarize X are not necessarily the ones most predictive of y.
- Individual coefficients become harder to interpret, since each component is a linear combination of all the original variables.
- The number of components k is a tuning parameter that must be selected, typically by cross-validation.
- Results depend on the scaling of the predictors, so standardization choices matter.

Overall, while PCR has many advantages, it is important to carefully consider its limitations and potential drawbacks before using it for regression analysis.

How does Principal Component Regression (PCR) compare to other regression techniques?

Principal Component Regression (PCR) is often compared to other regression analysis techniques, such as multiple linear regression, principal component analysis (PCA), and partial least squares regression (PLSR). Here are some key differences:

- Multiple linear regression uses the original variables directly; PCR replaces them with a smaller set of uncorrelated components, which helps when the predictors are numerous or strongly correlated.
- PCA on its own is an unsupervised dimensionality-reduction technique; PCR combines PCA with a regression step on the resulting components.
- PLSR also constructs components, but it chooses them to maximize covariance with the response y, whereas PCR chooses its components without looking at y at all.

Overall, PCR is a useful technique for regression analysis that can be compared to multiple linear regression, PCA, and PLSR, depending on the specific characteristics of the data and the goals of the analysis.

Principal Component Regression (PCR) in Python

Here is the implementation of Principal Component Regression (PCR) in Python, using the scikit-learn library:




# Import the required modules
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.pipeline import Pipeline
import numpy as np

# Load the diabetes dataset
X, y = load_diabetes(return_X_y=True)
print(X.shape)

Output:

(442, 10)

Now let’s reduce the dimensionality of the original dataset by half, from 10-dimensional data to 5-dimensional data. A pipeline is created that consists of two steps: PCA and linear regression. The PCA step is initialized with the n_components parameter set to 5, which means that only the first five principal components will be kept. The linear regression step is initialized with the default parameters.




# Create a pipeline with PCA and linear regression
pca = PCA(n_components=5)  # keep only the first five principal components
reg = LinearRegression()
pipeline = Pipeline(steps=[('pca', pca),
                           ('reg', reg)])
  
# Fit the pipeline to the data
pipeline.fit(X, y)
  
# Predict the labels for the data
y_pred = pipeline.predict(X)

Now let’s evaluate the performance of the model using metrics such as mean absolute error, mean squared error, root mean squared error, and the R^2 score.




# Compute the evaluation metrics
mae = mean_absolute_error(y, y_pred)
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)
r2 = pipeline.score(X, y)
  
# Print the number of features before and after PCR
print(f'Number of features before PCR: {X.shape[1]}')
print(f'Number of features after PCR: {pca.n_components_}')
  
# Print the evaluation metrics
print(f'MAE: {mae:.2f}')
print(f'MSE: {mse:.2f}')
print(f'RMSE: {rmse:.2f}')
print(f'R^2: {r2:.2f}')

Output:

Number of features before PCR: 10
Number of features after PCR: 5
MAE: 44.30
MSE: 2962.70
RMSE: 54.43
R^2: 0.50
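The example above fixes n_components at 5, but in practice this tuning parameter is usually chosen by cross-validation. This is one common approach sketched with scikit-learn's GridSearchCV; the parameter grid simply tries every possible number of components for this 10-feature dataset.

```python
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)

pipeline = Pipeline([('pca', PCA()), ('reg', LinearRegression())])

# Try every candidate number of components and pick the best by CV R^2
search = GridSearchCV(pipeline,
                      {'pca__n_components': range(1, X.shape[1] + 1)},
                      cv=5, scoring='r2')
search.fit(X, y)

print('Best n_components:', search.best_params_['pca__n_components'])
print(f'Best CV R^2: {search.best_score_:.3f}')
```

This guards against both keeping too few components (underfitting) and keeping so many that the multicollinearity problems PCR is meant to avoid reappear.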
