
Stepwise Regression in Python

Last Updated : 23 May, 2023

Stepwise regression is a method of fitting a regression model by iteratively adding or removing variables. It is used to build a model that is accurate and parsimonious, meaning that it has the smallest number of variables that can explain the data.

There are two main types of stepwise regression:

  • Forward Selection – In forward selection, the algorithm starts with an empty model and iteratively adds variables to the model until no further improvement is made (a hand-written sketch of this loop follows the list).
  • Backward Elimination – In backward elimination, the algorithm starts with a model that includes all variables and iteratively removes variables until no further improvement is made.
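
To make the iterative idea concrete, below is a minimal hand-written sketch of forward selection. It is only an illustration: the synthetic make_regression data, the number of candidate features, and the cross-validated R^2 stopping rule are assumptions chosen for the example, not part of any particular library. Backward elimination works the same way in reverse, starting from all variables and repeatedly dropping the one whose removal hurts the score least.

Python3

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Illustrative synthetic data: 5 candidate features, only 3 carry signal
X, y = make_regression(n_samples=100, n_features=5, n_informative=3,
                       noise=10.0, random_state=0)

remaining = list(range(X.shape[1]))   # features not yet in the model
selected = []                         # features added so far
best_score = -np.inf

# Forward selection: repeatedly add the feature that most improves CV R^2
while remaining:
    candidates = []
    for f in remaining:
        cols = selected + [f]
        score = cross_val_score(LinearRegression(), X[:, cols], y,
                                scoring='r2', cv=5).mean()
        candidates.append((score, f))
    score, f = max(candidates)
    if score <= best_score:           # stop when adding a feature no longer helps
        break
    best_score = score
    selected.append(f)
    remaining.remove(f)

print("Selected feature indices:", selected)
print("Cross-validated R^2:", round(best_score, 3))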

The advantage of stepwise regression is that it can automatically select the most important variables for the model and build a parsimonious model. The disadvantage is that it may not always select the best model, and it can be sensitive to the order in which the variables are added or removed.

Uses of Stepwise Regression

The primary use of stepwise regression is to build a regression model that is accurate and parsimonious. In other words, it is used to find the smallest number of variables that can explain the data.

Stepwise regression is a popular method for model selection because it can automatically select the most important variables for the model and build a parsimonious model. This can save time and effort for the data scientist or analyst, who does not have to manually select the variables for the model.

Stepwise regression can also improve the model’s performance by reducing the number of variables and eliminating any unnecessary or irrelevant variables. This can help to prevent overfitting, which can occur when the model is too complex and does not generalize well to new data.

Overall, stepwise regression is used to build accurate and parsimonious regression models when many candidate predictor variables are available. It is a popular and effective method for model selection in many different domains.

Stepwise Regression vs. Other Regression Models

Stepwise regression is different from other regression methods because it automatically searches for the most important variables to include in the model. Ordinary least squares (OLS) regression simply fits whichever variables the data scientist or analyst chooses, so the selection must be done manually, while the least absolute shrinkage and selection operator (LASSO) selects variables implicitly by shrinking the coefficients of uninformative predictors toward (and often exactly to) zero.
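
For contrast, the snippet below is a minimal sketch of how LASSO performs that selection through shrinkage rather than through a stepwise search; the synthetic data and the alpha value are illustrative assumptions. Coefficients of uninformative predictors are driven exactly to zero, which removes them from the model automatically.

Python3

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Illustrative synthetic data: 6 candidate predictors, only 2 carry signal
X, y = make_regression(n_samples=100, n_features=6, n_informative=2,
                       noise=5.0, random_state=0)

# LASSO shrinks small coefficients exactly to zero, dropping those predictors
lasso = Lasso(alpha=5.0).fit(X, y)
print("Coefficients:", np.round(lasso.coef_, 2))
print("Predictors kept:", list(np.flatnonzero(lasso.coef_)))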

The advantage of stepwise regression is that it can save time and effort for the data scientist or analyst, and it can also improve the model’s performance by reducing the number of variables and eliminating any unnecessary or irrelevant variables. The disadvantage is that it may not always select the best model, and it can be sensitive to the order in which the variables are added or removed.

Overall, stepwise regression is a useful method for model selection, but it should be used carefully and in combination with other regression methods to ensure that the best model is selected.

Difference Between Stepwise Regression and Linear Regression

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. In other words, it is a method for predicting a response (or dependent variable) based on one or more predictor variables.

Stepwise regression is a method for building a regression model by adding or removing predictors in a step-by-step fashion. The goal of stepwise regression is to identify the subset of predictors that provides the best predictive performance for the response variable. This is done by starting with an empty model and iteratively adding or removing predictors based on the strength of their relationship with the response variable.

In summary, linear regression is a method for modeling the relationship between a response and one or more predictor variables, while stepwise regression is a method for building a regression model by iteratively adding or removing predictors.
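
To illustrate that difference in code, the sketch below fits an ordinary linear regression on every predictor and then lets a stepwise (forward) search pick a subset; the synthetic dataset, the k_features value, and the R^2 scoring choice are assumptions made for the example.

Python3

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from mlxtend.feature_selection import SequentialFeatureSelector

# Illustrative synthetic data: 6 candidate predictors, only 3 carry signal
X, y = make_regression(n_samples=200, n_features=6, n_informative=3,
                       noise=15.0, random_state=1)

# Plain linear regression: every predictor enters the model
ols = LinearRegression().fit(X, y)
print("Linear regression uses all", X.shape[1], "predictors")

# Stepwise (forward) selection: predictors are added one at a time
sfs = SequentialFeatureSelector(LinearRegression(),
                                k_features=3,     # keep the best 3 predictors
                                forward=True,
                                scoring='r2',
                                cv=5)
sfs = sfs.fit(X, y)
print("Stepwise regression keeps predictors:", sfs.k_feature_idx_)
print("Cross-validated R^2 of that subset:", round(sfs.k_score_, 3))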

Implementation of Stepwise Regression in Python

To perform stepwise regression in Python, you can follow these steps:

  • Install the mlxtend library by running pip install mlxtend in your command prompt or terminal.
  • Import SequentialFeatureSelector from mlxtend.feature_selection, along with an estimator from scikit-learn such as linear_model.LinearRegression or linear_model.LogisticRegression.
  • Define the features and target variables in your dataset.
  • Initialize the stepwise regression model with SequentialFeatureSelector, specifying the estimator to be used and the number of features to select via the k_features parameter.
  • Fit the stepwise regression model to your dataset using the fit method.

Use the k_feature_names_ (or k_feature_idx_) attribute of the fitted selector to see which features were selected by the stepwise regression.

Importing Libraries

To implement stepwise regression, you will need to have the following libraries installed:

  • Pandas: For data manipulation and analysis.
  • NumPy: For working with arrays and matrices.
  • Scikit-learn (sklearn): For machine learning algorithms and preprocessing tools.
  • mlxtend: For feature selection algorithms.

The first step is to define the array of data and convert it into a dataframe using the NumPy and pandas libraries. Then, the features and target are selected from the dataframe using the iloc method.

Python3




import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from mlxtend.feature_selection import SequentialFeatureSelector
 
# Define the array of data
data = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])
 
# Convert the array into a dataframe
df = pd.DataFrame(data)
 
# Select the features and target
X = df.iloc[:, :-1]
y = df.iloc[:, -1]


Model Development in Stepwise Regression

Next, stepwise regression is performed using the SequentialFeatureSelector() class from the mlxtend library. The selector runs the stepwise search with whatever estimator it is given (here a logistic regression model), and the number of features to keep is specified with the k_features parameter. Setting forward=True gives forward selection, and cv=None skips cross-validation, which matters here because the toy dataset has only three rows.

Python3




# Perform stepwise (forward) feature selection
sfs = SequentialFeatureSelector(linear_model.LogisticRegression(),
                                k_features=3,       # number of features to keep
                                forward=True,       # forward selection
                                scoring='accuracy',
                                cv=None)            # no cross-validation (toy dataset)
selected_features = sfs.fit(X, y)


After the stepwise regression is complete, the selected features can be checked with the selected_features.k_feature_names_ attribute, and a data frame containing only those features is created. Finally, the data is split into train and test sets using the train_test_split() function from the sklearn library, a logistic regression model is fit on the selected features, and its predictions for the test set are printed.

Python3




# Create a dataframe with only the selected features
selected_columns = list(selected_features.k_feature_idx_)
df_selected = df[selected_columns]
 
# Split the data into train and test sets
X_train, X_test,\
    y_train, y_test = train_test_split(
        df_selected, y,
        test_size=0.3,
        random_state=42)
 
# Fit a logistic regression model using the selected features
logreg = linear_model.LogisticRegression()
logreg.fit(X_train, y_train)
 
# Make predictions using the test set
y_pred = logreg.predict(X_test)
 
# Show the model's predictions for the test set
print(y_pred)


Output:

[8]
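
With only three rows of toy data, a single prediction is printed rather than a meaningful score. On a realistically sized dataset, the accuracy_score() function imported earlier would typically be applied to these predictions; the short sketch below continues from the variables defined above.

Python3

# Compare the test-set predictions with the true labels
# (only meaningful when the test set has more than a handful of rows)
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", accuracy)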

The difference between linear regression and stepwise regression is that stepwise regression is a method for building a regression model by iteratively adding or removing predictors, while linear regression is a method for modeling the relationship between a response and one or more predictor variables.

In the stepwise regression example above, the mlxtend library is used to iteratively add or remove predictors based on their relationship with the response variable, whereas an ordinary linear (or logistic) regression model would simply be fit using all available predictors.


