
Implementation of Lasso Regression From Scratch using Python

Lasso Regression (Least Absolute Shrinkage and Selection Operator) is a linear regression technique used for regularization and feature selection. It adds a penalty term to the cost function of linear regression, which is the sum of squared differences between predicted and actual values. The penalty term is the sum of the absolute values of the coefficients multiplied by a regularization parameter (alpha or lambda). This pushes the model to shrink the coefficients' absolute values, driving some of them exactly to zero. As a result, Lasso Regression performs automatic feature selection, producing a sparse model with fewer significant predictors. This property makes Lasso Regression especially helpful when working with high-dimensional datasets, because it reduces overfitting and identifies the most relevant features for prediction.


Lasso Regression in Python

In Python, Lasso Regression is a linear regression technique that selects the most important features while predicting outcomes. By adding a penalty term that shrinks the coefficients of less significant features to zero, it encourages simpler models. This makes Lasso Regression well suited for handling large datasets, avoiding overfitting, and producing a streamlined model. Lasso Regression can be implemented easily with Python packages such as scikit-learn, which makes it a useful tool for balancing simplicity with predictive accuracy in machine learning applications.
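As a quick illustration, here is a minimal sketch using scikit-learn's built-in Lasso estimator on synthetic data; the alpha value and the data are purely illustrative.

# Minimal scikit-learn Lasso sketch on synthetic data (alpha and data are illustrative)
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # 5 features, only 2 are informative
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = Lasso(alpha=0.1)                           # regularization strength
model.fit(X, y)
print(np.round(model.coef_, 2))                    # coefficients of irrelevant features are driven toward zero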



Prerequisites 

  1. Linear Regression
  2. Gradient Descent

Lasso Regression is another linear model derived from Linear Regression and it shares the same hypothesis function for prediction. The cost function of Linear Regression is represented by J:

J = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - h(x^{(i)}) \right)^2

where m is the number of training examples, y^{(i)} is the actual value and h(x^{(i)}) is the predicted value for the i-th example.

The Linear Regression model treats all features as equally relevant for prediction. When the dataset contains many features, some of them may not be relevant to the predictive model. This makes the model more complex and leads to inaccurate predictions on the test set (overfitting). Such a high-variance model does not generalize well to new data. Lasso Regression comes to the rescue: it introduces an L1 penalty (equal to the sum of the absolute values of the weights) into the cost function of Linear Regression. The modified cost function for Lasso Regression is given below:

J = \frac{1}{m} \left[ \sum_{i=1}^{m} \left( y^{(i)} - h(x^{(i)}) \right)^2 + \lambda \sum_{j=1}^{n} |w_j| \right]

where \lambda is the regularization parameter and w_j are the model weights.

Lasso Regression therefore performs both variable selection and regularization.
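To make the modified cost function concrete, the small NumPy sketch below evaluates it directly; the function name and the 1/m scaling follow the implementation later in this article, and are otherwise illustrative.

# Illustrative helper: Lasso cost = squared error plus L1 penalty on the weights, averaged over m examples
import numpy as np

def lasso_cost(X, Y, W, b, l1_penalty):
    Y_pred = X.dot(W) + b
    squared_error = np.sum((Y - Y_pred) ** 2)
    l1_term = l1_penalty * np.sum(np.abs(W))       # the intercept b is not penalized
    return (squared_error + l1_term) / X.shape[0]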

Mathematical Intuition

During gradient descent optimization, the added L1 penalty shrinks weights close to zero or exactly to zero. The weights that are shrunk to zero eliminate the corresponding features from the hypothesis function. As a result, irrelevant features do not participate in the predictive model. This penalization of the weights makes the hypothesis simpler and encourages sparsity (a model with few parameters).

If an intercept term is included, it is not penalized and is left unchanged by the regularization.

The strength of regularization is controlled by the hyperparameter lambda. The L1 penalty pushes every weight toward zero by the same amount (proportional to lambda), rather than scaling each weight by a common factor.

There are different cases for tuning the value of lambda:

  1. If lambda is set to be 0, Lasso Regression equals Linear Regression.
  2. If lambda is set to be infinity, all weights are shrunk to zero.

Increasing lambda increases bias, while decreasing lambda increases variance. As lambda increases, more and more weights are shrunk to zero, eliminating the corresponding features from the model.
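One way to see why the L1 penalty produces exact zeros is the soft-thresholding update used by coordinate-descent Lasso solvers. The small sketch below (function name and numbers are illustrative) shows that once lambda exceeds the size of a coefficient, the update clips it to exactly zero.

# Soft-thresholding: the closed-form per-coordinate update behind L1 regularization
import numpy as np

def soft_threshold(rho, lam):
    # shrink rho toward zero by lam, and clip to exactly zero when |rho| <= lam
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

for lam in [0.0, 0.5, 1.5]:
    print(lam, soft_threshold(1.2, lam))           # 1.2, roughly 0.7, then exactly 0.0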

How does the Lasso Regression work?

To avoid overfitting and promote feature selection, Lasso Regression combines conventional linear regression with a regularization term. The linear regression loss function is extended by a regularization term whose strength is controlled by λ. During training, this term penalizes the absolute values of the coefficients and drives some of them to exactly zero.

The objective of the model is to simultaneously minimize the sum of squared differences between the actual and predicted values and the sum of the absolute values of the coefficients. This dual optimization goal results in a sparse model in which only the most relevant features are kept.

How Lasso Regression operates can be summarized in the following stages:

  1. Start from the ordinary linear regression objective, the sum of squared errors between predictions and actual values.
  2. Add the L1 penalty, λ times the sum of the absolute values of the coefficients, to that objective.
  3. Minimize the penalized objective, for example with gradient descent; the penalty shrinks the coefficients and drives some of them exactly to zero.
  4. Keep the features whose coefficients remain non-zero, which yields a sparse, more interpretable model.

Implementation of Lasso Regression in Python

The dataset used in this implementation can be downloaded from the link.

It has 2 columns, "YearsExperience" and "Salary", for 30 employees in a company. We will train a Lasso Regression model to learn the correlation between the number of years of experience of each employee and their respective salary. Once the model is trained, we will be able to predict the salary of an employee on the basis of their years of experience.
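If the original CSV is not at hand, a synthetic stand-in with the same two columns can be generated as below; the linear relationship and noise level are assumptions for illustration, so results obtained with it will differ from the output shown later in this article.

# Optional: create a synthetic stand-in for Experience-Salary.csv (values are illustrative)
import numpy as np
import pandas as pd

np.random.seed(42)
years = np.round(np.random.uniform(1, 10, 30), 1)
salary = 25 + 9 * years + np.random.normal(0, 5, 30)     # roughly linear salary vs experience
pd.DataFrame({"YearsExperience": years, "Salary": salary}).to_csv("Experience-Salary.csv", index=False)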

 Importing Libraries

# Importing libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

                    

This Python program applies Lasso Regression to a dataset. It imports the required libraries, such as NumPy, Pandas, and scikit-learn. The dataset is read from a CSV file and the features are standardized with StandardScaler. The data is then split into training and testing sets, the Lasso Regression model is trained, and the results are displayed as a scatter plot with the Lasso Regression line drawn using Matplotlib. This code demonstrates how to predict outcomes and select key features in a machine learning setting using the Lasso Regression approach.

Defining the Lasso Regression Class

# Lasso Regression
class LassoRegression():
    def __init__(self, learning_rate, iterations, l1_penalty):
        self.learning_rate = learning_rate
        self.iterations = iterations
        self.l1_penalty = l1_penalty
 
    # Function for model training
    def fit(self, X, Y):
        # no_of_training_examples, no_of_features
        self.m, self.n = X.shape
        # weight initialization
        self.W = np.zeros(self.n)
        self.b = 0
        self.X = X
        self.Y = Y
        # gradient descent learning
        for i in range(self.iterations):
            self.update_weights()
        return self
 
    # Helper function to update weights in gradient descent
    def update_weights(self):
        Y_pred = self.predict(self.X)
        # calculate gradients
        dW = np.zeros(self.n)
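        # L1 subgradient: use +l1_penalty when W[j] > 0, otherwise -l1_penalty
        # (grouping W[j] == 0 with the negative branch is a common simplification)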
        for j in range(self.n):
            if self.W[j] > 0:
                dW[j] = (-2 * (self.X[:, j]).dot(self.Y - Y_pred) +
                         self.l1_penalty) / self.m
            else:
                dW[j] = (-2 * (self.X[:, j]).dot(self.Y - Y_pred) -
                         self.l1_penalty) / self.m
 
        db = -2 * np.sum(self.Y - Y_pred) / self.m
 
        # update weights
        self.W = self.W - self.learning_rate * dW
        self.b = self.b - self.learning_rate * db
        return self
 
    # Hypothetical function h(x)
    def predict(self, X):
        return X.dot(self.W) + self.b

                    

This Python code defines a class called LassoRegression. The class covers model initialization, training (fitting), and prediction. The fit method uses gradient descent to update the weights over a predetermined number of iterations. The weight updates include the L1 regularization term, which encourages sparsity in the model. The predict method computes the predicted values from the learned weights. This implementation demonstrates the basic steps of Lasso Regression, with a focus on the iterative optimization procedure used to minimize the cost function with L1 regularization.
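Before training on the salary dataset, the class defined above can be sanity-checked on synthetic data where only the first feature matters; the settings below are illustrative assumptions, and the learned weight vector should be dominated by the first entry.

# Quick sanity check of LassoRegression on synthetic data (settings are illustrative)
import numpy as np

np.random.seed(0)
X = np.random.randn(200, 3)
Y = 5 * X[:, 0] + np.random.randn(200) * 0.1        # only the first feature is informative

check = LassoRegression(learning_rate=0.01, iterations=1000, l1_penalty=1.0)
check.fit(X, Y)
print(np.round(check.W, 2))                          # first weight should be close to 5, the others near 0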

Training the model

def main():
    # Importing dataset
    df = pd.read_csv("Experience-Salary.csv")
    X = df.iloc[:, :-1].values
    Y = df.iloc[:, 1].values
 
    # Standardize features
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
 
    # Splitting dataset into train and test set
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=1/3, random_state=0)
 
    # Model training
    model = LassoRegression(
        iterations=1000, learning_rate=0.01, l1_penalty=500)
    model.fit(X_train, Y_train)
 
    # Prediction on test set
    Y_pred = model.predict(X_test)
 
    print("Predicted values: ", np.round(Y_pred[:3], 2))
    print("Real values:      ", Y_test[:3])
    print("Trained W:        ", round(model.W[0], 2))
    print("Trained b:        ", round(model.b, 2))
 
    # Visualization on test set
    plt.scatter(X_test, Y_test, color='blue', label='Actual Data')
    plt.plot(X_test, Y_pred, color='orange', label='Lasso Regression Line')
    plt.title('Salary vs Experience (Lasso Regression)')
    plt.xlabel('Years of Experience (Standardized)')
    plt.ylabel('Salary')
    plt.legend()
    plt.show()
 
 
if __name__ == "__main__":
    main()

                    

Output:

Predicted values:  [19.88 44.43 34.78]
Real values: [12.40492474 42.64192391 32.61398476]
Trained W: 6.84
Trained b: 26.61

Lasso Regression


This main function shows how to apply Lasso Regression to a dataset for salary prediction. It reads the dataset, standardizes the features with StandardScaler, and splits the data into training and testing sets. The Lasso Regression model is then trained with predetermined parameters. Predictions are made on the test set, and the model's effectiveness is assessed by comparing the predicted and actual values. Finally, the results are displayed as a scatter plot of the real data points together with the Lasso Regression line. The visualization demonstrates the model's ability to capture the relationship between years of experience and salary while using L1 regularization for feature selection.

Note: Lasso Regression automates part of model selection and is sometimes called a variable eliminator.

Advantages and Disadvantages of Lasso Regression

Advantages

In the context of linear regression and machine learning, Lasso Regression (Least Absolute Shrinkage and Selection Operator) has the following benefits:

  1. Automatic feature selection: the L1 penalty drives the coefficients of irrelevant features to exactly zero.
  2. Reduced overfitting: penalizing large coefficients keeps the model simpler and improves generalization.
  3. Better interpretability: a sparse model with fewer non-zero coefficients is easier to explain.
  4. Suitability for high-dimensional data: it remains useful when there are many features, only a few of which matter.

Disadvantages

While there are many benefits to Lasso Regression, there are also some drawbacks that need to be taken into account:

  1. With highly correlated features, Lasso tends to select one of them arbitrarily and discard the others.
  2. The set of selected features can be unstable: small changes in the data may change which coefficients are zeroed out.
  3. The shrinkage introduces bias, so estimates of large coefficients can be noticeably underestimated.
  4. The regularization parameter must be tuned, for example with cross-validation, which adds computational cost.

Frequently Asked Questions (FAQs)

1. What is Lasso Regression?

Lasso Regression, also known as Least Absolute Shrinkage and Selection Operator, is a linear regression technique that adds an L1 regularization term to the conventional linear regression cost function. By penalizing the absolute values of the coefficients, this regularization term encourages sparsity in the model and carries out automatic feature selection.

2. How does Lasso Regression differ from Ridge Regression?

To prevent overfitting, regularization terms are introduced in both Lasso and Ridge Regression models; however, the kind of regularization used varies. In order to pick features, Lasso utilizes an L1 penalty, which drives some coefficients exactly to zero, whereas Ridge uses an L2 penalty, which decreases all coefficients towards zero.
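A minimal scikit-learn comparison makes the difference visible; the alpha value and the synthetic data below are illustrative.

# Lasso vs Ridge on the same synthetic data (alpha and data are illustrative)
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 4 * X[:, 0] + rng.normal(scale=0.1, size=100)    # only one informative feature

print("Lasso:", np.round(Lasso(alpha=0.5).fit(X, y).coef_, 3))   # several coefficients become exactly 0
print("Ridge:", np.round(Ridge(alpha=0.5).fit(X, y).coef_, 3))   # coefficients shrink but are typically not exactly 0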

3. What is the significance of the regularization parameter in Lasso Regression?

The regularization strength in Lasso Regression is determined by the regularization parameter, commonly denoted λ or alpha. It sets the trade-off between fitting the data well and keeping the model simple. The larger λ is, the sparser the model becomes, with fewer non-zero coefficients.
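This effect can be seen by counting the non-zero coefficients as alpha grows, as in the rough sketch below (the alpha grid and data are illustrative).

# Number of non-zero coefficients shrinks as alpha (lambda) increases
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

for alpha in [0.001, 0.01, 0.1, 1.0]:
    coef = Lasso(alpha=alpha, max_iter=10000).fit(X, y).coef_
    print(alpha, np.count_nonzero(coef))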

4. In what scenarios is Lasso Regression particularly useful?

Lasso Regression is particularly helpful when feature selection is important and the dataset is high dimensional. It manages scenarios with multicollinearity among predictors, helps avoid overfitting, and enhances model interpretability by choosing pertinent features.

5. How do I choose the right value for the regularization parameter in Lasso Regression?

A trade-off between bias and variance must be made when determining the regularization parameter’s appropriate value. To evaluate the performance of the model for various regularization parameter values and choose the best one, cross-validation techniques like k-fold cross-validation can be used.
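In practice, scikit-learn's LassoCV automates this search over a grid of alpha values; the sketch below is illustrative (the cv setting and data are assumptions).

# Choosing alpha by k-fold cross-validation with LassoCV (cv=5 and data are illustrative)
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))
y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.1, size=200)

model = LassoCV(cv=5).fit(X, y)
print("Best alpha:", round(float(model.alpha_), 4))
print("Coefficients:", np.round(model.coef_, 2))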


