
Implementation of Lasso Regression From Scratch using Python

Lasso Regression (Least Absolute Shrinkage and Selection Operator) is a linear regression technique used for regularization and feature selection. It adds a penalty term to the cost function of linear regression, which is the sum of squared differences between predicted and actual values. The penalty term is the sum of the absolute values of the coefficients multiplied by a regularization parameter (alpha or lambda). This pushes the model to shrink the coefficients' absolute values, driving some of them exactly to zero. As a result, Lasso Regression performs automatic feature selection, producing a sparse model with fewer significant predictors. This property makes Lasso Regression especially helpful when working with high-dimensional datasets, because it reduces overfitting and identifies the most relevant features for prediction.


Lasso Regression in Python

In Python, Lasso Regression is a linear regression technique that selects the most important features while predicting outcomes. By adding a penalty term that shrinks the coefficients of less significant features to zero, it encourages simpler models. This makes Lasso Regression well suited for handling large datasets, avoiding overfitting, and producing a streamlined model. Lasso Regression can be implemented easily with Python packages such as scikit-learn, which makes it a useful tool for balancing simplicity with predictive accuracy in machine learning applications.
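As a quick illustration, here is a minimal sketch using scikit-learn's built-in Lasso estimator on synthetic data; the alpha value and the data are purely illustrative.

# Minimal scikit-learn Lasso sketch on synthetic data (alpha and data are illustrative)
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # 5 features, only 2 are informative
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = Lasso(alpha=0.1)                           # regularization strength
model.fit(X, y)
print(np.round(model.coef_, 2))                    # coefficients of irrelevant features are driven toward zero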



Prerequisites 

  1. Linear Regression
  2. Gradient Descent

Lasso Regression is another linear model derived from Linear Regression and it shares the same hypothesis function for prediction. The cost function of Linear Regression is represented by J:

J = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - h(x^{(i)}) \right)^2

where m is the number of training examples, y^{(i)} is the actual value and h(x^{(i)}) is the predicted value for the i-th example.

The Linear Regression model treats all features as equally relevant for prediction. When the dataset contains many features, some of them may not be relevant to the predictive model. This makes the model more complex and leads to inaccurate predictions on the test set (overfitting). Such a high-variance model does not generalize well to new data. Lasso Regression comes to the rescue: it introduces an L1 penalty (equal to the sum of the absolute values of the weights) into the cost function of Linear Regression. The modified cost function for Lasso Regression is given below:

J = \frac{1}{m} \left[ \sum_{i=1}^{m} \left( y^{(i)} - h(x^{(i)}) \right)^2 + \lambda \sum_{j=1}^{n} |w_j| \right]

where \lambda is the regularization parameter and w_j are the model weights.

Lasso Regression therefore performs both variable selection and regularization.
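To make the modified cost function concrete, the small NumPy sketch below evaluates it directly; the function name and the 1/m scaling follow the implementation later in this article, and are otherwise illustrative.

# Illustrative helper: Lasso cost = squared error plus L1 penalty on the weights, averaged over m examples
import numpy as np

def lasso_cost(X, Y, W, b, l1_penalty):
    Y_pred = X.dot(W) + b
    squared_error = np.sum((Y - Y_pred) ** 2)
    l1_term = l1_penalty * np.sum(np.abs(W))       # the intercept b is not penalized
    return (squared_error + l1_term) / X.shape[0]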

Mathematical Intuition

During gradient descent optimization, the added L1 penalty shrinks weights close to zero or exactly to zero. The weights that are shrunk to zero eliminate the corresponding features from the hypothesis function. As a result, irrelevant features do not participate in the predictive model. This penalization of the weights makes the hypothesis simpler and encourages sparsity (a model with few parameters).

If an intercept term is included, it is not penalized and is left unchanged by the regularization.

The strength of regularization is controlled by the hyperparameter lambda. The L1 penalty pushes every weight toward zero by the same amount (proportional to lambda), rather than scaling each weight by a common factor.

There are different cases for tuning the value of lambda:

  1. If lambda is set to be 0, Lasso Regression equals Linear Regression.
  2. If lambda is set to be infinity, all weights are shrunk to zero.

Increasing lambda increases bias, while decreasing lambda increases variance. As lambda increases, more and more weights are shrunk to zero, eliminating the corresponding features from the model.
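One way to see why the L1 penalty produces exact zeros is the soft-thresholding update used by coordinate-descent Lasso solvers. The small sketch below (function name and numbers are illustrative) shows that once lambda exceeds the size of a coefficient, the update clips it to exactly zero.

# Soft-thresholding: the closed-form per-coordinate update behind L1 regularization
import numpy as np

def soft_threshold(rho, lam):
    # shrink rho toward zero by lam, and clip to exactly zero when |rho| <= lam
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

for lam in [0.0, 0.5, 1.5]:
    print(lam, soft_threshold(1.2, lam))           # 1.2, roughly 0.7, then exactly 0.0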

How does the Lasso Regression work?

To avoid overfitting and promote feature selection, Lasso Regression combines conventional linear regression with a regularization term. The linear regression loss function is extended by a regularization term whose strength is controlled by λ. During training, this term penalizes the absolute values of the coefficients and drives some of them to exactly zero.

The objective of the model is to simultaneously minimize the sum of squared differences between the actual and predicted values and the sum of the absolute values of the coefficients. This dual optimization goal results in a sparse model in which only the most relevant features are kept.

How Lasso Regression operates can be summarized in the following stages:

  1. Start from the ordinary linear regression objective, the sum of squared errors between predictions and actual values.
  2. Add the L1 penalty, λ times the sum of the absolute values of the coefficients, to that objective.
  3. Minimize the penalized objective, for example with gradient descent; the penalty shrinks the coefficients and drives some of them exactly to zero.
  4. Keep the features whose coefficients remain non-zero, which yields a sparse, more interpretable model.

Implementation of Lasso Regression in Python

The dataset used in this implementation can be downloaded from the link.

It has 2 columns, "YearsExperience" and "Salary", for 30 employees in a company. We will train a Lasso Regression model to learn the correlation between the number of years of experience of each employee and their respective salary. Once the model is trained, we will be able to predict the salary of an employee on the basis of their years of experience.
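If the original CSV is not at hand, a synthetic stand-in with the same two columns can be generated as below; the linear relationship and noise level are assumptions for illustration, so results obtained with it will differ from the output shown later in this article.

# Optional: create a synthetic stand-in for Experience-Salary.csv (values are illustrative)
import numpy as np
import pandas as pd

np.random.seed(42)
years = np.round(np.random.uniform(1, 10, 30), 1)
salary = 25 + 9 * years + np.random.normal(0, 5, 30)     # roughly linear salary vs experience
pd.DataFrame({"YearsExperience": years, "Salary": salary}).to_csv("Experience-Salary.csv", index=False)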

 Importing Libraries

# Importing libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

                    

This Python program applies Lasso Regression to a dataset. It imports the required libraries, such as NumPy, Pandas, and scikit-learn. The dataset is read from a CSV file and the features are standardized with StandardScaler. The data is then split into training and testing sets, the Lasso Regression model is trained, and the results are displayed as a scatter plot with the Lasso Regression line drawn using Matplotlib. This code demonstrates how to predict outcomes and select key features in a machine learning setting using the Lasso Regression approach.

Defining the Lasso Regression Class

# Lasso Regression
class LassoRegression():
    def __init__(self, learning_rate, iterations, l1_penalty):
        self.learning_rate = learning_rate
        self.iterations = iterations
        self.l1_penalty = l1_penalty
 
    # Function for model training
    def fit(self, X, Y):
        # no_of_training_examples, no_of_features
        self.m, self.n = X.shape
        # weight initialization
        self.W = np.zeros(self.n)
        self.b = 0
        self.X = X
        self.Y = Y
        # gradient descent learning
        for i in range(self.iterations):
            self.update_weights()
        return self
 
    # Helper function to update weights in gradient descent
    def update_weights(self):
        Y_pred = self.predict(self.X)
        # calculate gradients
        dW = np.zeros(self.n)
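        # L1 subgradient: use +l1_penalty when W[j] > 0, otherwise -l1_penalty
        # (grouping W[j] == 0 with the negative branch is a common simplification)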
        for j in range(self.n):
            if self.W[j] > 0:
                dW[j] = (-2 * (self.X[:, j]).dot(self.Y - Y_pred) +
                         self.l1_penalty) / self.m
            else:
                dW[j] = (-2 * (self.X[:, j]).dot(self.Y - Y_pred) -
                         self.l1_penalty) / self.m
 
        db = -2 * np.sum(self.Y - Y_pred) / self.m
 
        # update weights
        self.W = self.W - self.learning_rate * dW
        self.b = self.b - self.learning_rate * db
        return self
 
    # Hypothetical function h(x)
    def predict(self, X):
        return X.dot(self.W) + self.b

                    

This Python code defines a class called LassoRegression. The class covers model initialization, training (fitting), and prediction. The fit method uses gradient descent to update the weights over a predetermined number of iterations. The weight updates include the L1 regularization term, which encourages sparsity in the model. The predict method computes the predicted values from the learned weights. This implementation demonstrates the basic steps of Lasso Regression, with a focus on the iterative optimization procedure used to minimize the cost function with L1 regularization.
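Before training on the salary dataset, the class defined above can be sanity-checked on synthetic data where only the first feature matters; the settings below are illustrative assumptions, and the learned weight vector should be dominated by the first entry.

# Quick sanity check of LassoRegression on synthetic data (settings are illustrative)
import numpy as np

np.random.seed(0)
X = np.random.randn(200, 3)
Y = 5 * X[:, 0] + np.random.randn(200) * 0.1        # only the first feature is informative

check = LassoRegression(learning_rate=0.01, iterations=1000, l1_penalty=1.0)
check.fit(X, Y)
print(np.round(check.W, 2))                          # first weight should be close to 5, the others near 0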

Training the model

def main():
    # Importing dataset
    df = pd.read_csv("Experience-Salary.csv")
    X = df.iloc[:, :-1].values
    Y = df.iloc[:, 1].values
 
    # Standardize features
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
 
    # Splitting dataset into train and test set
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=1/3, random_state=0)
 
    # Model training
    model = LassoRegression(
        iterations=1000, learning_rate=0.01, l1_penalty=500)
    model.fit(X_train, Y_train)
 
    # Prediction on test set
    Y_pred = model.predict(X_test)
 
    print("Predicted values: ", np.round(Y_pred[:3], 2))
    print("Real values:      ", Y_test[:3])
    print("Trained W:        ", round(model.W[0], 2))
    print("Trained b:        ", round(model.b, 2))
 
    # Visualization on test set
    plt.scatter(X_test, Y_test, color='blue', label='Actual Data')
    plt.plot(X_test, Y_pred, color='orange', label='Lasso Regression Line')
    plt.title('Salary vs Experience (Lasso Regression)')
    plt.xlabel('Years of Experience (Standardized)')
    plt.ylabel('Salary')
    plt.legend()
    plt.show()
 
 
if __name__ == "__main__":
    main()

                    

Output:

Predicted values:  [19.88 44.43 34.78]
Real values: [12.40492474 42.64192391 32.61398476]
Trained W: 6.84
Trained b: 26.61

Lasso Regression


This main function shows how to apply Lasso Regression to a dataset for salary prediction. It reads the dataset, standardizes the features with StandardScaler, and splits the data into training and testing sets. The Lasso Regression model is then trained with predetermined parameters. Predictions are made on the test set, and the model's effectiveness is assessed by comparing the predicted and actual values. Finally, the results are displayed as a scatter plot of the real data points together with the Lasso Regression line. The visualization demonstrates the model's ability to capture the relationship between years of experience and salary while using L1 regularization for feature selection.

Note: Lasso Regression automates part of model selection and is sometimes called a variable eliminator.

Advantages and Disadvantages of Lasso Regression

Advantages

In the context of linear regression and machine learning, Lasso Regression (Least Absolute Shrinkage and Selection Operator) has the following benefits:

  1. Automatic feature selection: the L1 penalty drives the coefficients of irrelevant features to exactly zero.
  2. Reduced overfitting: penalizing large coefficients keeps the model simpler and improves generalization.
  3. Better interpretability: a sparse model with fewer non-zero coefficients is easier to explain.
  4. Suitability for high-dimensional data: it remains useful when there are many features, only a few of which matter.

Disadvantages

While there are many benefits to Lasso Regression, there are also some drawbacks that need to be taken into account:

  1. With highly correlated features, Lasso tends to select one of them arbitrarily and discard the others.
  2. The set of selected features can be unstable: small changes in the data may change which coefficients are zeroed out.
  3. The shrinkage introduces bias, so estimates of large coefficients can be noticeably underestimated.
  4. The regularization parameter must be tuned, for example with cross-validation, which adds computational cost.

Frequently Asked Questions (FAQs)

1. What is Lasso Regression?

Lasso Regression, also known as Least Absolute Shrinkage and Selection Operator, is a linear regression technique that adds an L1 regularization term to the conventional linear regression cost function. By penalizing the absolute values of the coefficients, this regularization term encourages sparsity in the model and carries out automatic feature selection.

2. How does Lasso Regression differ from Ridge Regression?

To prevent overfitting, regularization terms are introduced in both Lasso and Ridge Regression models; however, the kind of regularization used varies. In order to pick features, Lasso utilizes an L1 penalty, which drives some coefficients exactly to zero, whereas Ridge uses an L2 penalty, which decreases all coefficients towards zero.
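A minimal scikit-learn comparison makes the difference visible; the alpha value and the synthetic data below are illustrative.

# Lasso vs Ridge on the same synthetic data (alpha and data are illustrative)
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 4 * X[:, 0] + rng.normal(scale=0.1, size=100)    # only one informative feature

print("Lasso:", np.round(Lasso(alpha=0.5).fit(X, y).coef_, 3))   # several coefficients become exactly 0
print("Ridge:", np.round(Ridge(alpha=0.5).fit(X, y).coef_, 3))   # coefficients shrink but are typically not exactly 0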

3. What is the significance of the regularization parameter in Lasso Regression?

The regularization strength in Lasso Regression is determined by the regularization parameter, commonly denoted λ or alpha. It sets the trade-off between fitting the data well and keeping the model simple. The larger λ is, the sparser the model becomes, with fewer non-zero coefficients.
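This effect can be seen by counting the non-zero coefficients as alpha grows, as in the rough sketch below (the alpha grid and data are illustrative).

# Number of non-zero coefficients shrinks as alpha (lambda) increases
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

for alpha in [0.001, 0.01, 0.1, 1.0]:
    coef = Lasso(alpha=alpha, max_iter=10000).fit(X, y).coef_
    print(alpha, np.count_nonzero(coef))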

4. In what scenarios is Lasso Regression particularly useful?

Lasso Regression is particularly helpful when feature selection is important and the dataset is high dimensional. It manages scenarios with multicollinearity among predictors, helps avoid overfitting, and enhances model interpretability by choosing pertinent features.

5. How do I choose the right value for the regularization parameter in Lasso Regression?

A trade-off between bias and variance must be made when determining the regularization parameter’s appropriate value. To evaluate the performance of the model for various regularization parameter values and choose the best one, cross-validation techniques like k-fold cross-validation can be used.
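In practice, scikit-learn's LassoCV automates this search over a grid of alpha values; the sketch below is illustrative (the cv setting and data are assumptions).

# Choosing alpha by k-fold cross-validation with LassoCV (cv=5 and data are illustrative)
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))
y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.1, size=200)

model = LassoCV(cv=5).fit(X, y)
print("Best alpha:", round(float(model.alpha_), 4))
print("Coefficients:", np.round(model.coef_, 2))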


