LightGBM Boosting Algorithms

Last Updated : 20 Oct, 2023

Boosting is a machine learning approach that combines several weak learners into a strong learner. A weak learner performs only marginally better than random guessing, whereas a strong learner attains high accuracy and generalizes well. Boosting trains weak learners iteratively on reweighted or resampled versions of the data and integrates their predictions into a final output. It is most often used with decision trees, although other base learners, such as neural networks and support vector machines, can in principle be boosted as well.

LightGBM is a popular and efficient implementation of gradient boosting for tree-based models. It is an open-source Microsoft project whose main selling points over competing boosting frameworks are fast training, low memory use, competitive accuracy, parallel and distributed learning, and GPU support. LightGBM uses two techniques, Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), to speed up tree learning. Both reduce the memory and computation required by the histogram-based algorithms that many gradient boosting frameworks use.

In this post, we discuss the idea of boosting and how it works, with an emphasis on LightGBM and its features. We also give examples of binary classification with LightGBM in Python.

Boosting

Boosting is an ensemble learning technique that combines many models to outperform any single one. The idea is to train a sequence of weak learners, each one trying to correct the mistakes of its predecessor, and then combine their predictions into a final prediction. The final result is typically a weighted sum of the individual predictions (for regression) or a weighted vote (for classification).

Boosting algorithms come in many forms, but the classic reweighting variants such as AdaBoost follow the same basic procedure:

  • Initialize the data with equal weights.
  • For each iteration:
    • Train a weak learner on the weighted data.
    • Evaluate the weak learner on the data and calculate its weighted error rate.
    • Assign a coefficient to the weak learner based on its accuracy.
    • Update the weights of the data based on the prediction errors, so that misclassified points count more in the next iteration.
  • Output the final model as a linear combination of the weak learners and their coefficients.
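The steps above can be sketched in plain Python. This is a minimal AdaBoost-style illustration on a ten-point, one-dimensional toy dataset, using decision stumps as weak learners; the helper names (`train_stump`, `stump_predict`, `adaboost`, `predict`) and the data are assumptions made for this example and are not part of LightGBM.

```python
import math

def stump_predict(x, threshold, polarity):
    # A decision stump: predict `polarity` at or above the threshold, else the opposite.
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, n_rounds):
    n = len(xs)
    weights = [1.0 / n] * n                      # equal initial weights
    ensemble = []                                # (coefficient, threshold, polarity)
    for _ in range(n_rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # coefficient from the error rate
        ensemble.append((alpha, threshold, polarity))
        # Increase the weights of misclassified points, decrease the others.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    # Final model: sign of the weighted sum of the weak learners.
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]   # no single stump separates these labels
ensemble = adaboost(xs, ys, n_rounds=3)
print([predict(ensemble, x) for x in xs])
```

No single stump gets more than seven of these ten points right, but the weighted combination of three stumps classifies all of them correctly.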

Gradient Boosting, introduced by Jerome Friedman in 2001, is another popular boosting technique. Instead of reweighting the data, gradient boosting fits each new weak learner, usually a shallow regression tree, to the negative gradient of the loss function with respect to the current predictions (the pseudo-residuals). A learning rate parameter scales each tree’s contribution to the ensemble. Gradient boosting can be used for both regression and classification problems.
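To make the pseudo-residual idea concrete, here is a minimal sketch of gradient boosting for squared-error regression with one-split regression trees. For squared loss the negative gradient is simply the residual `y - F(x)`. The function names and the toy data are assumptions for illustration; real implementations such as LightGBM grow deeper trees and use many further optimizations.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[1:]:        # skip the minimum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)                     # initial constant prediction
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # Pseudo-residuals: negative gradient of 0.5 * (y - F)^2 with respect to F.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # The learning rate scales each tree's contribution.
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)

def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = list(range(10))
ys = [x ** 2 for x in xs]
mean_y = sum(ys) / len(ys)
baseline_mse = mse(lambda x: mean_y, xs, ys)
model = gradient_boost(xs, ys)
print("baseline MSE:", baseline_mse, "boosted MSE:", mse(model, xs, ys))
```

A smaller learning rate makes each step more conservative, so more rounds are needed to reach the same training error.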

LightGBM

LightGBM is a gradient boosting framework for tree-based models that aims to be faster and more memory-efficient than earlier frameworks. It was introduced by Guolin Ke and colleagues at Microsoft in 2017. The name stands for “Light Gradient Boosting Machine,” where “light” refers to its high speed and low resource use.

Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) are two cutting-edge methods that LightGBM employs to enhance the tree learning procedure. These methods are intended to lower the memory and computational requirements of histogram-based algorithms, which are frequently employed in various gradient boosting frameworks.

To find the best split point at each tree node, histogram-based algorithms discretize continuous feature values into bins. This avoids the expensive step of sorting the feature values for each split. However, building the histograms still requires scanning all the data points and all the features, which can take a lot of time and memory.
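The following sketch shows the idea for a single feature: gradients are accumulated into a fixed number of bins, and candidate splits are then evaluated only at bin boundaries. It assumes equal-width bins and a squared-loss style gain (sum of gradients squared over count); LightGBM's actual binning and gain formula are more elaborate, and the function names here are hypothetical.

```python
def build_histogram(values, gradients, n_bins):
    """Accumulate gradient sums and counts per equal-width bin (one pass over the data)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0            # fall back to 1.0 for a constant feature
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)
        grad_sum[b] += g
        count[b] += 1
    return grad_sum, count

def best_split(grad_sum, count):
    """Scan the bin boundaries (not the raw data) for the split with the highest gain."""
    total_g, total_n = sum(grad_sum), sum(count)
    best_gain, best_bin = 0.0, None
    left_g, left_n = 0.0, 0
    for b in range(len(count) - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        gain = left_g ** 2 / left_n + right_g ** 2 / right_n - total_g ** 2 / total_n
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain

values = [1, 2, 3, 4, 5, 6, 7, 8]
gradients = [-1, -1, -1, -1, 1, 1, 1, 1]
grad_sum, count = build_histogram(values, gradients, n_bins=4)
best_bin, best_gain = best_split(grad_sum, count)
print("split after bin", best_bin, "with gain", best_gain)
```

Once the histogram exists, finding the split costs time proportional to the number of bins rather than the number of data points, which is where the speedup comes from.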

GOSS is a sampling method that selects a subset of the data points from which to build histograms. Rather than sampling uniformly at random, GOSS selects data points by the magnitude of their gradients, because points with larger gradients contribute more to the information gain. GOSS keeps all the data points with large gradients (those that are under-trained or hard to fit) and randomly samples a fraction of the data points with small gradients (those that are already well fitted). The sampled small-gradient points are then weighted up by a constant factor so that the data distribution is not distorted too much. In this way, GOSS reduces the amount of data while keeping the information gain estimate accurate.
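A minimal sketch of the sampling step, assuming the scheme described in the LightGBM paper: keep the top `top_rate` fraction of points by absolute gradient, sample an `other_rate` fraction of the whole dataset from the rest, and weight the sampled points by `(1 - top_rate) / other_rate`. The function `goss_sample` is hypothetical; `top_rate` and `other_rate` mirror the names of the corresponding LightGBM parameters.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {row_index: weight} for the rows kept by a GOSS-style sampler."""
    n = len(gradients)
    # Rank rows by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top_idx = order[:n_top]                          # always kept
    rng = random.Random(seed)
    other_idx = rng.sample(order[n_top:], n_other)   # random subset of the rest
    # Up-weight the sampled small-gradient rows to compensate for the ones left out.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top_idx}
    weights.update({i: amplify for i in other_idx})
    return weights

gradients = [i * 0.01 for i in range(100)]
sample = goss_sample(gradients)
print(len(sample), "of", len(gradients), "rows kept")
```

With the default rates shown here, only 30% of the rows take part in histogram construction for that iteration.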

EFB combines mutually exclusive features into a single feature. Features are exclusive when they rarely take non-zero values at the same time, which is common in sparse data such as one-hot encoded columns. By bundling exclusive features, EFB reduces the number of features for which histograms must be built, saving memory and speeding up histogram construction.
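The bundling step can be illustrated with two already-binned features. In this simplified sketch, bin 0 means "zero", and the bins of the second feature are shifted past those of the first so that both fit in one column without ambiguity. The function `bundle_exclusive` is hypothetical; LightGBM additionally decides which features to bundle with a greedy search and tolerates a small rate of conflicts, which is not shown here.

```python
def bundle_exclusive(feature_a, feature_b, n_bins_a):
    """Merge two binned features that are never non-zero on the same row.

    Bin 0 means 'zero' for both inputs. Non-zero bins of feature_a keep
    their value (1..n_bins_a); non-zero bins of feature_b are offset by
    n_bins_a so the two ranges do not overlap.
    """
    bundled = []
    for a, b in zip(feature_a, feature_b):
        if a != 0 and b != 0:
            raise ValueError("features conflict on a row, so they are not exclusive")
        if a != 0:
            bundled.append(a)
        elif b != 0:
            bundled.append(b + n_bins_a)
        else:
            bundled.append(0)
    return bundled

feature_a = [1, 0, 3, 0, 0]   # non-zero bins in 1..3
feature_b = [0, 2, 0, 0, 1]   # non-zero bins in 1..2
bundled = bundle_exclusive(feature_a, feature_b, n_bins_a=3)
print(bundled)  # [1, 5, 3, 0, 4]
```

One histogram over the bundled column now carries the same split information as two separate histograms did before.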

Additional characteristics that LightGBM supports include:

  • Distributed and parallel learning: LightGBM can train models simultaneously on a number of threads or computers, which drastically cuts down on training time.
  • GPU learning: LightGBM can use GPU hardware to speed up training; the actual speedup over CPU training depends on the dataset and the hardware.
  • Sparse data optimization: LightGBM uses a sparse-aware algorithm that only works with non-zero values to efficiently manage sparse data.
  • Support for categorical features: LightGBM is capable of handling categorical features without the need for one-hot encoding, which can lower the dimensionality of the feature and enhance speed.
  • Custom objective and metric functions: users can supply their own objective (loss) and evaluation metric functions, which is useful for specialized tasks or domains.

Different Algorithms Supported by LightGBM

LightGBM supports several boosting algorithms, each with its unique characteristics. Let’s see some of the most commonly used ones:

1. Gradient Boosting Decision Tree (GBDT)

Gradient Boosting Decision Tree (GBDT) is an ensemble technique that combines the predictions of many decision trees into a reliable and accurate predictive model. GBDT builds the trees sequentially, with each tree correcting the errors of the ones before it. It can be viewed as gradient descent in function space: each new tree is fitted to the negative gradient of the loss with respect to the current predictions.

GBDT can capture complex, non-linear relationships in data and is effective for both regression and classification tasks. Each iteration improves the model by concentrating on the examples that the current ensemble fits poorly, that is, those with large residuals. Regularization techniques such as shrinkage and subsampling help prevent overfitting.

Key characteristics:

  • Sequential tree building.
  • It’s prone to overfitting if not carefully tuned.
  • Suitable for a wide range of regression and classification problems.

When to use: Gradient Boosting is a good choice when you have a sufficient amount of data and can spend time tuning hyperparameters for optimal performance.

Python Implementation

Python




#importing Libraries
import lightgbm as lgb
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
 
# Load the breast cancer dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
 
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
 
# Create a LightGBM dataset
train_data = lgb.Dataset(X_train, label=y_train)
 
# Define parameters for GBDT
params = {
    'objective': 'binary',
    'boosting_type': 'gbdt',
    'metric': 'binary_logloss',
    'num_leaves': 11,
    'learning_rate': 0.05,
    'feature_fraction': 0.9
}
 
# Train the GBDT model
gbm = lgb.train(params, train_data, num_boost_round=100)
 
# Make predictions on the test set
y_pred = gbm.predict(X_test)
 
# Evaluate the model
accuracy = accuracy_score(y_test, (y_pred > 0.5).astype(int))
print("Accuracy:", accuracy)


Output:

[LightGBM] [Warning] Found whitespace in feature_names, replace with underlines
[LightGBM] [Info] Number of positive: 286, number of negative: 169
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000311 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 4548
[LightGBM] [Info] Number of data points in the train set: 455, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.628571 -> initscore=0.526093
[LightGBM] [Info] Start training from score 0.526093
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
Accuracy: 0.9736842105263158

This code loads the breast cancer dataset, splits it into training and test sets, wraps the training data in a LightGBM Dataset, defines the GBDT parameters, and trains the model. The model outputs probabilities, which are thresholded at 0.5 and scored with accuracy on the test set.

2. LightGBM’s Gradient-Based One-Side Sampling (GOSS)

In gradient boosting frameworks like LightGBM, Gradient-Based One-Side Sampling (GOSS) is an optimization technique that increases training efficiency with little loss of predictive accuracy. GOSS separates the training data into two subsets: instances with large gradients (data points the model still fits poorly) and instances with small gradients. Instead of subsampling the entire dataset, GOSS keeps the first group whole and downsamples only the second, which lets the model concentrate on the most informative data points.

GOSS is particularly useful for large datasets since it prioritizes the informative samples, which lowers the computational overhead associated with evaluating and updating the model during training. This dynamic sampling technique aids in achieving a balance between model performance and training speed.

  • GOSS focuses on instances with larger gradients while discarding less informative instances, reducing memory and time requirements.
  • This technique enhances LightGBM’s speed and memory efficiency.

Python Implementation

Python




#importing Libraries
import lightgbm as lgb
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
 
# Load the breast cancer dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
 
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
 
# Create a LightGBM dataset
train_data = lgb.Dataset(X_train, label=y_train)
 
# Define parameters for GOSS
# Note: newer LightGBM releases interpret 'boosting_type': 'goss' as
# boosting=gbdt with data_sample_strategy=goss and print a warning
# about it (see the output below).
params = {
    'objective': 'binary',
    'boosting_type': 'goss',
    'metric': 'binary_logloss',
    'num_leaves': 3,
    'learning_rate': 0.05,
    'feature_fraction': 0.9
}
 
# Train the GOSS model
gbm_goss = lgb.train(params, train_data, num_boost_round=100)
 
# Make predictions on the test set
y_pred_goss = gbm_goss.predict(X_test)
 
# Evaluate the model
accuracy_goss = accuracy_score(y_test, (y_pred_goss > 0.5).astype(int))
print("Accuracy (GOSS):", accuracy_goss)


Output:

[LightGBM] [Warning] Found boosting=goss. For backwards compatibility reasons, LightGBM interprets this as boosting=gbdt, data_sample_strategy=goss.To suppress this warning, set data_sample_strategy=goss instead.
[LightGBM] [Warning] Found whitespace in feature_names, replace with underlines
[LightGBM] [Warning] Found boosting=goss. For backwards compatibility reasons, LightGBM interprets this as boosting=gbdt, data_sample_strategy=goss.To suppress this warning, set data_sample_strategy=goss instead.
[LightGBM] [Info] Number of positive: 286, number of negative: 169
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000176 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 4548
[LightGBM] [Info] Number of data points in the train set: 455, number of used features: 30
[LightGBM] [Info] Using GOSS
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.628571 -> initscore=0.526093
[LightGBM] [Info] Start training from score 0.526093
Accuracy (GOSS): 0.9649122807017544

This code trains a LightGBM model with GOSS on the same train/test split as the GBDT example and reports its test accuracy. As the warning in the output shows, this LightGBM version interprets boosting=goss as boosting=gbdt with data_sample_strategy=goss. Note that this run also uses a smaller num_leaves (3 instead of 11), so the difference in accuracy between the two runs cannot be attributed to GOSS alone.

3. LightGBM’s Exclusive Feature Bundling (EFB)

Exclusive Feature Bundling (EFB) is a feature-reduction technique introduced with LightGBM. In high-dimensional data, many features are sparse and mutually exclusive, meaning they rarely take non-zero values at the same time. EFB merges such features into a single bundled feature, which lowers the number of features for which histograms must be built. This speeds up training and reduces memory use, with little or no loss of accuracy as long as the bundled features rarely conflict. EFB is an efficiency optimization rather than a separate boosting algorithm, and it helps most on sparse, high-dimensional datasets.

Python Implementation

Python




#importing Libraries
import lightgbm as lgb
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
 
# Load the breast cancer dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
 
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
 
# Create a LightGBM dataset
train_data = lgb.Dataset(X_train, label=y_train)
 
# Define parameters for EFB
params = {
    'objective': 'binary',
    'boosting_type': 'gbdt',
    'metric': 'binary_logloss',
    'num_leaves': 5,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'enable_bundle': True  # EFB switch; True is already the default
}
 
# Train the EFB model
gbm_efb = lgb.train(params, train_data, num_boost_round=100)
 
# Make predictions on the test set
y_pred_efb = gbm_efb.predict(X_test)
 
# Evaluate the model
accuracy_efb = accuracy_score(y_test, (y_pred_efb > 0.5).astype(int))
print("Accuracy (EFB):", accuracy_efb)


Output:

[LightGBM] [Warning] Found whitespace in feature_names, replace with underlines
[LightGBM] [Info] Number of positive: 286, number of negative: 169
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000183 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 4548
[LightGBM] [Info] Number of data points in the train set: 455, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.628571 -> initscore=0.526093
[LightGBM] [Info] Start training from score 0.526093
Accuracy (EFB): 0.9649122807017544

This code repeats the GBDT setup with enable_bundle set explicitly, then trains the model and reports test accuracy. Because enable_bundle is True by default in LightGBM, setting it does not change the model’s behaviour, and on a small dense dataset such as this one EFB has few or no exclusive features to bundle. The difference in accuracy from the first example most likely comes from the smaller num_leaves value (5 instead of 11).

4. LightGBM’s Histogram-Based Learning

Histogram-based learning is a key optimization in LightGBM. By discretizing continuous features into bins, it speeds up training and makes split finding for decision trees cheap. Building a histogram still takes one pass over the data, but the features no longer need to be sorted, and candidate splits are evaluated per bin rather than per data point. LightGBM combines histogram-based learning with a leaf-wise growth strategy, which tends to reduce the loss faster for a given number of leaves. Users can adjust parameters such as the maximum number of bins to tailor the learning process. Overall, histogram-based learning substantially reduces memory use and training time, which makes LightGBM well suited to large-scale and high-dimensional datasets.

Key characteristics:

  • Faster training and lower memory usage compared to traditional gradient boosting.
  • Supports various boosting types, making it versatile for different use cases.

When to use: LightGBM is a fantastic choice when dealing with large datasets, real-time prediction requirements, or when you want a speed boost in model training without sacrificing accuracy.

Python Implementation

Python




# Data loading and splitting are the same as in the GBDT example
 
# Define parameters for histogram-based learning
params = {
    'objective': 'binary',
    'boosting_type': 'gbdt',
    'metric': 'binary_logloss',
    'num_leaves': 11,
    'learning_rate': 0.05,
    'histogram_pool_size': 1024  # Max histogram cache size in MB; adjust as needed
}
 
# Train the histogram-based model
gbm_hist = lgb.train(params, train_data, num_boost_round=100)
 
# Make predictions on the test set
y_pred_hist = gbm_hist.predict(X_test)
 
# Evaluate the model
accuracy_hist = accuracy_score(y_test, (y_pred_hist > 0.5).astype(int))
print("Accuracy (Histogram-Based):", accuracy_hist)


Output:

[LightGBM] [Info] Number of positive: 286, number of negative: 169
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000184 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 4548
[LightGBM] [Info] Number of data points in the train set: 455, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.628571 -> initscore=0.526093
[LightGBM] [Info] Start training from score 0.526093
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
Accuracy (Histogram-Based): 0.9736842105263158

This code sets a histogram-related parameter for binary classification on the breast cancer dataset. LightGBM always uses histogram-based learning, so there is no switch to turn it on; histogram_pool_size only limits the memory used to cache histograms, while parameters such as max_bin control how finely the features are discretized. The code then trains the model (gbm_hist) on the training data, makes predictions on the test set, and reports accuracy as the performance metric.

5. LightGBM’s DART (Dropouts meet Multiple Additive Regression Trees)

DART (Dropouts meet Multiple Additive Regression Trees) is a regularization method for gradient boosted trees, proposed by Rashmi and Gilad-Bachrach in 2015 and available in LightGBM as a boosting type. It applies the idea of dropout from neural networks to tree ensembles. At each boosting iteration, DART randomly drops a subset of the existing trees and fits the new tree to the residuals of the remaining ones. The new tree and the dropped trees are then rescaled so that the ensemble’s output stays on a comparable scale. This keeps later trees from merely correcting the small errors left by the first few trees, which reduces overfitting and spreads the contribution more evenly across the weak learners.
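The dropout-and-rescale step can be sketched as follows for squared-error regression with one-split trees. This follows the normalization described in the DART paper (with k dropped trees, the new tree is scaled by 1/(k+1) and each dropped tree by k/(k+1)) and omits the learning rate; it is a simplified illustration, not LightGBM's implementation, and all function names are assumptions.

```python
import random

def fit_stump(xs, targets):
    """Fit a one-split regression tree to the targets by least squares."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean

def dart_boost(xs, ys, n_rounds=20, drop_rate=0.1, seed=0):
    rng = random.Random(seed)
    trees = []                                   # list of [scale, tree]
    for _ in range(n_rounds):
        # Randomly choose which existing trees to drop for this round.
        dropped = {i for i in range(len(trees)) if rng.random() < drop_rate}
        # Predictions of the ensemble without the dropped trees.
        preds = [sum(s * t(x) for i, (s, t) in enumerate(trees) if i not in dropped)
                 for x in xs]
        residuals = [y - p for y, p in zip(ys, preds)]
        new_tree = fit_stump(xs, residuals)
        k = len(dropped)
        # Rescale so the dropped trees plus the new tree do not overshoot.
        for i in dropped:
            trees[i][0] *= k / (k + 1)
        trees.append([1.0 / (k + 1), new_tree])
    return trees

def predict(trees, x):
    return sum(scale * tree(x) for scale, tree in trees)

xs = list(range(10))
ys = [x ** 2 for x in xs]
trees = dart_boost(xs, ys)
print([round(scale, 3) for scale, _ in trees])
```

With `drop_rate=0.0` no tree is ever dropped and the procedure reduces to ordinary boosting with a learning rate of 1.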

Python Implementation

Python




# (Data loading and splitting same in the GBDT model)
 
# Define parameters for DART
params = {
    'objective': 'binary',
    'boosting_type': 'dart',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
}
 
# Train the DART model
gbm_dart = lgb.train(params, train_data, num_boost_round=100)
 
# Make predictions on the test set
y_pred_dart = gbm_dart.predict(X_test)
 
# Evaluate the model
accuracy_dart = accuracy_score(y_test, (y_pred_dart > 0.5).astype(int))
print("Accuracy (DART):", accuracy_dart)


Output:

[LightGBM] [Info] Number of positive: 286, number of negative: 169
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000190 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 4548
[LightGBM] [Info] Number of data points in the train set: 455, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.628571 -> initscore=0.526093
[LightGBM] [Info] Start training from score 0.526093
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
Accuracy (DART): 0.9736842105263158

This code uses LightGBM with the DART (Dropouts meet Multiple Additive Regression Trees) boosting type for binary classification on the breast cancer dataset. It first defines the parameters, including the objective, boosting type, and learning rate. The DART model is then trained on the training set and used to make predictions on the test set, and the code prints its accuracy as the performance metric.

Efficiency and Speed Advantages of LightGBM

LightGBM’s efficiency and speed advantages stem from its unique features:

  • Histogram-Based Splitting: LightGBM constructs histograms of features during tree building, which reduces the number of data scans required. This results in faster training times.
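  • Leaf-Wise Tree Growth: Instead of the level-wise growth used in traditional gradient boosting, LightGBM adopts a leaf-wise growth strategy that always splits the leaf with the largest loss reduction. For the same number of leaves this usually yields a lower loss, although it can overfit small datasets unless num_leaves or max_depth is limited.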
  • Parallel and GPU Learning: LightGBM can leverage multi-core CPUs and GPUs for parallel processing, making it even faster on modern hardware.
  • Sparse Data Handling: It handles sparse data efficiently, which is often a challenge for other boosting methods.

Fine-tuning in LightGBM

In LightGBM, fine-tuning means adjusting the model’s parameters to improve its performance on a particular task or dataset. It is useful when the model has to adapt to new or changing data, or when it was trained on a different or more general domain than the target domain. Fine-tuning also helps prevent overfitting and underfitting, which are common problems in machine learning.

There are different ways to fine-tune a model in LightGBM, depending on the type and complexity of the model, the size and quality of the data, and the objective and metric of the task. Some common methods are:

1. Transfer learning: Transfer learning reuses a model trained on one dataset as the starting point for a new, related task or dataset. This can save time and resources and may improve performance when the target dataset is small. Unlike neural networks, tree ensembles have no layers to freeze; instead, the existing trees are kept and new trees are added on top of them. In LightGBM, the init_model parameter loads an existing model as the starting point for additional training.

2. Hyperparameter optimization: Hyperparameter optimization is the search for the best values of the model’s hyperparameters, such as the learning rate, number of trees, and number of leaves. Hyperparameters are set by the user before training and are not learned by the model. They can significantly affect the model’s accuracy and speed, yet they are often hard to tune by hand. Search strategies such as grid search, random search, and Bayesian optimization can automate the process. In LightGBM, the lightgbm.cv function runs cross-validation for a given set of parameters and returns the evaluation history for each boosting round. It does not search over parameter values itself, so it is typically called once per candidate setting and the results are compared.

3. Regularization: Regularization adds constraints or penalties to the model to limit its complexity and avoid overfitting. Overfitting occurs when a model fits the training data too closely and fails to generalize to new, unseen data. Regularization improves the model’s stability and robustness by lowering variance. General-purpose methods include dropout, weight decay, and early stopping. In LightGBM, regularization is controlled through model complexity and shrinkage parameters such as lambda_l1, lambda_l2, min_split_gain, and min_child_weight.

Conclusion

In this post, we discussed the idea of boosting and how it works, with an emphasis on LightGBM and its features. We also provided several examples of binary classification with LightGBM in Python. As we’ve seen, Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) are two techniques that LightGBM uses to make gradient boosting for tree-based models fast and memory-efficient. Both reduce the memory and computation required by the histogram-based algorithms that many gradient boosting frameworks use. LightGBM also supports categorical features, parallel and distributed learning, GPU learning, sparse data optimization, and custom objective and metric functions. It can handle many types of data and problems while achieving high accuracy and good generalization.

LightGBM is a powerful tool in the field of machine learning due to its variety of boosting methods, efficiency, and speed. We’ve examined several methods in this post, including Python implementations and explanations of their results. LightGBM’s adaptability guarantees that you have the proper tools to create precise and effective models for your machine learning projects, regardless of whether you’re working with huge datasets, high-dimensional data, or noisy data. You will surely be better equipped to handle a variety of data science difficulties by experimenting with these methods and learning about their advantages.
