LightGBM (Light Gradient Boosting Machine)

LightGBM is a gradient-boosting framework based on decision trees, designed to increase training efficiency and reduce memory usage.
It uses two novel techniques:

  • Gradient-based One-Side Sampling (GOSS)
  • Exclusive Feature Bundling (EFB)

These techniques address the limitations of the histogram-based algorithm that is primarily used in all GBDT (Gradient Boosting Decision Tree) frameworks. GOSS and EFB, described below, are the defining characteristics of the LightGBM algorithm. Together they make the model efficient and give it an edge over other GBDT frameworks.

Gradient-based One Side Sampling Technique for LightGBM: 

Different data instances play different roles in the computation of information gain. Instances with larger gradients (i.e., under-trained instances) contribute more to the information gain. GOSS keeps the instances with large gradients (e.g., larger than a predefined threshold, or among the top percentiles) and randomly drops only instances with small gradients, so that the accuracy of the information gain estimate is retained. At the same target sampling rate, this treatment can lead to a more accurate gain estimate than uniformly random sampling, especially when the value of information gain has a large range.

Algorithm for GOSS: 


I: training data
d: number of iterations
a: sampling ratio of large gradient data
b: sampling ratio of small gradient data
loss: loss function
L: weak learner
Models <- {} # a list of weak models
fact <- (1-a)/b
topN <- a * len(I) # number of top samples to be included
randN <- b * len(I) # number of random samples to be included

for i = 1 to d do:
    preds <- Models.predict(I)
    g <- loss(I, preds)
    w <- {1, 1, ...} # initialize sample weights
    sorted <- GetSortedIndices(abs(g))
    topSet <- sorted[1:topN]
    randSet <- RandomPick(sorted[topN:len(I)], randN)
    usedSet <- topSet + randSet # combine the top and random samples
    w[randSet] <- w[randSet] * fact # assign weight to the small gradient data
    newModel <- L(I[usedSet], g[usedSet], w[usedSet]) # train a new model on the used samples
    Models.append(newModel) # add the new model to the model list
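
The sampling step inside the loop above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name goss_sample, the seed argument and the toy gradient values are hypothetical, and the sketch covers only the selection and re-weighting of instances, not the training of the weak learner.

```python
import random

def goss_sample(gradients, a, b, seed=0):
    """One GOSS sampling step: return (used_indices, weights).

    a: sampling ratio of large-gradient data
    b: sampling ratio of small-gradient data
    """
    n = len(gradients)
    top_n = int(a * n)    # number of top samples to be included
    rand_n = int(b * n)   # number of random samples to be included
    fact = (1 - a) / b    # weight applied to the sampled small-gradient data

    # Indices sorted by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_set = order[:top_n]

    # Randomly pick rand_n instances from the remaining small-gradient part.
    rng = random.Random(seed)
    rand_set = rng.sample(order[top_n:], rand_n)

    # Large-gradient instances keep weight 1, sampled ones get weight fact.
    weights = {i: 1.0 for i in top_set}
    weights.update({i: fact for i in rand_set})
    return top_set + rand_set, weights

# Toy gradients (hypothetical values).
grads = [0.9, -0.05, 0.02, -0.8, 0.1, 0.03, -0.01, 0.04, 0.7, -0.02]
used, w = goss_sample(grads, a=0.2, b=0.3)
print(used)
print([w[i] for i in used])
```

With a = 0.2 and b = 0.3, the two instances with the largest absolute gradients are always kept, and three of the remaining eight are drawn at random and weighted by (1 - 0.2) / 0.3.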

Mathematical Analysis for GOSS Technique (Calculation of Variance Gain at splitting feature j) 

The GOSS (Gradient-based One-Side Sampling) method is used in gradient boosting on a training set with n instances {x1, · · ·, xn}, where each instance xi is a vector with dimension s in space Xs. In each iteration of gradient boosting, the negative gradients of the loss function with respect to the output of the model are represented as {g1, · · ·, gn}. The instances in the training set are ranked in descending order based on their absolute gradient values, and the top-a × 100% instances with the largest gradients are selected to form a subset A.

For the remaining set Ac, consisting of the (1 - a) × 100% instances with smaller gradients, a random subset B with a size of b × |Ac| is sampled. The instances are then split based on the estimated variance gain Ṽj(d) over the subset A ∪ B, where

 \tilde{V}_{j}(d)=\frac{1}{n}\left(\frac{\left(\sum_{x_{i} \in A_{l}} g_{i}+\frac{1-a}{b} \sum_{x_{i} \in B_{l}} g_{i}\right)^{2}}{n_{l}^{j}(d)}+\frac{\left(\sum_{x_{i} \in A_{r}} g_{i}+\frac{1-a}{b} \sum_{x_{i} \in B_{r}} g_{i}\right)^{2}}{n_{r}^{j}(d)}\right)

Here,

 A_{l}=\{x_{i} \in A: x_{ij} \leq d\}, \quad A_{r}=\{x_{i} \in A: x_{ij}>d\}, \quad B_{l}=\{x_{i} \in B: x_{ij} \leq d\}, \quad B_{r}=\{x_{i} \in B: x_{ij}>d\}

The coefficient (1-a)/b is used to normalize the sum of the gradients over B back to the size of Ac.
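
As a rough illustration, the estimated variance gain for one feature and one threshold d can be computed as follows. The function name and the toy data are hypothetical. The text does not define the left and right counts in the denominators, so this sketch assumes they are the numbers of sampled instances (from A and B together) that fall on each side of the split.

```python
def estimated_variance_gain(x_j, g, A, B, a, b, d):
    """Estimated variance gain for feature values x_j at threshold d.

    g: gradients of all n instances
    A: indices of the large-gradient instances
    B: indices of the randomly sampled small-gradient instances
    """
    n = len(g)
    fact = (1 - a) / b  # re-weights the gradient sums taken over B

    A_l = [i for i in A if x_j[i] <= d]
    A_r = [i for i in A if x_j[i] > d]
    B_l = [i for i in B if x_j[i] <= d]
    B_r = [i for i in B if x_j[i] > d]

    # Assumption: counts are taken over the sampled instances on each side.
    n_l = len(A_l) + len(B_l)
    n_r = len(A_r) + len(B_r)
    if n_l == 0 or n_r == 0:
        return 0.0  # not a valid split: one side is empty

    left = (sum(g[i] for i in A_l) + fact * sum(g[i] for i in B_l)) ** 2 / n_l
    right = (sum(g[i] for i in A_r) + fact * sum(g[i] for i in B_r)) ** 2 / n_r
    return (left + right) / n

# Toy data (hypothetical): 8 instances, a = b = 0.25, so (1 - a) / b = 3.
x_j = [1, 2, 3, 4, 5, 6, 7, 8]
g = [0.9, -0.8, 0.1, -0.1, 0.05, -0.05, 0.02, -0.02]
gain = estimated_variance_gain(x_j, g, A=[0, 1], B=[2, 5], a=0.25, b=0.25, d=2.5)
print(gain)
```

Evaluating this function for every candidate threshold d of a feature and keeping the largest value gives the split point for that feature.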

Exclusive Feature Bundling Technique for LightGBM

High-dimensional data are usually very sparse, which makes it possible to design a nearly lossless approach to reduce the number of features. Specifically, in a sparse feature space, many features are mutually exclusive, i.e., they never take nonzero values simultaneously. Exclusive features can be safely bundled into a single feature (called an Exclusive Feature Bundle). The complexity of histogram building then changes from O(data × feature) to O(data × bundle), where bundle << feature, so training becomes faster without hurting accuracy.

Algorithm for Exclusive Feature Bundling Technique: 

numData: the number of data points in the dataset
F: a bundle of exclusive features
newBin: a new feature vector obtained from bundling the input features in F
binRanges: a list of bin ranges used to map the original feature values to the new feature values
Initialize binRanges as [0], and totalBin as 0.
For each feature f in F, add f.numBin to totalBin and append the result to binRanges.
Create a new empty feature vector newBin with numData elements.
For each data point i in the dataset:
    a. Initialize newBin[i] to 0.
    b. For each feature j in F:
        i. If F[j].bin[i] is not equal to 0, set newBin[i] to F[j].bin[i] + binRanges[j].
Return newBin and binRanges as the output.
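
The merge steps above can be sketched in plain Python. The function name and the list-based inputs are assumptions made for illustration: bundle_bins[j][i] stands for F[j].bin[i], and num_bins[j] stands for F[j].numBin.

```python
def merge_exclusive_features(bundle_bins, num_bins):
    """Merge the binned features of one exclusive bundle into one feature.

    bundle_bins[j][i]: bin index of data point i for feature j (0 = zero bin)
    num_bins[j]: number of bins of feature j
    """
    num_data = len(bundle_bins[0])

    # Offsets that give each feature its own range of bin values.
    bin_ranges = [0]
    total_bin = 0
    for nb in num_bins:
        total_bin += nb
        bin_ranges.append(total_bin)

    new_bin = [0] * num_data
    for i in range(num_data):
        for j, bins in enumerate(bundle_bins):
            if bins[i] != 0:
                # Shift the bin by the feature's offset so values do not collide.
                new_bin[i] = bins[i] + bin_ranges[j]
    return new_bin, bin_ranges

# Two mutually exclusive features over four data points (toy values).
feature_0 = [1, 0, 2, 0]   # 3 bins
feature_1 = [0, 3, 0, 0]   # 4 bins
new_bin, bin_ranges = merge_exclusive_features([feature_0, feature_1], [3, 4])
print(new_bin)     # [1, 6, 2, 0]
print(bin_ranges)  # [0, 3, 7]
```

Because the offsets keep the value ranges of the original features apart, a histogram built on the bundled feature still distinguishes which original feature a nonzero value came from.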

Architecture of LightGBM

LightGBM grows trees leaf-wise, as opposed to other boosting algorithms that grow trees level-wise. It chooses the leaf with the maximum delta loss to grow. For a fixed number of leaves, the leaf-wise algorithm tends to achieve a lower loss than the level-wise algorithm. Leaf-wise tree growth might increase the complexity of the model and may lead to overfitting on small datasets.
[Figure: Leaf-wise tree growth in LightGBM]
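
To make the leaf-wise idea concrete, here is a toy sketch of best-first growth using a priority queue. It is not LightGBM's implementation: the function name, the tree layout and the gain values are all hypothetical, and the gains are given up front rather than computed from data.

```python
import heapq

def leaf_wise_split_order(split_gain, children, root, max_leaves):
    """Return the order in which leaves are split under best-first growth."""
    # Max-heap on gain, implemented by pushing negative gains.
    heap = [(-split_gain[root], root)]
    num_leaves = 1
    order = []
    while heap and num_leaves < max_leaves:
        neg_gain, leaf = heapq.heappop(heap)
        if leaf not in children or neg_gain >= 0:
            continue  # leaf cannot be split, or splitting gives no gain
        order.append(leaf)
        num_leaves += 1  # one leaf is replaced by two children
        for child in children[leaf]:
            heapq.heappush(heap, (-split_gain.get(child, 0.0), child))
    return order

# Hypothetical gains: the left branch is much more promising than the right.
split_gain = {"root": 10.0, "L": 8.0, "R": 2.0, "LL": 5.0, "LR": 1.0}
children = {
    "root": ("L", "R"),
    "L": ("LL", "LR"),
    "R": ("RL", "RR"),
    "LL": ("LLL", "LLR"),
}
order = leaf_wise_split_order(split_gain, children, "root", max_leaves=4)
print(order)  # ['root', 'L', 'LL']
```

With a budget of four leaves, a level-wise strategy would split root, L and R, while the leaf-wise strategy spends the budget on the branch with the largest gains and produces a deeper, unbalanced tree.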

Python Implementation of LightGBM Model 

The data set used for this example is Breast Cancer Prediction, read below from the file cancer_prediction.csv.


# Installing LightGBM (run once in a terminal, or as "!pip install lightgbm"
# in a Jupyter Notebook)
# pip install lightgbm
# Importing Required Library
import pandas as pd
# Similarly LGBMRegressor can also be imported for a regression model.
from lightgbm import LGBMClassifier
# Reading the dataset
data = pd.read_csv("cancer_prediction.csv")
# Removing Columns not Required
data = data.drop(columns=['Unnamed: 32', 'id'])
# Skipping Data Exploration
# Dummification of Diagnosis Column (1-Benign, 0-Malignant Cancer)
# get_dummies returns one column per class; keep the first one, assuming
# the labels sort so that the benign class comes first (e.g. 'B' before 'M')
data['diagnosis'] = pd.get_dummies(data['diagnosis']).iloc[:, 0].astype(int)
# Splitting Dataset in two parts
train = data[0:400]
test = data[400:568]
# Separating the independent and target variable on both data set
x_train = train.drop(columns=['diagnosis'])
y_train = train['diagnosis']
x_test = test.drop(columns=['diagnosis'])
y_test = test['diagnosis']
# Creating an object for model and fitting it on training data set
model = LGBMClassifier()
model.fit(x_train, y_train)
# Predicting the Target variable
pred = model.predict(x_test)
print("Prediction array :")
print(pred)
# Mean accuracy of the model on the test set
accuracy = model.score(x_test, y_test)
print("Accuracy Score :", accuracy)

Prediction array : 
[0 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1
 1 1 1 1 0 1 1 0 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1
 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1 1 1 1 1 0 0 1 0 1 0 1 1 1 1 1 0 1
 1 0 1 0 1 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 0 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0]
Accuracy Score : 

Parameter Tuning of LightGBM Model 
A few important parameters and their usage are listed below: 

  1. max_depth : It sets a limit on the depth of the tree. The default value is -1, which means no limit. It is effective in controlling overfitting.
  2. categorical_feature : It specifies the categorical features used for training the model.
  3. bagging_fraction : It specifies the fraction of data to be considered for each iteration. It takes effect only when bagging_freq is set to a positive value.
  4. num_iterations : It specifies the number of iterations to be performed. The default value is 100.
  5. num_leaves : It specifies the maximum number of leaves in a tree. It should be smaller than 2^(max_depth), the number of leaves of a full tree of that depth.
  6. max_bin : It specifies the maximum number of bins to bucket the feature values.
  7. min_data_in_bin : It specifies the minimum amount of data in one bin.
  8. task : It specifies the task we wish to perform, such as train or predict. The default value is train; prediction is accepted as an alias for predict.
  9. feature_fraction : It specifies the fraction of features to be considered in each iteration. The default value is 1.0.
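
As a small illustration, the parameters listed above can be collected in a dictionary. The values below are hypothetical and chosen only for the example, and the helper num_leaves_fits_depth is not part of LightGBM. It checks the common guideline that num_leaves stays at or below 2^(max_depth), the leaf count of a full binary tree of that depth.

```python
# Hypothetical parameter values, chosen only for illustration.
params = {
    "task": "train",
    "num_iterations": 100,
    "max_depth": 6,
    "num_leaves": 31,
    "max_bin": 255,
    "min_data_in_bin": 3,
    "feature_fraction": 0.8,
    "bagging_fraction": 0.8,
    "bagging_freq": 1,  # bagging_fraction only takes effect when this is > 0
}

def num_leaves_fits_depth(p):
    """Check that num_leaves does not exceed 2 ** max_depth.

    A max_depth of zero or less means "no limit", so any num_leaves passes.
    """
    depth = p["max_depth"]
    return depth <= 0 or p["num_leaves"] <= 2 ** depth

print(num_leaves_fits_depth(params))
```

A dictionary like this can be passed as the params argument of lightgbm.train, together with a lightgbm.Dataset built from the training data.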

Advantages of LightGBM Model 

  • Faster training speed and higher efficiency
  • Lower memory usage
  • Better accuracy
  • Support for parallel, distributed, and GPU learning
  • Capable of handling large-scale data

Last Updated : 23 May, 2023