ML | XGBoost (eXtreme Gradient Boosting)

XGBoost, short for eXtreme Gradient Boosting, is a powerful machine learning algorithm known for its efficiency, speed, and accuracy. It belongs to the family of boosting algorithms, which are ensemble learning techniques that combine the predictions of multiple weak learners. In this article, we will explore XGBoost step by step, building on existing knowledge of decision trees, boosting, and ensemble learning.

What is XGBoost (Extreme Gradient Boosting)?

XGBoost, or Extreme Gradient Boosting, is a widely used machine learning algorithm known for its strong predictive performance. It is an implementation of gradient boosting, an ensemble learning technique that develops a series of weak learners one after the other to produce a reliable and accurate predictive model. Fundamentally, XGBoost builds a strong predictive model by aggregating the predictions of several weak learners, usually decision trees, with each new learner trained to correct the mistakes of its predecessors.

The “gradient” in the name refers to the optimization method: a cost function is minimized by repeatedly updating the model in response to the gradients of the errors. In gradient boosting with decision trees, each tree that is added to the ensemble is fit to those gradients, so every addition reduces the objective function a little further. XGBoost goes one step further: it adds a regularization term to the objective and uses a second-order approximation of the loss, which improves both accuracy and efficiency.
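
For reference, the regularized objective described above is usually written as follows. This is a sketch of the standard formulation, with the assumed notation that l is the loss, f_k is the k-th tree, T is its number of leaves, w its vector of leaf weights, and g_i and h_i are the first and second derivatives of the loss for sample i:

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2

% Second-order approximation minimized when adding the t-th tree:
\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \right] + \Omega\left(f_t\right)
```

The symbols \gamma and \lambda correspond to the gamma and lambda parameters listed later in this article.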

It has gained popularity and widespread usage because it can handle large datasets in a variety of machine-learning tasks, including regression and classification.

To deepen your understanding of the prerequisites and delve deeper into related concepts refer to the already-published article XGboost.

What Makes XGBoost “eXtreme”?

XGBoost extends traditional gradient boosting by including regularization terms in the objective function, which improves generalization and helps prevent overfitting.

Preventing Overfitting

The learning rate, also known as shrinkage, is exposed in XGBoost as the parameter “eta.” It scales each tree’s contribution to the total prediction. Because each tree has less of an influence, an optimization process with a lower learning rate is more resilient, although it needs more boosting rounds. By making the model more conservative, regularization terms combined with a low learning rate help avoid overfitting.
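
To make the effect of shrinkage concrete, here is a minimal sketch in plain Python. It is not XGBoost's implementation: the "weak learner" is assumed to be a trivial model that predicts the mean residual, and the function name is hypothetical. It only shows how eta scales each round's contribution.

```python
def boost_constant_learners(targets, eta, n_rounds):
    """Boost trivial learners that each predict the mean residual.

    Every round adds eta * (mean residual) to the running prediction,
    mimicking how shrinkage scales each tree's contribution.
    """
    prediction = 0.0
    history = []
    for _ in range(n_rounds):
        residuals = [t - prediction for t in targets]
        weak_output = sum(residuals) / len(residuals)
        prediction += eta * weak_output  # shrinkage applied here
        history.append(prediction)
    return history


targets = [8.0, 10.0, 12.0]  # the mean target is 10.0
fast = boost_constant_learners(targets, eta=1.0, n_rounds=5)
slow = boost_constant_learners(targets, eta=0.1, n_rounds=5)

print("eta=1.0:", fast)  # reaches the mean in a single step
print("eta=0.1:", slow)  # approaches the mean gradually
```

With eta=1.0 the first learner already absorbs the whole residual, while eta=0.1 moves only a tenth of the way per round, which is why smaller learning rates need more rounds.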

XGBoost constructs trees level by level, assessing at each level whether adding a new node (split) improves the objective function as a whole. If it does not, the split is pruned. This level-wise growth combined with pruning keeps the trees compact, which makes them easier to understand and cheaper to build.
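
The split test described above can be sketched with the usual regularized gain formula. This is an illustrative sketch, not library code: the function names are hypothetical, and the inputs are assumed to be the sums of gradients (g) and Hessians (h) of the samples falling on each side of the split.

```python
def leaf_score(grad_sum, hess_sum, reg_lambda):
    """Quality of a leaf given the summed gradients and Hessians in it."""
    return grad_sum ** 2 / (hess_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Objective reduction from a split, minus the per-leaf penalty gamma."""
    parent = leaf_score(g_left + g_right, h_left + h_right, reg_lambda)
    children = (leaf_score(g_left, h_left, reg_lambda)
                + leaf_score(g_right, h_right, reg_lambda))
    return 0.5 * (children - parent) - gamma


def keep_split(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """A split is kept only if its penalized gain is positive."""
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma) > 0


# The same candidate split under two gamma settings.
print(split_gain(-4.0, 2.0, 4.0, 2.0))            # gain without penalty
print(keep_split(-4.0, 2.0, 4.0, 2.0, gamma=0.0))  # kept
print(keep_split(-4.0, 2.0, 4.0, 2.0, gamma=6.0))  # pruned
```

Raising gamma or lambda lowers the gain of every candidate split, so fewer splits survive and the trees become smaller.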

The regularization terms, along with other techniques such as shrinkage and pruning, play a crucial role in preventing overfitting, improving generalization, and making XGBoost a robust and powerful algorithm for various machine learning tasks.

Tree Structure

Conventional decision trees are frequently grown depth-first, expanding each branch until a stopping condition is satisfied. XGBoost, on the other hand, builds trees level-wise, or breadth-first. This means that it expands every node at a given depth before moving on to the next level, growing the tree one level at a time.
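
The difference between the two growth orders can be illustrated with a small sketch. It assumes a complete binary tree whose nodes are numbered heap-style (root 0, children 2i+1 and 2i+2); the function names are hypothetical and no actual splitting is performed.

```python
from collections import deque


def level_wise_order(max_depth):
    """Order in which nodes are expanded when growing breadth-first."""
    order, queue = [], deque([(0, 0)])  # (node id, depth)
    while queue:
        node, depth = queue.popleft()
        order.append(node)
        if depth < max_depth:
            queue.append((2 * node + 1, depth + 1))
            queue.append((2 * node + 2, depth + 1))
    return order


def depth_first_order(max_depth):
    """Order in which nodes are expanded when growing depth-first."""
    order, stack = [], [(0, 0)]
    while stack:
        node, depth = stack.pop()
        order.append(node)
        if depth < max_depth:
            # Push the right child first so the left child is expanded next.
            stack.append((2 * node + 2, depth + 1))
            stack.append((2 * node + 1, depth + 1))
    return order


print(level_wise_order(2))   # [0, 1, 2, 3, 4, 5, 6]
print(depth_first_order(2))  # [0, 1, 3, 4, 2, 5, 6]
```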

Determining the Best Splits: XGBoost assesses every split that might be made for every feature at every level and chooses the one that minimizes the objective function as much as feasible (e.g., minimizing the mean squared error for regression tasks or cross-entropy for classification tasks).

In depth-first expansion, by contrast, one branch is grown to completion before the splits of its sibling branches are evaluated.

Prioritizing Important Features: Level-wise growth reduces the overhead involved in choosing the best split for each feature at each level. Because all features are evaluated together for the nodes at the current depth, XGBoost does not need to revisit and re-assess the same feature more than once for that level during tree construction.

This is particularly beneficial when there are complex interactions among features, as the algorithm can adapt to the intricacies of the data.
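
The split search described under “Determining the Best Splits” can be sketched as an exhaustive scan over features and thresholds. This is a simplified illustration, not XGBoost's code: `best_split` is a hypothetical helper, and the example assumes squared-error loss with an initial prediction of 0, so that each gradient is (prediction - target) and each Hessian is 1.

```python
def best_split(rows, grads, hess, reg_lambda=1.0):
    """Return (gain, feature_index, threshold) of the best split, or None."""
    def score(g, h):
        return g * g / (h + reg_lambda)

    g_total, h_total = sum(grads), sum(hess)
    best = None
    for feature in range(len(rows[0])):
        # Visit the samples in increasing order of this feature's value.
        order = sorted(range(len(rows)), key=lambda i: rows[i][feature])
        g_left = h_left = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            g_left += grads[i]
            h_left += hess[i]
            if rows[i][feature] == rows[nxt][feature]:
                continue  # no threshold can separate equal values
            gain = 0.5 * (score(g_left, h_left)
                          + score(g_total - g_left, h_total - h_left)
                          - score(g_total, h_total))
            threshold = (rows[i][feature] + rows[nxt][feature]) / 2
            if best is None or gain > best[0]:
                best = (gain, feature, threshold)
    return best


rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
targets = [1.0, 1.0, 5.0, 5.0]
grads = [0.0 - t for t in targets]  # prediction 0 minus target
hess = [1.0] * len(targets)

gain, feature, threshold = best_split(rows, grads, hess)
print(feature, threshold, round(gain, 4))
```

In this toy data the targets depend only on the first feature, so the scan selects feature 0 with a threshold between 2.0 and 3.0.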

Handling Missing Data

XGBoost functions well even with incomplete datasets because of its strong mechanism for handling missing data during training.

To effectively handle missing values, XGBoost employs a “Sparsity Aware Split Finding” algorithm. The algorithm treats missing values as a separate group and takes them into account when determining the optimal split at each node. If a data point has a missing value for the feature a node splits on, it is sent down that node’s default branch, whose direction is learned during tree construction.

The potential gain from splitting the data based on the available feature values—including missing values—is taken into account by the algorithm to determine the ideal split. It computes the gain for every possible split, treating the cases where values are missing as a separate group.

During inference, if a new instance reaches a node that splits on a feature whose value is missing, it proceeds along the default branch learned for instances with missing values. This guarantees that the model can generate predictions even when there are missing values in the input data.
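
The idea of a learned default branch can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: missing values are represented by None, the threshold is fixed in advance, and the helper names are hypothetical. The sketch tries sending all missing rows left, then right, and keeps the direction with the larger gain.

```python
def gain(g_l, h_l, g_r, h_r, reg_lambda=1.0):
    """Regularized gain of a split from per-side gradient/Hessian sums."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    return 0.5 * (score(g_l, h_l) + score(g_r, h_r)
                  - score(g_l + g_r, h_l + h_r))


def best_default_direction(values, grads, hess, threshold):
    """Pick the branch ('left' or 'right') that missing values should take."""
    g_l = h_l = g_r = h_r = g_miss = h_miss = 0.0
    for value, g, h in zip(values, grads, hess):
        if value is None:
            g_miss += g
            h_miss += h
        elif value < threshold:
            g_l += g
            h_l += h
        else:
            g_r += g
            h_r += h
    gain_left = gain(g_l + g_miss, h_l + h_miss, g_r, h_r)
    gain_right = gain(g_l, h_l, g_r + g_miss, h_r + h_miss)
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right


def route(value, threshold, default_direction):
    """Send one value down the tree, using the default branch if missing."""
    if value is None:
        return default_direction
    return "left" if value < threshold else "right"


values = [1.0, 2.0, None, 8.0, 9.0, None]
grads = [-1.0, -1.0, -5.0, -5.0, -5.0, -5.0]  # missing rows resemble the right side
hess = [1.0] * 6

direction, best_gain = best_default_direction(values, grads, hess, threshold=5.0)
print(direction, round(best_gain, 4))
print(route(None, 5.0, direction))
```

Because the rows with missing values have gradients similar to the rows on the right side, grouping them with the right branch yields the larger gain, and that direction is used at prediction time.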

Cache-Aware Access in XGBoost

Modern computer architectures consist of hierarchical memory systems in which cache memory located closer to the CPU offers faster access times. By making effective use of this cache hierarchy, computational performance can be greatly enhanced. This is why XGBoost’s cache-aware access was created, with the goal of reducing memory access times during the training stage.

XGBoost processes data in portions that fit in the CPU’s cache memory, so that the most frequently accessed data is close at hand for computations. This method makes use of the spatial locality principle, which states that memory locations adjacent to a recently accessed location are likely to be accessed soon. Because XGBoost arranges data in a cache-friendly manner, it reduces the need to fetch data from slower main memory and speeds up computations.

Approximate Greedy Algorithm

Rather than analyzing each possible split point in detail, this algorithm uses weighted quantiles to propose a small set of candidate split points and picks the best among them. When working with large datasets, approximating the optimal split in this way makes XGBoost more scalable and faster, because it dramatically lowers the computational cost associated with evaluating all candidate splits.
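
A rough sketch of how weighted quantiles yield candidate split points is shown below. It is an exact, simplified stand-in for the approximate quantile sketch used in practice; the function name is hypothetical, and the weights are assumed to play the role of per-sample Hessians.

```python
def weighted_quantile_candidates(values, weights, n_bins):
    """Propose split candidates so each bin holds roughly equal total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    candidates, cumulative, next_cut = [], 0.0, 1
    for value, weight in pairs:
        cumulative += weight
        # Emit a candidate each time another 1/n_bins of the weight is covered.
        while next_cut < n_bins and cumulative >= total * next_cut / n_bins:
            if not candidates or candidates[-1] != value:
                candidates.append(value)
            next_cut += 1
    return candidates


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

uniform = weighted_quantile_candidates(values, [1.0] * 8, n_bins=4)
skewed = weighted_quantile_candidates(
    values, [1.0, 1.0, 1.0, 1.0, 4.0, 4.0, 4.0, 4.0], n_bins=4)

print(uniform)  # [2.0, 4.0, 6.0]
print(skewed)   # [5.0, 6.0, 7.0]
```

With uniform weights the candidates are evenly spaced, while heavier weights on the larger values pull the candidates toward that region. Only these few candidates are then evaluated instead of every distinct value.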

Parameters in XGBoost

  • Learning Rate (eta): An important variable that modifies how much each tree contributes to the final prediction. While more trees are needed, smaller values frequently result in more accurate models.
  • Max Depth: Limits the maximum depth of every tree. It is essential to controlling the model’s complexity, and smaller values help avoid overfitting.
  • Gamma: Based on the decrease in loss, it determines when a node in the tree will split. The algorithm becomes more conservative with a higher gamma value, avoiding splits that don’t appreciably lower the loss. It aids in managing tree complexity.
  • Subsample: Manages the percentage of data that is sampled at random to grow each tree, hence lowering variance and enhancing generalization. Setting it too low, though, could result in underfitting.
  • Colsample Bytree: Establishes the percentage of features that will be sampled at random for growing each tree.
  • n_estimators: Specifies the number of boosting rounds (called num_boost_round in the native xgb.train API).
  • lambda (L2 regularization term) and alpha (L1 regularization term): Control the strength of L2 and L1 regularization, respectively. A higher value results in stronger regularization.
  • min_child_weight: Influences the tree structure by setting the minimum sum of instance weights (Hessians) required in a child node. A split that would create a child below this threshold is not made.
  • scale_pos_weight: Useful in imbalanced class scenarios to control the balance of positive and negative weights.
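
As a rough illustration, the parameters above could be collected into a dictionary like the one below, using the names accepted by the native API. The values are placeholders chosen for the example, not recommendations, and `suggest_scale_pos_weight` is a hypothetical helper that applies the common heuristic of dividing the number of negative samples by the number of positive samples.

```python
def suggest_scale_pos_weight(labels):
    """Common heuristic: (number of negatives) / (number of positives)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return negatives / positives


labels = [0] * 8 + [1] * 2  # toy imbalanced labels

params = {
    "objective": "binary:logistic",
    "eta": 0.1,               # learning rate (shrinkage)
    "max_depth": 3,           # maximum depth of each tree
    "gamma": 0.0,             # minimum loss reduction required to split
    "subsample": 0.8,         # fraction of rows sampled per tree
    "colsample_bytree": 0.8,  # fraction of features sampled per tree
    "lambda": 1.0,            # L2 regularization strength
    "alpha": 0.0,             # L1 regularization strength
    "min_child_weight": 1,    # minimum Hessian sum in a child
    "scale_pos_weight": suggest_scale_pos_weight(labels),
}
num_boost_round = 50          # number of boosting rounds

print(params["scale_pos_weight"])  # 4.0 for the toy labels
```

A dictionary of this shape is what gets passed as `params` to `xgb.train`, together with the number of boosting rounds.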

Why XGBoost?

XGBoost is highly scalable and efficient, as it is designed to handle large datasets with millions or even billions of instances and features.

XGBoost implements parallel processing techniques and utilizes hardware optimization, such as GPU acceleration, to speed up the training process. This scalability and efficiency make XGBoost suitable for big data applications and real-time predictions.

It provides a wide range of customizable parameters and regularization techniques, allowing users to fine-tune the model according to their specific needs.

XGBoost offers built-in feature importance analysis, which helps identify the most influential features in the dataset. This information can be valuable for feature selection, dimensionality reduction, and gaining insights into the underlying data patterns.

XGBoost has not only demonstrated exceptional performance but has also become a go-to tool for data scientists and machine learning practitioners across various programming languages. It has been a frequent component of winning solutions in Kaggle competitions, showcasing its effectiveness in producing high-quality predictive models.

When to use XGBoost?

  • XGBoost is well-suited for scenarios with a substantial number of training samples.
  • XGBoost is versatile in handling a mix of categorical and numeric features.
  • XGBoost remains a strong choice when only numeric features are present. Its effectiveness in handling numeric data, coupled with regularization techniques, contributes to robust model performance.

When Not to use XGBoost?

  • Limited training samples.
  • Image recognition and Natural Language Processing tasks.
  • Unstructured Data.

How to install XGBoost?

Windows: XGBoost uses Git submodules to manage dependencies. So when you clone the repo, remember to specify the --recursive option:

git clone --recursive

For Windows users who use GitHub tools, you can open the git-shell and type the following commands:

git submodule init
git submodule update

macOS (OS X): First, obtain gcc-8 with Homebrew to enable multi-threading (i.e. using multiple CPU threads for training). The default Apple Clang compiler does not support OpenMP, so using the default compiler would disable multi-threading.

brew install gcc@8

Then install XGBoost with pip:

pip3 install xgboost

You might need to run the command with the --user flag if you run into permission errors.

XGBoost using Python

Let’s build and train a model for a classification task using XGBoost.

Step 1: Importing necessary libraries


from sklearn.metrics import accuracy_score
import xgboost as xgb
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


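Step 2: Loading the dataset and separating features from the target


dataset = pd.read_csv('Churn_Modelling.csv')
# Keep X as a DataFrame so that columns can be selected by name in the next step
X = dataset.iloc[:, 3:13].copy()
y = dataset.iloc[:, 13].values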


Step 3: Converting categorical columns

XGBoost can handle categorical features internally, so the code converts the specified columns to the pandas categorical data type. While internally representing categories with integers, the categorical type retains the semantic meaning of the categories.


X['Geography'] = X['Geography'].astype('category')
X['Gender'] = X['Gender'].astype('category')
# Print a summary of the columns and their data types
X.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   CreditScore      10000 non-null  int64
 1   Geography        10000 non-null  category
 2   Gender           10000 non-null  category
 3   Age              10000 non-null  int64
 4   Tenure           10000 non-null  int64
 5   Balance          10000 non-null  float64
 6   NumOfProducts    10000 non-null  int64
 7   HasCrCard        10000 non-null  int64
 8   IsActiveMember   10000 non-null  int64
 9   EstimatedSalary  10000 non-null  float64
dtypes: category(2), float64(2), int64(6)
memory usage: 644.9 KB

Step 4: Splitting the dataset into training and testing


X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.25, random_state=0)


Step 5: Converting Dataset into DMatrix

XGBoost provides the DMatrix class, a data structure optimized for speed and memory efficiency. To use the native XGBoost API, datasets must be converted to this format. DMatrix accepts both the training features and the labels. enable_categorical is set to True so that pandas category columns are encoded automatically.


xgb_train = xgb.DMatrix(X_train, y_train, enable_categorical=True)
xgb_test = xgb.DMatrix(X_test, y_test, enable_categorical=True)


Step 6: Create XGBoost Model

  • The code initializes an XGBoost model with hyperparameters like a binary logistic objective, a maximum tree depth of 3, and a learning rate of 0.1.
  • It then trains the model using the `xgb_train` dataset for 50 boosting rounds.

The specified hyperparameters define the model’s structure and training behavior, impacting its accuracy and generalization on the given dataset. Adjusting these hyperparameters may be necessary for optimal performance in different scenarios.


params = {
    'objective': 'binary:logistic',
    'max_depth': 3,
    'learning_rate': 0.1,
}
n = 50  # number of boosting rounds
model = xgb.train(params=params, dtrain=xgb_train, num_boost_round=n)


Step 7: Make Predictions and Evaluate

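With the binary:logistic objective, the model returns predicted probabilities (preds). The code converts them to integer labels by thresholding at 0.5, allowing for a straightforward accuracy comparison with the true labels.


preds = model.predict(xgb_test)
# Threshold the probabilities; a plain astype(int) would truncate them all to 0
preds = (preds > 0.5).astype(int)
accuracy = accuracy_score(y_test, preds)
print('Accuracy of the model is:', accuracy*100)


Note that casting the probabilities directly with preds.astype(int) would truncate every value below 1.0 to 0 and label every row as class 0. The accuracy then equals the share of negative samples in the test split, which is what a figure such as the following appears to reflect:

Accuracy of the model is: 79.64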


In summary, Extreme Gradient Boosting, or XGBoost, is a machine learning powerhouse that has earned recognition for its effectiveness, speed, and accuracy. The “eXtreme” in its name stems from the regularization components it incorporates, which inhibit overfitting and promote generalization. XGBoost’s robustness is demonstrated by its approach to tree construction, cache-aware access, and handling of missing data. Because of its adaptability, scalability, and effectiveness, the algorithm is a great option for a variety of machine learning tasks, especially ones that involve a large volume of training data. Its ongoing success in Kaggle competitions highlights its importance even more. Through its easy-to-use methods and seamless integration with Python, XGBoost is a powerful and accessible tool for building high-quality predictive models.

Frequently Asked Questions (FAQs)

1. How does XGBoost work?

XGBoost constructs a robust predictive model by sequentially adding weak learners, often decision trees, to correct errors made by previous models. It employs gradient optimization to minimize a cost function, introducing regularization for better generalization. Key features include cache-aware access, approximate greedy algorithms, and sparsity-aware split finding for efficiency.

2. What is the difference between XGBoost and other boosting algorithms?

The main differences between XGBoost and other boosting algorithms like AdaBoost and Gradient Boosting are that XGBoost uses a more sophisticated algorithm for splitting trees, and it also includes a number of regularization techniques that help to prevent overfitting. As a result, XGBoost is often more accurate than other boosting algorithms, but it can also be more computationally expensive to train.

3. How do I install XGBoost in Python?

To install XGBoost in Python, you can use the pip package manager. Open a terminal window and type the following command: pip install xgboost. Inside a Jupyter notebook cell, prefix the command with an exclamation mark: !pip install xgboost

4. How do I tune the hyperparameters of XGBoost in Python?

Tuning the hyperparameters of an XGBoost model in Python involves using a method like grid search or random search to evaluate different combinations of hyperparameter values and select the combination that produces the best results. There are several libraries available for hyperparameter tuning, such as `sklearn.model_selection` and `Optuna`.
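
The grid search idea can be sketched in plain Python as follows. This is a minimal sketch, not a replacement for the libraries mentioned above: `expand_grid` and `grid_search` are hypothetical helpers, and `toy_score` is an assumed stand-in for a real evaluation such as the cross-validated accuracy of a model trained with the given parameters.

```python
from itertools import product


def expand_grid(grid):
    """Turn {'name': [values, ...]} into a list of parameter dictionaries."""
    names = sorted(grid)
    return [dict(zip(names, combo))
            for combo in product(*(grid[name] for name in names))]


def grid_search(grid, score_fn):
    """Evaluate every combination and return the best one with its score."""
    best_params, best_score = None, float("-inf")
    for params in expand_grid(grid):
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical score that peaks at max_depth=4 and eta=0.1."""
    return -abs(params["max_depth"] - 4) - abs(params["eta"] - 0.1)


grid = {"max_depth": [2, 4, 6], "eta": [0.05, 0.1, 0.3]}
best_params, best_score = grid_search(grid, toy_score)
print(len(expand_grid(grid)), "combinations tried")
print("best:", best_params)
```

Random search follows the same loop but samples a fixed number of combinations instead of enumerating all of them, which scales better as the grid grows.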

5. How do I save a Python XGBoost model to a file?

To save a Python XGBoost model to a file, use the `save_model` method. For example: `model.save_model('model_filename.json')`. This saves the trained model to the specified file; the file extension determines the format, and JSON is the format recommended in recent XGBoost versions.

Last Updated : 06 Dec, 2023