CART (Classification And Regression Tree) in Machine Learning

Last Updated : 04 Dec, 2023

CART( Classification And Regression Trees) is a variation of the decision tree algorithm. It can handle both classification and regression tasks. Scikit-Learn uses the Classification And Regression Tree (CART) algorithm to train Decision Trees (also called “growing” trees). CART was first produced by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone in 1984.

CART(Classification And Regression Tree) for Decision Tree

CART is a predictive algorithm used in Machine learning and it explains how the target variable’s values can be predicted based on other matters. It is a decision tree where each fork is split into a predictor variable and each node has a prediction for the target variable at the end.

The term CART serves as a generic term for the following categories of decision trees:

Classification Trees: The tree is used to determine which “class” the target variable is most likely to fall into when it is continuous.
Regression trees: These are used to predict a continuous variable’s value.

In the decision tree, nodes are split into sub-nodes based on a threshold value of an attribute. The root node is taken as the training set and is split into two by considering the best attribute and threshold value. Further, the subsets are also split using the same logic. This continues till the last pure sub-set is found in the tree or the maximum number of leaves possible in that growing tree.

CART Algorithm

Classification and Regression Trees (CART) is a decision tree algorithm that is used for both classification and regression tasks. It is a supervised learning algorithm that learns from labelled data to predict unseen data.

Tree structure: CART builds a tree-like structure consisting of nodes and branches. The nodes represent different decision points, and the branches represent the possible outcomes of those decisions. The leaf nodes in the tree contain a predicted class label or value for the target variable.
Splitting criteria: CART uses a greedy approach to split the data at each node. It evaluates all possible splits and selects the one that best reduces the impurity of the resulting subsets. For classification tasks, CART uses Gini impurity as the splitting criterion. The lower the Gini impurity, the more pure the subset is. For regression tasks, CART uses residual reduction as the splitting criterion. The lower the residual reduction, the better the fit of the model to the data.
Pruning: To prevent overfitting of the data, pruning is a technique used to remove the nodes that contribute little to the model accuracy. Cost complexity pruning and information gain pruning are two popular pruning techniques. Cost complexity pruning involves calculating the cost of each node and removing nodes that have a negative cost. Information gain pruning involves calculating the information gain of each node and removing nodes that have a low information gain.

How does CART algorithm works?

The CART algorithm works via the following process:

The best-split point of each input is obtained.
Based on the best-split points of each input in Step 1, the new “best” split point is identified.
Split the chosen input according to the “best” split point.
Continue splitting until a stopping rule is satisfied or no further desirable splitting is available.

CART algorithm uses Gini Impurity to split the dataset into a decision tree .It does that by searching for the best homogeneity for the sub nodes, with the help of the Gini index criterion.

Gini index/Gini impurity

The Gini index is a metric for the classification tasks in CART. It stores the sum of squared probabilities of each class. It computes the degree of probability of a specific variable that is wrongly being classified when chosen randomly and a variation of the Gini coefficient. It works on categorical variables, provides outcomes either “successful” or “failure” and hence conducts binary splitting only.

The degree of the Gini index varies from 0 to 1,

Where 0 depicts that all the elements are allied to a certain class, or only one class exists there.
The Gini index of value 1 signifies that all the elements are randomly distributed across various classes, and
A value of 0.5 denotes the elements are uniformly distributed into some classes.

Mathematically, we can write Gini Impurity as follows:

$Gini = 1 -\sum_{i=1}^{n} (p_i)^{{2}}$

where $p_i$ is the probability of an object being classified to a particular class.

CART for Classification

A classification tree is an algorithm where the target variable is categorical. The algorithm is then used to identify the “Class” within which the target variable is most likely to fall. Classification trees are used when the dataset needs to be split into classes that belong to the response variable(like yes or no)

For classification in decision tree learning algorithm that creates a tree-like structure to predict class labels. The tree consists of nodes, which represent different decision points, and branches, which represent the possible result of those decisions. Predicted class labels are present at each leaf node of the tree.

How Does CART for Classification Work?

CART for classification works by recursively splitting the training data into smaller and smaller subsets based on certain criteria. The goal is to split the data in a way that minimizes the impurity within each subset. Impurity is a measure of how mixed up the data is in a particular subset. For classification tasks, CART uses Gini impurity

Gini Impurity- Gini impurity measures the probability of misclassifying a random instance from a subset labeled according to the majority class. Lower Gini impurity means more purity of the subset.
Splitting Criteria- The CART algorithm evaluates all potential splits at every node and chooses the one that best decreases the Gini impurity of the resultant subsets. This process continues until a stopping criterion is reached, like a maximum tree depth or a minimum number of instances in a leaf node.

CART for Regression

A Regression tree is an algorithm where the target variable is continuous and the tree is used to predict its value. Regression trees are used when the response variable is continuous. For example, if the response variable is the temperature of the day.

CART for regression is a decision tree learning method that creates a tree-like structure to predict continuous target variables. The tree consists of nodes that represent different decision points and branches that represent the possible outcomes of those decisions. Predicted values for the target variable are stored in each leaf node of the tree.

How Does CART works for Regression?

Regression CART works by splitting the training data recursively into smaller subsets based on specific criteria. The objective is to split the data in a way that minimizes the residual reduction in each subset.

Residual Reduction- Residual reduction is a measure of how much the average squared difference between the predicted values and the actual values for the target variable is reduced by splitting the subset. The lower the residual reduction, the better the model fits the data.
Splitting Criteria- CART evaluates every possible split at each node and selects the one that results in the greatest reduction of residual error in the resulting subsets. This process is repeated until a stopping criterion is met, such as reaching the maximum tree depth or having too few instances in a leaf node.

Pseudo-code of the CART algorithm

d = 0, endtree = 0
Note(0) = 1, Node(1) = 0, Node(2) = 0
while endtree < 1
    if Node(2^d-1) + Node(2^d) + .... + Node(2^d+1-2) = 2 - 2^d+1
        endtree = 1
    else
        do i = 2^d-1, 2^d, .... , 2^d+1-2
            if Node(i) > -1
                Split tree
            else 
                Node(2i+1) = -1
                Node(2i+2) = -1
            end if
        end do
    end if
d = d + 1
end while

CART model representation

CART models are formed by picking input variables and evaluating split points on those variables until an appropriate tree is produced.

Steps to create a Decision Tree using the CART algorithm:

Greedy algorithm: In this The input space is divided using the Greedy method which is known as a recursive binary spitting. This is a numerical method within which all of the values are aligned and several other split points are tried and assessed using a cost function.
Stopping Criterion: As it works its way down the tree with the training data, the recursive binary splitting method described above must know when to stop splitting. The most frequent halting method is to utilize a minimum amount of training data allocated to every leaf node. If the count is smaller than the specified threshold, the split is rejected and also the node is considered the last leaf node.
Tree pruning: Decision tree’s complexity is defined as the number of splits in the tree. Trees with fewer branches are recommended as they are simple to grasp and less prone to cluster the data. Working through each leaf node in the tree and evaluating the effect of deleting it using a hold-out test set is the quickest and simplest pruning approach.
Data preparation for the CART: No special data preparation is required for the CART algorithm.

Decision Tree CART Implementations

Here is the code implements the CART algorithm for classifying fruits based on their color and size. It first encodes the categorical data using a LabelEncoder and then trains a CART classifier on the encoded data. Finally, it predicts the fruit type for a new instance and decodes the result back to its original categorical value.

Python3

from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
 
# Define the features and target variable
features = [
    ["red", "large"],
    ["green", "small"],
    ["red", "small"],
    ["yellow", "large"],
    ["green", "large"],
    ["orange", "large"],
]
target_variable = ["apple", "lime", "strawberry", "banana", "grape", "orange"]
 
# Flatten the features list for encoding
flattened_features = [item for sublist in features for item in sublist]
 
# Use a single LabelEncoder for all features and target variable
le = LabelEncoder()
le.fit(flattened_features + target_variable)
 
# Encode features and target variable
encoded_features = [le.transform(item) for item in features]
encoded_target = le.transform(target_variable)
 
# Create a CART classifier
clf = DecisionTreeClassifier()
 
# Train the classifier on the training set
clf.fit(encoded_features, encoded_target)
 
# Predict the fruit type for a new instance
new_instance = ["red", "large"]
encoded_new_instance = le.transform(new_instance)
predicted_fruit_type = clf.predict([encoded_new_instance])
decoded_predicted_fruit_type = le.inverse_transform(predicted_fruit_type)
print("Predicted fruit type:", decoded_predicted_fruit_type[0])

Output:

Predicted fruit type: apple

Advantages of CART

Results are simplistic.
Classification and regression trees are Nonparametric and Nonlinear.
Classification and regression trees implicitly perform feature selection.
Outliers have no meaningful effect on CART.
It requires minimal supervision and produces easy-to-understand models.

Limitations of CART

Overfitting.
High Variance.
low bias.
the tree structure may be unstable.

Applications of the CART algorithm

For quick Data insights.
In Blood Donors Classification.
For environmental and ecological data.
In the financial sectors.

Frequently asked Question(FAQ’s)

1. What is CART (classification and regression tree)?

CART is a decision tree algorithm that can be used for both classification and regression tasks. It works by recursively partitioning the data into smaller and smaller subsets based on certain criteria. The goal is to create a tree structure that can accurately predict the target variable for new data points.

2. What is a regression tree in machine learning?

A regression tree is a type of decision tree that is used to predict continuous target variables. It works by partitioning the data into smaller and smaller subsets based on certain criteria, and then predicting the average value of the target variable within each subset.

3. What is the difference between a regression tree and a classification tree?

A regression tree is used to predict continuous target variables, while a classification tree is used to predict categorical target variables. Regression trees predict the average value of the target variable within each subset, while classification trees predict the most likely class for each data point.

4. What is the difference between cart and decision tree?

CART is a specific implementation of the decision tree algorithm. There are other decision tree algorithms, such as ID3 and C4.5, that have different splitting criteria and pruning techniques.

5. What is the main difference between classification and regression?

Classification is the task of assigning a category to an instance, while regression is the task of predicting a continuous value. For example, classification could be used to predict whether an email is spam or not spam, while regression could be used to predict the price of a house based on its size, location, and amenities.

Suggest improvement

Python | Decision tree implementation

Decision Tree Classifiers in R Programming

Share your thoughts in the comments

Getting Started with Machine Learning

Data Preprocessing

Classification & Regression

K-Nearest Neighbors (KNN)

Support Vector Machines

Decision Tree

Ensemble Learning

Generative Model

Time Series Forecasting

Clustering Algorithm

Convolutional Neural Networks

Recurrent Neural Networks

Reinforcement Learning

Model Deployment and Productionization

Advanced Topics