
Perform Bagging in R

Last Updated : 08 Jun, 2023

When we build a decision tree for a dataset, we fit it to a single training set. The drawback of a single decision tree is its high variance: if we split the dataset in half and fit a tree to each half, the two results could be very different. Bagging, also known as bootstrap aggregating, is a technique we can use to lower the variance of a single decision tree.

Bagging works as follows (a minimal from-scratch sketch follows the list):

  1. Draw b bootstrapped samples from the original dataset.
  2. Fit a decision tree to each bootstrapped sample.
  3. Aggregate the trees’ predictions (averaging for regression, majority vote for classification) to form the final model.
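
To make these steps concrete, here is a minimal from-scratch sketch of bagging for classification, using rpart trees and a majority vote. The helper names (n_trees, boot_idx, vote) are our own, not from any package:

R

# Illustrative bagging from scratch: fit several rpart trees on
# bootstrap samples of iris and combine predictions by majority vote
library(rpart)

set.seed(123)
data(iris)

n_trees <- 25
n <- nrow(iris)

# Steps 1 and 2: fit one classification tree per bootstrap sample
trees <- lapply(seq_len(n_trees), function(i) {
  boot_idx <- sample(n, n, replace = TRUE)  # rows drawn with replacement
  rpart(Species ~ ., data = iris[boot_idx, ], method = "class")
})

# Step 3: collect each tree's predicted class for every observation ...
preds <- sapply(trees, function(tree) {
  as.character(predict(tree, iris, type = "class"))
})

# ... and take the majority vote across trees
vote <- apply(preds, 1, function(p) names(which.max(table(p))))
mean(vote == iris$Species)  # agreement of the ensemble on the full data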

What is Bagging?

Bagging, or bootstrap aggregating, is an ensemble learning technique frequently used by machine learning practitioners. It reduces the variance of a machine-learning model by drawing numerous bootstrap subsets of the training data, fitting a separate model to each subset, and combining their predictions. In this way, bagging helps lower overfitting and improves the model’s ability to generalize.

How to perform bagging in R?

The randomForest package in R provides an implementation of bagging: a random forest whose mtry parameter is set to the number of predictors considers every predictor at each split, which is exactly a bagged ensemble of trees. First install the package, then load it:

R
install.packages("randomForest")
library(randomForest)


Next, split the data into training and test sets and train the model; setting mtry to the number of predictors turns the random forest into a bagged tree ensemble. Once training is complete, make predictions on the test data and evaluate the model’s accuracy.

R
# Loading the library
library(randomForest)

# Loading the iris dataset to use for bagging
data(iris)

# Splitting the data into training and testing sets
train <- sample(nrow(iris), 0.7 * nrow(iris),
                replace = FALSE)
train_data <- iris[train, ]
test_data <- iris[-train, ]

# Training the bagged model: mtry is set to the number of
# predictors so that every predictor is considered at each split
rf_model <- randomForest(Species ~ .,
                         data = train_data,
                         mtry = ncol(train_data) - 1)

# Making predictions on the test data
predictions <- predict(rf_model, test_data)

# Calculating the accuracy of the model on the test set
accuracy <- sum(predictions == test_data$Species) / nrow(test_data)
print(paste("Accuracy:", round(accuracy, 2)))


Output:

[1] "Accuracy: 0.98"

Importance of Bagging Function in R

Bagging reduces the variance of a model by creating multiple training sets and combining the predictions of the models fit to them. The ipred package in R provides a bagging() function for this. Bagging, also known as bootstrap aggregating, generates each training set by sampling observations from the original dataset at random with replacement.

The bagging() function in R Programming Language accepts a number of parameters, such as the model formula, the dataset to be used, the number of bootstrap replications to produce (nbagg), whether to compute an out-of-bag error estimate (coob), and control parameters for the trees it fits. ipred’s bagging builds its ensemble from rpart decision trees, so the control argument takes an rpart.control() object. Here is an example of how to use the bagging function in R:

R
# Loading the libraries
library(ipred)
library(rpart)

# Loading the iris dataset
data(iris)

# Fixing the random seed for reproducibility
set.seed(1)

# Fitting the bagged model: 50 bootstrap replications, an
# out-of-bag error estimate, and shallow trees (maxdepth = 2)
bag <- bagging(
  formula = Species ~ .,
  data = iris,
  nbagg = 50,
  coob = TRUE,
  control = rpart.control(minsplit = 2, cp = 0,
                          maxdepth = 2)
)

bag


Output:

Bagging classification trees with 50 bootstrap replications 

Call: bagging.data.frame(formula = Species ~ ., data = iris, nbagg = 50, 
    coob = TRUE, control = rpart.control(minsplit = 2, cp = 0, 
        maxdepth = 2))

Out-of-bag estimate of misclassification error:  0.06 

In this example, we load the iris dataset and fit a bagged decision tree model, specifying the outcome variable and predictors through the formula, the number of bootstrap replications (nbagg = 50), whether to compute an out-of-bag error estimate (coob = TRUE), and the control parameters for the underlying trees (rpart.control(maxdepth = 2) limits each tree to a depth of 2). Printing the fitted model reports the out-of-bag estimate of the misclassification error, here 0.06, corresponding to roughly 94% accuracy.
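
The fitted classbagg object can also be used for prediction with predict(). A brief sketch, reusing the bag model fit above (the three row indices are arbitrary examples):

R

# Predicting classes for a few observations with the bagged model
new_obs <- iris[c(1, 51, 101), ]
predict(bag, newdata = new_obs)

# Class probabilities averaged across the bagged trees
predict(bag, newdata = new_obs, type = "prob")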


