Boosting in R
Last Updated: 11 Jul, 2023
Boosting is a machine learning technique that improves predictive performance by combining many weak models into a single strong ensemble. R is a popular language for implementing boosting algorithms and provides several packages for this purpose, including gbm, adabag, and xgboost.
In this article, we will look at the idea of boosting and how it can be used in R. Boosting differs from bagging-based techniques such as random forests in that it trains the weak models sequentially rather than independently: each new weak model is trained to correct the errors of the models before it, for example by giving more weight to observations that were predicted poorly. The final model is a weighted combination of all the weak models.
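To make the sequential idea concrete, here is a minimal sketch of boosting for regression, assuming squared-error loss so that "correcting the errors" simply means fitting each new weak learner to the current residuals. It is a simplified illustration rather than the algorithm used by any particular package; the number of rounds, the learning rate, and the choice of depth-1 rpart trees as weak learners are all illustrative assumptions.

```r
library(rpart)  # recommended package that ships with standard R installations

data(mtcars)
n_rounds <- 50        # number of weak learners (illustrative value)
learning_rate <- 0.1  # shrinkage applied to each weak learner (illustrative value)

# Start from a constant prediction: the mean of the response
pred <- rep(mean(mtcars$mpg), nrow(mtcars))
mse_history <- numeric(n_rounds)

for (m in seq_len(n_rounds)) {
  # The residuals of the current ensemble become the target for the next learner
  work <- mtcars
  work$resid <- mtcars$mpg - pred
  work$mpg <- NULL

  # A depth-1 tree (a "stump") serves as the weak learner
  stump <- rpart(resid ~ ., data = work,
                 control = rpart.control(maxdepth = 1, cp = 0))

  # Add a shrunken version of the stump's prediction to the ensemble
  pred <- pred + learning_rate * predict(stump, newdata = work)
  mse_history[m] <- mean((mtcars$mpg - pred)^2)
}

# Training error after the first and the last round
mse_history[c(1, n_rounds)]
```

The training error falls as rounds are added, which is why a held-out test set is needed to judge when additional rounds stop helping.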
Implementing boosting in R typically involves the following steps:
- Importing the Data: Importing data is the first stage in implementing boosting. In R, you can read data from a CSV file using the read.csv function, or import data from other file formats using other functions.
- Splitting the Data: Next, divide the data into training and testing sets. Evaluating the model on data it has not seen during training is what lets you detect overfitting to the training data.
- Building the Model: The next stage is to create the boosting model, for example using the gbm function from the gbm package. Its main arguments are the formula, the data, and the number of trees to build. Other parameters, such as the maximum depth of each tree and the learning rate, can also be specified.
- Tuning the Model: After building the model, you tune it to improve its performance. This involves adjusting the parameters of the model and checking its accuracy on validation data.
- Evaluating the Model: Finally, you evaluate the performance of the model on the test data, using metrics such as accuracy, precision, recall, and F1 score for classification, or mean squared error for regression.
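As a rough illustration of the splitting and evaluation steps above, the following sketch uses only base R. The seed, the 70/30 ratio, and the labels in the evaluation part are made-up values chosen to show the arithmetic, and "my_data.csv" is a hypothetical file name.

```r
# Importing: a built-in dataset is used here. For a CSV file,
# read.csv("my_data.csv") would take its place (hypothetical path).
data(mtcars)

# Splitting: hold out 30% of the rows for testing
set.seed(42)  # arbitrary seed, for reproducibility
train_idx <- sample(nrow(mtcars), size = floor(0.7 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Evaluating: classification metrics from a confusion matrix.
# These labels are invented purely to illustrate the calculations.
actual <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "yes", "no", "no"),
                 levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "yes", "no", "no", "yes", "no", "yes", "no", "no"),
                    levels = c("no", "yes"))

cm <- table(Predicted = predicted, Actual = actual)
tp <- cm["yes", "yes"]  # true positives
fp <- cm["yes", "no"]   # false positives
fn <- cm["no", "yes"]   # false negatives

accuracy  <- sum(diag(cm)) / sum(cm)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

c(accuracy = accuracy, precision = precision, recall = recall, f1 = f1)
```

For a regression model, the evaluation part would instead compute something like the mean squared error between the predictions and the observed values.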
Gradient Boosting Machine in R
First, we load the mtcars dataset and split it into training and testing sets with sample.split() from the caTools package. Next, we fit a boosting model to the training data using the gbm() function from the “gbm” package, with a Gaussian distribution (squared-error loss), 1000 trees, and a learning rate (shrinkage) of 0.01. We then use the model to predict the mpg of the test data and, finally, evaluate its performance using the mean squared error:
R
# Load the data and split it into training (70%) and testing (30%) sets
data(mtcars)
library(caTools)
set.seed(123)
split <- sample.split(mtcars$mpg, SplitRatio = 0.7)
train <- mtcars[split, ]
test <- mtcars[!split, ]

# Fit a gradient boosting model with squared-error loss
library(gbm)
boost <- gbm(mpg ~ ., data = train,
             distribution = "gaussian",
             n.trees = 1000, shrinkage = 0.01,
             interaction.depth = 4,
             bag.fraction = 0.7,
             n.minobsinnode = 5)

# Predict mpg for the test set and compute the mean squared error
predictions <- predict(boost, newdata = test)
mse <- mean((test$mpg - predictions)^2)
mse
Output:
Using 1000 trees...
10.8390088276318
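The tuning step described earlier can be sketched with gbm's built-in cross-validation: gbm() accepts a cv.folds argument, and gbm.perf() with method = "cv" then estimates the number of trees that minimizes the cross-validated error. The settings below (5 folds, depth 2, a small n.minobsinnode so the folds of this 32-row dataset are large enough) are illustrative assumptions, not tuned recommendations.

```r
library(gbm)

data(mtcars)
set.seed(123)

# Fit with 5-fold cross-validation on the full dataset (illustrative settings)
cv_fit <- gbm(mpg ~ ., data = mtcars,
              distribution = "gaussian",
              n.trees = 1000, shrinkage = 0.01,
              interaction.depth = 2,
              n.minobsinnode = 3,
              cv.folds = 5,
              n.cores = 1)

# Estimated best number of trees according to the cross-validated error
best_iter <- gbm.perf(cv_fit, method = "cv", plot.it = FALSE)
best_iter

# Predictions that use only the selected number of trees
cv_pred <- predict(cv_fit, newdata = mtcars, n.trees = best_iter)
```

The same approach could be repeated for several values of shrinkage or interaction.depth, keeping the combination with the lowest cross-validated error.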
AdaBoost Model in R
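Here’s an example of how to use AdaBoost in R to classify the iris dataset, using the boosting() function from the adabag package. No random seed is set, so the exact confusion matrix and accuracy will differ between runs: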
R
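library(adabag)

# Load the data and make sure the response is a factor
data(iris)
iris$Species <- as.factor(iris$Species)

# Randomly assign 70% of the rows to training and the rest to testing
index <- sample(nrow(iris), nrow(iris) * 0.7)
train <- iris[index, ]
test <- iris[-index, ]

# Fit AdaBoost with 10 trees as weak learners
model <- boosting(Species ~ ., data = train,
                  boos = TRUE, mfinal = 10,
                  control = rpart.control(cp = 0.01,
                                          minsplit = 3))

# Predict on the test set and compute the accuracy
# (rows of the table are predicted classes, columns are actual classes)
predictions <- predict(model, newdata = test)
confusion_matrix <- table(predictions$class, test$Species)
accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix)
print(confusion_matrix)
print(paste0("Accuracy: ", accuracy))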
Output:
             setosa versicolor virginica
  setosa         21          0         0
  versicolor      0          8         1
  virginica       0          1        14
[1] "Accuracy: 0.955555555555556"
XGBoost Model in R
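Here’s an example of how to use XGBoost in R. It trains a binary classifier on the agaricus (mushroom) dataset bundled with the xgboost package and reports the area under the ROC curve (AUC) on the test set, computed with the pROC package: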
R
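library(xgboost)
library(pROC)

# Load the training and test data bundled with xgboost
data(agaricus.train, package = 'xgboost')
data(agaricus.test, package = 'xgboost')

# Convert the data to xgboost's DMatrix format
dtrain <- xgb.DMatrix(agaricus.train$data,
                      label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data,
                     label = agaricus.test$label)

# Shallow trees, logistic loss, classification error as the evaluation metric
params <- list(max_depth = 2,
               objective = "binary:logistic",
               eval_metric = "error")

# Train for 25 boosting rounds, tracking error on both sets
xgb_model <- xgb.train(params = params,
                       data = dtrain, nrounds = 25,
                       watchlist = list(train = dtrain,
                                        test = dtest),
                       verbose = FALSE)

# Predict probabilities for the test set and compute the AUC
pred <- predict(xgb_model, dtest)
auc(agaricus.test$label, pred)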
Output:
1
These three packages, gbm, adabag, and xgboost, cover the boosting approaches shown in this article: gradient boosting machines, AdaBoost, and XGBoost.