
Automated Machine Learning for Supervised Learning using R

Automated Machine Learning (AutoML) is an approach that aims to automate various stages of the machine learning process, making it easier for users with limited machine learning expertise to build high-performing models. AutoML is particularly useful in supervised learning, where you have labeled data and want to create models that can make predictions or classifications based on that data. This article focuses on the concept of AutoML for supervised learning using the R programming language.

Key Components of AutoML for Supervised Learning

Data Preparation

Clean and preprocess the labeled data: handle missing values, encode categorical variables, and split the data into training and test sets.

Feature Engineering

Many AutoML tools can automatically create, transform, and select features to improve model performance.

AutoML Framework

Choose an AutoML framework or package in R, such as mlr or caret, that provides automated tools for model selection, hyperparameter tuning, and more.
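For instance, either framework can be installed from CRAN and loaded in the usual way; a minimal setup sketch:

```r
# Install the frameworks once (from CRAN), then load them per session
if (!requireNamespace("mlr", quietly = TRUE)) install.packages("mlr")
if (!requireNamespace("caret", quietly = TRUE)) install.packages("caret")

library(mlr)    # task/learner abstraction with tuning helpers
library(caret)  # unified train() interface over many model types
```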



Model Selection

The framework compares candidate algorithms (for example random forests, gradient boosting machines, and linear models) and shortlists the most promising ones for the task.

Hyperparameter Tuning

Use the AutoML framework to automatically search for the best hyperparameters for the chosen algorithms.

Model Training and Evaluation

Candidate models are trained and evaluated with resampling strategies such as cross-validation, using task-appropriate metrics (accuracy, AUC, RMSE, and so on).


Advantages of AutoML for Supervised Learning

AutoML lowers the barrier to entry for non-experts, saves time on repetitive model selection and tuning work, and often produces strong baseline models quickly.

Challenges of AutoML

It can be computationally expensive, the resulting models may be harder to interpret, and it does not remove the need for domain knowledge, sound data preparation, and careful validation.

Use Cases for AutoML in R

Typical use cases include classification and regression tasks such as churn prediction, credit scoring, fraud detection, and demand forecasting.

Here’s an example of AutoML with hyperparameter tuning using the mlr package:




# Load the required packages: mlr for the tuning workflow,
# ranger for the random forest learner (xgboost is loaded for
# optional boosting learners but not used below)
library(mlr)
library(xgboost)
library(ranger)
 
# Load the Iris dataset
data(iris)
 
# Define features and target variable
features <- setdiff(names(iris), "Species")
target <- "Species"
 
# Create a task object for multiclass classification
task <- makeClassifTask(data = iris, target = target)
 
# Define a single learner (e.g., Random Forest)
learner <- makeLearner("classif.ranger", predict.type = "response")
 
# Define a parameter grid for hyperparameter tuning (e.g., number of trees)
param_grid <- makeParamSet(
  makeIntegerParam("num.trees", lower = 50, upper = 500)
)
 
# Create a tuning control
ctrl <- makeTuneControlRandom(maxit = 10)
 
# Perform AutoML with hyperparameter tuning
result <- tuneParams(learner, task, resampling = makeResampleDesc("CV", iters = 5),
                     measures = list(acc), par.set = param_grid, control = ctrl)
 
# View model results
print(result)

Output:



[Tune] Started tuning learner classif.ranger for parameter set:
              Type len Def    Constr Req Tunable Trafo
num.trees  integer   -   - 50 to 500   -    TRUE     -
With control class: TuneControlRandom
Imputation value: -0
[Tune-x] 1: num.trees=151
[Tune-y] 1: acc.test.mean=0.9533333; time: 0.0 min
[Tune-x] 2: num.trees=148
[Tune-y] 2: acc.test.mean=0.9533333; time: 0.0 min
[Tune-x] 3: num.trees=302
[Tune-y] 3: acc.test.mean=0.9600000; time: 0.0 min
[Tune-x] 4: num.trees=68
[Tune-y] 4: acc.test.mean=0.9600000; time: 0.0 min
[Tune-x] 5: num.trees=97
[Tune-y] 5: acc.test.mean=0.9533333; time: 0.0 min
[Tune-x] 6: num.trees=173
[Tune-y] 6: acc.test.mean=0.9600000; time: 0.0 min
[Tune-x] 7: num.trees=124
[Tune-y] 7: acc.test.mean=0.9533333; time: 0.0 min
[Tune-x] 8: num.trees=203
[Tune-y] 8: acc.test.mean=0.9600000; time: 0.0 min
[Tune-x] 9: num.trees=425
[Tune-y] 9: acc.test.mean=0.9600000; time: 0.0 min
[Tune-x] 10: num.trees=423
[Tune-y] 10: acc.test.mean=0.9600000; time: 0.0 min
[Tune] Result: num.trees=68 : acc.test.mean=0.9600000
Tune result:
Op. pars: num.trees=68
acc.test.mean=0.9600000

First, we load the necessary R packages, including mlr, xgboost, and ranger, then load the Iris dataset and define the features and target variable for the supervised learning task. makeClassifTask() wraps the data into a multiclass classification task, makeLearner() specifies a ranger random forest, and tuneParams() runs a random search (10 iterations) over num.trees with 5-fold cross-validation, scoring each candidate by accuracy.

In summary, the tuned model achieved an accuracy of 96% on the test data with an optimal number of 68 trees in the random forest model. This indicates that the model is performing well for the multiclass classification task on the Iris dataset.
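Once tuning has identified a good value for num.trees, the final model can be trained with that hyperparameter fixed; a minimal sketch using mlr's setHyperPars() and train() (rebuilding the task and learner from the example above, with num.trees = 68 taken from the tuning result):

```r
library(mlr)
library(ranger)

# Rebuild the task and learner from the example above
data(iris)
task <- makeClassifTask(data = iris, target = "Species")
learner <- makeLearner("classif.ranger", predict.type = "response")

# Fix the tuned hyperparameter (num.trees = 68 from the tuning run)
tuned_learner <- setHyperPars(learner, par.vals = list(num.trees = 68))

# Train the final model on the full dataset
final_model <- train(tuned_learner, task)

# Predict on a few rows to check the fitted model
preds <- predict(final_model, newdata = iris[1:5, setdiff(names(iris), "Species")])
print(preds$data$response)
```

In practice you would pass `result$x` (the tuned parameter list returned by tuneParams) to `par.vals` instead of hard-coding the value.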

Automated Machine Learning for Supervised Learning using the caret package




# Install and load the caret library
install.packages("caret")
library(caret)
library(randomForest)
 
# Generate a random dataset
set.seed(123)
n <- 100
random_data <- data.frame(
  X1 = rnorm(n),
  X2 = rnorm(n),
  Y = rbinom(n, 1, 0.5)
)
 
# Define target variable
target <- "Y"
 
# Specify the training control and the model tuning grid
# (note: with only two predictors, mtry values above 2 are
# reset to the valid range by randomForest, with a warning)
ctrl <- trainControl(method = "cv", number = 5)
tune_grid <- expand.grid(.mtry = 2:5)
 
# Run AutoML-style tuning with a random forest
# (Y is numeric 0/1 here, so caret fits a regression forest;
# convert Y to a factor to get classification metrics instead)
model <- train(random_data[, setdiff(names(random_data), target)],
           random_data[, target], method = "rf", trControl = ctrl, tuneGrid = tune_grid)
 
# Make predictions on synthetic data
new_data <- data.frame(X1 = 0.1, X2 = -0.2)
predictions <- predict(model, newdata = new_data)
 
# Evaluate the model and view the results
print(model)

Output:

Random Forest 
100 samples
2 predictor
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 80, 80, 80, 80, 80
Resampling results across tuning parameters:
  mtry  RMSE       Rsquared    MAE      
  2     0.5333827  0.04581951  0.4778230
  3     0.5299803  0.04279376  0.4743377
  4     0.5318672  0.04155868  0.4779127
  5     0.5333452  0.04622749  0.4785377
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 3.

First, we install and load the caret library along with the randomForest package, which provides the random forest algorithm used by caret's "rf" method.

Finally, evaluate the performance of the model by viewing the model results via the print function. These results include information about the random forest model, such as the number of trees, the importance of the variables, and the accuracy of the model.

Conclusion

AutoML for supervised learning in R automates and streamlines the process of developing machine learning models. It is a powerful tool for users with varying levels of expertise to quickly build and deploy predictive models, and it is especially useful in cases where time, expertise, or computational resources are limited. However, it is essential to understand the fundamentals of machine learning and to carefully evaluate and interpret the results generated by AutoML tools.

