Ranger Function In R
Last Updated: 22 Apr, 2024
Random Forest is a versatile and widely used machine learning algorithm for both classification and regression tasks. In the R programming language, the ranger package provides an efficient, high-performance implementation of Random Forests. The ranger function offers a range of parameters for fine-tuning the model and optimizing its performance. In this article, we'll delve into the ranger function, exploring its capabilities and demonstrating its usage through examples.
Introduction to the Ranger Function
The ranger function in R is designed to build Random Forest models efficiently, particularly for large datasets. It leverages parallel computation and optimized algorithms to train models quickly while maintaining high accuracy. The function offers flexibility in specifying hyperparameters and handles categorical variables seamlessly.
ranger(formula, data)
- formula: A formula specifying the model formula, e.g., y ~ x1 + x2.
- data: The dataset containing the variables specified in the formula.
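As a minimal sketch of this basic call (assuming the ranger package is installed), fitting a forest with only a formula and a dataset works out of the box; because the response here is numeric, ranger grows a regression forest and reports an out-of-bag error estimate:

```r
# Minimal example: fit a regression forest on the built-in iris data
library(ranger)

set.seed(42)  # for reproducible tree growing
fit <- ranger(Sepal.Length ~ ., data = iris)

# Out-of-bag mean squared error, estimated during training
print(fit$prediction.error)
```

The fitted object also stores the number of trees, the mtry used, and per-tree details, so no extra arguments are needed for a first look at a dataset.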
Binary Classification with Ranger
Let’s illustrate the usage of the ranger function with an example of binary classification.
R
# Load required library
library(ranger)
# Load dataset
data(iris)
# Convert species to binary class
iris$Species <- ifelse(iris$Species == "setosa", 1, 0)
# Split data into training and testing sets
set.seed(123)
train_index <- sample(1:nrow(iris), 0.8 * nrow(iris))
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]
# Train Random Forest classifier; classification = TRUE is needed because the
# 0/1 target is numeric (otherwise ranger would fit a regression forest)
model <- ranger(Species ~ ., data = train_data, num.trees = 100, classification = TRUE)
# Make predictions on test data
predictions <- predict(model, data = test_data)$predictions
# Evaluate model performance
accuracy <- mean(predictions == test_data$Species)
print(paste("Accuracy:", accuracy))
Output:
[1] "Accuracy: 0.9"
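Accuracy alone can hide class-specific errors, so it is often worth cross-tabulating predictions against the true labels with base R's table(). A self-contained sketch, re-fitting a small forest on the same binary-coded iris data as above:

```r
library(ranger)

# Recreate the binary-coded iris data (1 = setosa, 0 = other)
data(iris)
iris$Species <- ifelse(iris$Species == "setosa", 1, 0)

# Same 80/20 split as in the example above
set.seed(123)
train_index <- sample(1:nrow(iris), 0.8 * nrow(iris))
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]

# classification = TRUE tells ranger to treat the numeric 0/1 target as classes
model <- ranger(Species ~ ., data = train_data, num.trees = 100,
                classification = TRUE)
predictions <- predict(model, data = test_data)$predictions

# Rows: predicted class, columns: actual class
print(table(Predicted = predictions, Actual = test_data$Species))
```

Off-diagonal entries in the resulting table show exactly which class is being misclassified, which a single accuracy number does not.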
Multiclass Classification with Ranger
The ranger function also handles multiclass classification. Let's demonstrate this on the Palmer Penguins dataset.
R
# Load libraries
library(palmerpenguins) # Penguins dataset
library(ranger) # Random forest package
library(caret) # For data splitting and evaluation
# Load the Penguins dataset
data(penguins, package = "palmerpenguins")
# Remove rows with missing data for simplicity
penguins <- na.omit(penguins)
# Split the dataset into training and testing sets (70% training, 30% testing)
set.seed(123) # For reproducibility
train_indices <- createDataPartition(penguins$species, p = 0.7, list = FALSE)
train_data <- penguins[train_indices, ]
test_data <- penguins[-train_indices, ]
# Train a random forest model for multiclass classification with ranger
model <- ranger(
  formula = species ~ .,    # Formula specifying the target and predictors
  data = train_data,        # Training dataset
  num.trees = 100,          # Number of trees in the forest
  mtry = 2,                 # Number of features to consider at each split
  importance = "impurity",  # Record impurity-based feature importance
  probability = TRUE        # Return class probabilities instead of labels
)
# Make predictions on the test set
predictions <- predict(model, data = test_data)
# Extract predicted classes
predicted_classes <- apply(predictions$predictions, 1, which.max)
predicted_classes <- colnames(predictions$predictions)[predicted_classes]
# Evaluate the model's accuracy
accuracy <- mean(predicted_classes == test_data$species)
print(paste("Accuracy:", accuracy))
Output:
[1] "Accuracy: 1"
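Because the model above was trained with importance = 'impurity', the fitted object also records how much each feature contributed to the splits. A self-contained sketch on the iris data (ranger stores these scores in the variable.importance field as a named numeric vector):

```r
library(ranger)

set.seed(1)
# Grow a classification forest and record impurity-based importance
fit <- ranger(Species ~ ., data = iris, num.trees = 100,
              importance = "impurity")

# Higher scores mean the feature was more useful for splitting;
# sort to see the most informative predictors first
print(sort(fit$variable.importance, decreasing = TRUE))
```

For iris, the petal measurements typically dominate the sepal measurements, which matches what the well-separated species clusters suggest.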
Conclusion
The ranger function in R is a powerful tool for building Random Forest models efficiently and accurately. Its flexibility in specifying hyperparameters, seamless handling of categorical variables, and support for parallel computation make it ideal for various machine learning tasks. By leveraging the ranger package, data scientists and machine learning practitioners can develop robust and high-performance models for classification and regression problems. Experimenting with different parameters and options within the ranger function allows for fine-tuning models to achieve optimal performance on diverse datasets.