Ranger Function In R

Last Updated : 22 Apr, 2024

Random Forest is a versatile and widely used machine learning algorithm for both classification and regression tasks. In the R programming language, the ranger package provides a fast implementation of Random Forests. Its main function, ranger, exposes a range of parameters for tuning the model. In this article, we'll look at the ranger function and demonstrate its usage through examples.

Introduction to the Ranger Function

The ranger function in R is designed to build Random Forest models efficiently, particularly for large datasets. It grows trees in parallel across multiple threads and relies on a C++ implementation to keep training times short. The function gives direct control over hyperparameters such as the number of trees and the number of variables tried at each split, and it accepts factor columns as predictors without manual encoding.
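As a rough illustration of the point about categorical variables, the sketch below fits a regression forest on the built-in iris data, using the factor column Species as one of the predictors. The choice of dataset, response, and num.trees = 200 is an assumption made for this illustration, not something prescribed by the package.

```r
library(ranger)

# Species is a factor column; it is passed as a predictor as-is,
# without any manual dummy coding.
# The response (Sepal.Length) is numeric, so a regression forest is grown.
set.seed(42)
fit <- ranger(Sepal.Length ~ ., data = iris, num.trees = 200)

print(fit$treetype)          # type of forest that was grown
print(fit$prediction.error)  # out-of-bag mean squared error
print(fit$r.squared)         # out-of-bag R squared
```

The out-of-bag figures give a quick estimate of fit quality without a separate test set.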

ranger(formula, data)
  • formula: A formula describing the model, with the response on the left and the predictors on the right, e.g., y ~ x1 + x2.
  • data: The data frame containing the variables named in the formula.
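Beyond formula and data, a call usually sets a few optional arguments as well. The following is a minimal sketch of such a call; the specific values (200 trees, mtry of 2, two threads, seed 123) are arbitrary choices for illustration.

```r
library(ranger)

fit <- ranger(
  Species ~ .,
  data = iris,
  num.trees = 200,    # number of trees to grow (ranger's default is 500)
  mtry = 2,           # number of variables tried at each split
  min.node.size = 5,  # minimal size of a terminal node
  num.threads = 2,    # threads used for growing trees in parallel
  seed = 123          # seed for reproducible results
)

print(fit)                   # summary of the fitted forest
print(fit$prediction.error)  # out-of-bag misclassification rate
```

Printing the fitted object reports the forest type, the settings used, and the out-of-bag prediction error.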

Binary Classification with Ranger

Let’s illustrate the usage of the ranger function with an example of binary classification.

R
# Load required library
library(ranger)
# Load dataset
data(iris)

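# Convert species to a binary class. The response must be a factor:
# a numeric 0/1 column would make ranger grow a regression forest,
# whose fractional predictions cannot be compared to labels with ==
iris$Species <- factor(ifelse(iris$Species == "setosa", "setosa", "other"))

# Split data into training and testing sets
set.seed(123)
train_index <- sample(seq_len(nrow(iris)), 0.8 * nrow(iris))
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]

# Train Random Forest model
model <- ranger(Species ~ ., data = train_data, num.trees = 100)

# Make predictions on test data (a factor of predicted class labels)
predictions <- predict(model, data = test_data)$predictions

# Evaluate model performance as the share of correctly predicted labels
accuracy <- mean(predictions == test_data$Species)
print(paste("Accuracy:", accuracy))

Output:

The script prints one line of the form [1] "Accuracy: <value>". Setosa is well separated from the other two species in iris, so the value is typically at or close to 1; the exact figure depends on the random split.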
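One detail worth checking in examples like this is which kind of forest ranger grows, since that is decided by the type of the response column. The sketch below fits the same binary setosa-versus-rest problem twice, once with a numeric 0/1 response and once with a factor response, and prints the resulting forest type. The helper data frames df_numeric and df_factor are names introduced here for illustration only.

```r
library(ranger)

# Same binary target, stored two different ways
df_numeric <- iris
df_numeric$Species <- ifelse(iris$Species == "setosa", 1, 0)

df_factor <- iris
df_factor$Species <- factor(ifelse(iris$Species == "setosa", "setosa", "other"))

fit_numeric <- ranger(Species ~ ., data = df_numeric, num.trees = 50, seed = 1)
fit_factor  <- ranger(Species ~ ., data = df_factor,  num.trees = 50, seed = 1)

print(fit_numeric$treetype)  # forest type for the numeric response
print(fit_factor$treetype)   # forest type for the factor response

# With a factor response, predictions are class labels,
# so a confusion table is straightforward
pred <- predict(fit_factor, data = df_factor)$predictions
print(table(predicted = pred, actual = df_factor$Species))
```

A regression forest returns averaged numeric predictions, so exact equality with the labels 0 and 1 understates accuracy; a factor response avoids that.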

Multiclass Classification with Ranger

The ranger function also handles targets with more than two classes. The example below classifies the three penguin species in the palmerpenguins dataset and requests class probabilities rather than hard labels.

R
# Load libraries
library(palmerpenguins)  # Penguins dataset
library(ranger)          # Random forest package
library(caret)           # For data splitting and evaluation


# Load the Penguins dataset
data(penguins, package = "palmerpenguins")

# Remove rows with missing data for simplicity
penguins <- na.omit(penguins)

# Split the dataset into training and testing sets (70% training, 30% testing)
set.seed(123)  # For reproducibility
train_indices <- createDataPartition(penguins$species, p = 0.7, list = FALSE)
train_data <- penguins[train_indices, ]
test_data <- penguins[-train_indices, ]

# Train a random forest model for multiclass classification with ranger
model <- ranger(
  formula = species ~ .,  # Formula specifying the target and predictors
  data = train_data,  # Training dataset
  num.trees = 100,  # Number of trees in the forest
  mtry = 2,  # Number of features to consider at each split
  classification = TRUE,  # Optional here: species is a factor, so a classification forest is grown anyway
  importance = 'impurity',  # Record impurity-based variable importance
  probability = TRUE  # To get probabilities for each class
)

# Make predictions on the test set
predictions <- predict(model, data = test_data)

# Extract predicted classes
predicted_classes <- apply(predictions$predictions, 1, which.max)  
predicted_classes <- colnames(predictions$predictions)[predicted_classes]

# Evaluate the model's accuracy
accuracy <- mean(predicted_classes == test_data$species)
print(paste("Accuracy:", accuracy))

Output:

[1] "Accuracy: 1"
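A single accuracy figure hides which classes get confused and which predictors matter. The sketch below, which uses the built-in iris data so that it runs without extra packages, shows one way to read the stored variable importance and build a confusion table from a probability forest. The 70/30 split and the object names are assumptions made for this illustration.

```r
library(ranger)

# 70/30 split of the built-in iris data
set.seed(123)
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

fit <- ranger(
  Species ~ .,
  data = train,
  num.trees = 100,
  importance = "impurity",  # store impurity-based variable importance
  probability = TRUE        # grow a probability forest
)

# Variable importance, largest first
print(sort(fit$variable.importance, decreasing = TRUE))

# One row per test observation, one column per class
prob <- predict(fit, data = test)$predictions

# Pick the most probable class for each row
pred <- factor(
  colnames(prob)[max.col(prob, ties.method = "first")],
  levels = levels(test$Species)
)

# Confusion table: rows are predictions, columns are true classes
print(table(predicted = pred, actual = test$Species))
```

Off-diagonal counts in the table show which species are mistaken for one another.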

Conclusion

The ranger function in R is a powerful tool for building Random Forest models efficiently and accurately. Its flexibility in specifying hyperparameters, seamless handling of categorical variables, and support for parallel computation make it ideal for various machine learning tasks. By leveraging the ranger package, data scientists and machine learning practitioners can develop robust and high-performance models for classification and regression problems. Experimenting with different parameters and options within the ranger function allows for fine-tuning models to achieve optimal performance on diverse datasets.
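As a starting point for that kind of experimentation, the following sketch compares a few mtry values by their out-of-bag error. The grid of 1 to 4, the 300 trees, and the use of iris are arbitrary choices for illustration; a real tuning run would typically also vary other parameters such as min.node.size.

```r
library(ranger)

# Candidate values for the number of variables tried at each split
mtry_grid <- 1:4

# Fit one forest per candidate and collect its out-of-bag error
oob_error <- sapply(mtry_grid, function(m) {
  fit <- ranger(Species ~ ., data = iris, num.trees = 300, mtry = m, seed = 123)
  fit$prediction.error  # out-of-bag misclassification rate
})

results <- data.frame(mtry = mtry_grid, oob_error = oob_error)
print(results)

# Candidate with the lowest out-of-bag error (first one in case of ties)
best_mtry <- results$mtry[which.min(results$oob_error)]
print(best_mtry)
```

Because the out-of-bag error is computed on observations each tree did not see, this comparison does not need a separate validation set.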