Variable importance plot using random forest package in R

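A variable importance plot shows how much each predictor contributes to a random forest model. The randomForest package reports two measures. Mean Decrease Accuracy is the drop in out-of-bag accuracy when the values of a variable are randomly permuted, so a large value means the model relies on that variable to classify the data correctly. Mean Decrease Gini is the total reduction in node impurity (measured by the Gini index, that is, how homogeneous the nodes become) from all splits on that variable, averaged over the trees. The two measures often rank variables in a similar order, but they are not directly proportional to each other and can disagree.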
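To make the Gini measure concrete, here is a minimal base-R sketch of the impurity calculation for a single split. The helper `gini_impurity()` and the threshold 2.45 on Petal.Length are illustrative assumptions, not part of the randomForest package; the package accumulates this kind of decrease over every split in every tree.

```r
# Gini impurity of a node: 1 - sum of squared class proportions.
# gini_impurity() is a hypothetical helper written for this illustration.
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

data(iris)

# Impurity of the root node: three equally sized classes.
parent_impurity <- gini_impurity(iris$Species)

# Split on an assumed threshold that isolates setosa.
is_left <- iris$Petal.Length < 2.45
left  <- iris$Species[is_left]
right <- iris$Species[!is_left]

# Impurity of the children, weighted by their share of the rows.
child_impurity <- (length(left) * gini_impurity(left) +
                   length(right) * gini_impurity(right)) / nrow(iris)

# The decrease in impurity achieved by this one split.
gini_decrease <- parent_impurity - child_impurity

print(round(parent_impurity, 4))  # 0.6667
print(round(gini_decrease, 4))    # 0.3333
```

A variable that produces many splits with large decreases like this one ends up with a high Mean Decrease Gini.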

Step 1: Install and load the required package.

R




install.packages("randomForest")
library(randomForest)

Step 2: Load the iris dataset.

R




data(iris)
head(iris, 5)  # show only the first five rows

Output:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa

Step 3: Fit a classification model with the randomForest() function, which grows many decision trees and predicts the class by majority vote over them.

Syntax: randomForest(formula, data, subset, ntree, keep.forest, importance, proximity)

Where,

  • formula – model formula naming the response and the predictors, for example Species ~ .
  • data – data frame containing the variables in the model
  • subset – index vector indicating which rows of the data frame should be used
  • ntree – number of trees to grow
  • keep.forest – set to TRUE if the forest needs to be retained in the output object
  • importance – set to TRUE if the importance of the predictors should be assessed, else FALSE
  • proximity – set to TRUE if the proximity measure among the rows should be calculated

R




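set.seed(42)  # make the importance scores reproducible
# Model Species as a function of all other columns
rand_frst <- randomForest(Species ~ ., data=iris, ntree=100, keep.forest=FALSE,
                          importance=TRUE)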
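Before plotting, it can help to look at the numbers behind the plot. The sketch below refits a classification forest on iris and reads the scores with the package's importance() function. The object name `rf_fit` and the seed value are assumptions made for this example, and the exact scores will vary with the seed and the package version.

```r
library(randomForest)

data(iris)
set.seed(42)  # assumed seed, only for reproducibility

# Classification forest: Species is the response, all other columns are predictors.
rf_fit <- randomForest(Species ~ ., data = iris, ntree = 100,
                       keep.forest = FALSE, importance = TRUE)

# Full importance matrix: one row per predictor, with per-class columns
# plus MeanDecreaseAccuracy and MeanDecreaseGini.
imp <- importance(rf_fit)
print(imp)

# type = 1 selects Mean Decrease Accuracy, type = 2 selects Mean Decrease Gini.
acc_imp  <- importance(rf_fit, type = 1)
gini_imp <- importance(rf_fit, type = 2)
print(acc_imp)
print(gini_imp)
```

These are the values that varImpPlot() draws, so the matrix is a convenient place to check a ranking that looks surprising in the plot.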

Step 4: Finally, plot the importance of each variable with the varImpPlot() function.

Syntax: varImpPlot(x,sort,main)

Where,

  • x – an object of class randomForest
  • sort – whether to sort the variables in decreasing order of importance (TRUE by default)
  • main – title of the plot

R




randomForest::varImpPlot(rand_frst,
                         sort=FALSE,
                         main="Variable Importance Plot")
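As a variation, the sketch below plots a single measure with the `type` argument of varImpPlot() and then draws the Gini scores as a plain bar chart from the importance() values. The object name `rf_fit`, the seed, and the plot titles are assumptions made for this example.

```r
library(randomForest)

data(iris)
set.seed(42)  # assumed seed, only for reproducibility

rf_fit <- randomForest(Species ~ ., data = iris, ntree = 100,
                       importance = TRUE)

# Show only Mean Decrease Accuracy (type = 1), most important variable first.
varImpPlot(rf_fit, type = 1, sort = TRUE,
           main = "Mean Decrease Accuracy (iris)")

# Alternative view: a base-R bar chart of Mean Decrease Gini (type = 2).
gini_imp    <- importance(rf_fit, type = 2)[, 1]  # named numeric vector
gini_sorted <- sort(gini_imp, decreasing = TRUE)
barplot(gini_sorted, las = 2, ylab = "Mean Decrease Gini",
        main = "Mean Decrease Gini (iris)")
```

Plotting one measure at a time can make the ranking easier to read when the two panels of the default plot disagree.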

Output:

Figure: Variable importance plot for the iris dataset


Last Updated : 21 Nov, 2022