Random Forest Classifier using Scikit-learn
  • Last Updated : 05 Sep, 2020

In this article, we build a Random Forest classifier with the Scikit-learn library for Python, using the well-known Iris dataset. Random forest (also called random decision forest) is a supervised machine learning algorithm, built from decision trees, that is used for classification, regression, and other tasks.
A Random Forest classifier trains a set of decision trees, each on a randomly selected subset of the training set, and then collects the votes of the individual trees to decide the final prediction.
We will use the Iris flower dataset to train and test a model that classifies the species of a flower.
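The voting idea described above can be sketched by hand with plain decision trees. This is only an illustration, not how the rest of the article builds its model: the tree count, the seeds, and the variable names are assumptions made for the sketch, and scikit-learn's own RandomForestClassifier combines trees by averaging their predicted class probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Train a small "forest" by hand: each tree sees its own bootstrap sample
# (rows drawn with replacement) and considers a random subset of features
# at every split (max_features="sqrt").
n_trees = 25  # hypothetical size for this sketch
trees = []
for i in range(n_trees):
    rows = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[rows], y[rows])
    trees.append(tree)

# Collect one vote per tree for every sample, then take the majority class.
votes = np.stack([tree.predict(X) for tree in trees])  # shape: (n_trees, n_samples)
majority = np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])

print("Votes for the first sample:", np.bincount(votes[:, 0], minlength=3))
print("Majority-vote prediction for the first sample:", majority[0])
```

Each column of `votes` holds the 25 individual tree predictions for one flower; the class with the most votes becomes the ensemble's answer.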

Code: Loading dataset




# importing required libraries 
# importing Scikit-learn library and datasets package
from sklearn import datasets  
  
# Loading the iris plants dataset (classification)
iris = datasets.load_iris()    

Code: Checking the class names and feature names present in the dataset.




print(iris.target_names)

Output:

['setosa' 'versicolor' 'virginica']

Code:






print(iris.feature_names)

Output:

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Code:




# dividing the dataset into two parts, i.e. a training set and a test set
X, y = datasets.load_iris(return_X_y = True)
  
# Splitting arrays or matrices into random train and test subsets
from sklearn.model_selection import train_test_split
# test_size = 0.70 holds out 70 % of the rows for testing and keeps 30 % for training
# (use test_size = 0.30 for the more common 70 % training / 30 % test split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.70)
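Because the split is random, every run of the code above produces different subsets. The sketch below shows one way to make the split repeatable and to check its size; the `random_state` value and the use of `stratify` are assumptions added for illustration and are not part of the original code.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# random_state fixes the shuffle so the same rows land in each subset on every run;
# stratify=y keeps the three species in the same proportion in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.70, random_state=42, stratify=y
)

print("Training rows:", X_train.shape[0])
print("Test rows:", X_test.shape[0])
```

With 150 flowers and `test_size=0.70`, 105 rows go to the test set and 45 remain for training.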

Code: Importing required libraries and random forest classifier module.




# importing random forest classifier from ensemble module
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
# creating dataframe of IRIS dataset
data = pd.DataFrame({'sepallength': iris.data[:, 0], 'sepalwidth': iris.data[:, 1],
                     'petallength': iris.data[:, 2], 'petalwidth': iris.data[:, 3],
                     'species': iris.target})

Code: Looking at a dataset




# printing the first 5 rows of the iris dataframe
print(data.head())

Output:

   sepallength  sepalwidth  petallength  petalwidth  species
0          5.1         3.5          1.4         0.2        0
1          4.9         3.0          1.4         0.2        0
2          4.7         3.2          1.3         0.2        0
3          4.6         3.1          1.5         0.2        0
4          5.0         3.6          1.4         0.2        0

Code:




# creating a RF classifier
clf = RandomForestClassifier(n_estimators = 100)  
  
# Training the model on the training dataset
# fit function is used to train the model using the training sets as parameters
clf.fit(X_train, y_train)
  
# performing predictions on the test dataset
y_pred = clf.predict(X_test)
  
# metrics are used to find accuracy or error
from sklearn import metrics  
print()
  
# using metrics module for accuracy calculation
print("ACCURACY OF THE MODEL: ", metrics.accuracy_score(y_test, y_pred))

Output:

ACCURACY OF THE MODEL: 0.9238095238095239
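A single accuracy number does not show which species get confused with each other, and it changes from run to run because neither the split nor the forest is seeded. As a rough sketch, a confusion matrix and a per-class report give a more detailed picture; the `random_state` values and `stratify` argument here are assumptions added so the example is repeatable, and its numbers will not match the output above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.70, random_state=0, stratify=iris.target
)

# Seeding the forest as well makes the whole experiment repeatable.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are the true species, columns are the predicted species.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall, and F1 score for each species.
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

Off-diagonal entries in the matrix count the misclassified flowers; on Iris these typically fall between versicolor and virginica.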

 



Code: predicting the type of flower from the data set




# predicting which type of flower it is.
clf.predict([[3, 3, 2, 2]])

Output:

array([0])
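The returned integer is an index into `iris.target_names`. A small sketch of turning it into a species name, and of inspecting how confident the forest is, might look like the following; training on the full dataset and the `random_state` value are assumptions made to keep the example short, so the predicted class may differ from the output above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

# Measurements in the order: sepal length, sepal width, petal length, petal width (cm)
sample = [[3, 3, 2, 2]]

label = clf.predict(sample)[0]        # integer class index
proba = clf.predict_proba(sample)[0]  # one probability per class

print("Predicted species:", iris.target_names[label])
for name, p in zip(iris.target_names, proba):
    print(f"{name}: {p:.2f}")
```

`predict_proba` returns the class probabilities averaged over all trees, and `predict` picks the class with the highest one.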

A prediction of 0 corresponds to setosa, since the three species (classes) in the data set are encoded in order as setosa (0), versicolor (1), and virginica (2). Next, we find out which features of the Iris dataset matter most to the model, using the following lines of code.

Code:




# importing random forest classifier from ensemble module
from sklearn.ensemble import RandomForestClassifier
# Create a Random forest Classifier
clf = RandomForestClassifier(n_estimators = 100)
  
# Train the model using the training sets
clf.fit(X_train, y_train)

Code: Calculating feature importance




# using the feature importance variable
import pandas as pd
feature_imp = pd.Series(clf.feature_importances_, index = iris.feature_names).sort_values(ascending = False)
feature_imp

Output:

petal width (cm)     0.458607
petal length (cm)    0.413859
sepal length (cm)    0.103600
sepal width (cm)     0.023933
dtype: float64
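The `feature_importances_` values above are impurity-based: they are computed from the training data and sum to 1. As a cross-check, permutation importance measures how much the test accuracy drops when one feature's values are shuffled. This is a different technique from the one used in the article, shown here only as a sketch; the seeds, `stratify`, and `n_repeats=10` are assumptions.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.70, random_state=0, stratify=iris.target
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each feature 10 times on the held-out data and record the accuracy drop.
result = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=0)

perm_imp = pd.Series(result.importances_mean, index=iris.feature_names)
perm_imp = perm_imp.sort_values(ascending=False)
print(perm_imp)
```

If both rankings put the petal measurements first, that agreement is reasonable evidence that the sepal features contribute little to this model.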
