Random Forest Regression in Python

A Random Forest is an ensemble technique that can perform both regression and classification tasks by combining multiple decision trees with a technique called Bootstrap Aggregation, commonly known as bagging. The basic idea is to determine the final output from many decision trees together rather than relying on any single tree; for regression, the trees' individual predictions are averaged.
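To make the "bootstrap" part of bagging concrete, here is a minimal NumPy sketch that draws one bootstrap sample: row indices chosen with replacement, so some rows repeat and others are left "out of bag". The 10-row size and the seed are arbitrary choices for illustration, not part of the original article.

```python
import numpy as np

# A bootstrap sample draws n row indices *with replacement* from n rows,
# so some rows appear more than once and others are not drawn at all.
rng = np.random.default_rng(seed=0)  # fixed seed so the draw is reproducible
n_rows = 10                          # arbitrary illustrative dataset size

bootstrap_idx = rng.integers(0, n_rows, size=n_rows)  # same size as the data
in_bag = np.unique(bootstrap_idx)                     # distinct rows that were drawn
out_of_bag = np.setdiff1d(np.arange(n_rows), in_bag)  # rows never drawn

print("bootstrap indices:", bootstrap_idx)
print("distinct rows used:", len(in_bag))
print("out-of-bag rows:", out_of_bag)
```

Each tree in a forest is trained on its own sample like this one. For large datasets a bootstrap sample contains on average about 63.2% (1 - 1/e) of the distinct rows; the rest are the out-of-bag rows.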
Approach :

  • Pick K data points at random, with replacement, from the training set (a bootstrap sample).
  • Build a decision tree on those K data points.
  • Choose the number Ntree of trees you want to build, and repeat steps 1 and 2 that many times.
  • For a new data point, have each of the Ntree trees predict the value of Y, and assign the new data point the average of all the predicted Y values.
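The four steps above can be sketched by hand with scikit-learn's DecisionTreeRegressor. This is a minimal illustration of plain bagging under stated assumptions: K equals the training-set size, the helper names `fit_bagged_trees` and `predict_bagged` are hypothetical, and the toy data is synthetic. scikit-learn's RandomForestRegressor can additionally restrict the features considered at each split (its `max_features` parameter), which this sketch does not do.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_bagged_trees(X, y, n_trees=10, seed=0):
    """Steps 1-3 (hypothetical helper): fit n_trees trees, each on a bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for i in range(n_trees):
        idx = rng.integers(0, n, size=n)     # step 1: K = n points, with replacement
        tree = DecisionTreeRegressor(random_state=i)
        tree.fit(X[idx], y[idx])             # step 2: tree built on those points
        trees.append(tree)                   # step 3: repeat Ntree times
    return trees


def predict_bagged(trees, X_new):
    """Step 4 (hypothetical helper): average the trees' predictions."""
    return np.mean([tree.predict(X_new) for tree in trees], axis=0)


# Synthetic toy data: one feature, noisy quadratic target
rng = np.random.default_rng(42)
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 2, size=len(X))

trees = fit_bagged_trees(X, y, n_trees=50)
print(predict_bagged(trees, np.array([[6.5]])))
```

Because every tree predicts a mean of training targets, the averaged prediction always stays within the range of the observed Y values.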

Below is the step-by-step Python implementation.
Step 1 : Import the required libraries.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Step 2 : Import and print the dataset

data = pd.read_csv('Salaries.csv')
print(data)
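
Salaries.csv is not included with the article. If you want to follow along without it, the sketch below builds a synthetic stand-in DataFrame. It is NOT the original dataset: the column names and values are hypothetical, and it only assumes the layout that Step 3's column indices imply (a label column, then a numeric position level, then a salary).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Salaries.csv (NOT the original data).
# Column order matches what Step 3 assumes: label, level, salary.
levels = np.arange(1, 11)
data = pd.DataFrame({
    "Position": [f"Role {lvl}" for lvl in levels],  # hypothetical labels
    "Level": levels,                                # numeric position level
    "Salary": 30000 * 1.4 ** (levels - 1),          # synthetic, smoothly increasing
})

print(data.head())
print(data.shape)  # (rows, columns)
```

With this frame in place of the CSV, the remaining steps run unchanged, though the fitted curve will of course differ from the article's.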

Step 3 : Select all rows of column 1 of the dataset as x (the feature) and all rows of column 2 as y (the target)

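# 1:2 keeps x two-dimensional (n_samples x 1), as scikit-learn expects
x = data.iloc[:, 1:2].values
print(x)
# y is the 1-D target vector (salary)
y = data.iloc[:, 2].values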
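
The slice 1:2 rather than the single index 1 matters because scikit-learn estimators expect a 2-D feature matrix. This small sketch, on a hypothetical three-row frame with the same column layout, shows the shapes each form produces.

```python
import pandas as pd

# Small hypothetical frame with the same column layout as the dataset
df = pd.DataFrame({"Position": ["a", "b", "c"],
                   "Level": [1, 2, 3],
                   "Salary": [100, 200, 300]})

x_2d = df.iloc[:, 1:2].values  # slice keeps a column axis -> shape (3, 1)
x_1d = df.iloc[:, 1].values    # single index drops it    -> shape (3,)
y = df.iloc[:, 2].values       # the target can stay 1-D  -> shape (3,)

print(x_2d.shape, x_1d.shape, y.shape)  # (3, 1) (3,) (3,)

# If you only have the 1-D version, reshape it into one feature column:
print(x_1d.reshape(-1, 1).shape)        # (3, 1)
```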

Step 4 : Fit the Random Forest regressor to the dataset

# Fitting Random Forest Regression to the dataset
# import the regressor
from sklearn.ensemble import RandomForestRegressor

# create regressor object
regressor = RandomForestRegressor(n_estimators=100, random_state=0)

# fit the regressor with x and y data
regressor.fit(x, y)
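
After fitting, the forest exposes its individual trees and, optionally, an out-of-bag estimate of its accuracy. The sketch below uses synthetic data in the same shape as Step 3 (an assumption, since Salaries.csv is not available here) and adds `oob_score=True`, which the original step does not use; for a regressor the out-of-bag score is an R^2 value computed on the rows each tree did not see.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic toy data in the same shape as Step 3: x is (n, 1), y is (n,)
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 1000 * x.ravel() ** 2

# oob_score=True evaluates each tree on the rows left out of its bootstrap sample
regressor = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
regressor.fit(x, y)

print(len(regressor.estimators_))  # the 100 fitted decision trees
print(regressor.oob_score_)        # out-of-bag R^2 estimate
```

With only ten rows the out-of-bag score is a rough indicator at best; it becomes more meaningful on larger datasets.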

Step 5 : Predicting a new result

# predict expects a 2-D input: one row per sample, one column per feature
y_pred = regressor.predict([[6.5]])  # test the output by changing values
print(y_pred)
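
A bare scalar such as 6.5 is rejected by current scikit-learn (it raises a ValueError asking for a 2-D array), so several levels can be queried at once by stacking them as rows. The sketch below, on synthetic stand-in data, also checks that the forest's output equals the mean of its trees' outputs, which is the averaging step from the approach described at the top.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic toy data, same layout as Step 3
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 1000 * x.ravel() ** 2
regressor = RandomForestRegressor(n_estimators=100, random_state=0).fit(x, y)

# One row per query point, one column per feature
levels = np.array([[6.5], [7.5]])
y_pred = regressor.predict(levels)

# The forest's regression output is the mean of its trees' outputs
per_tree = np.stack([tree.predict(levels) for tree in regressor.estimators_])
print(y_pred)
print(np.allclose(y_pred, per_tree.mean(axis=0)))  # True
```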

Step 6 : Visualising the result

# Visualising the Random Forest Regression results

# arange for creating a range of values
# from the min value of x to the max
# value of x with a difference of 0.01
# between two consecutive values
X_grid = np.arange(x.min(), x.max(), 0.01)

# reshape the data into a len(X_grid) x 1 array,
# i.e. make a single column out of the X_grid values
X_grid = X_grid.reshape((len(X_grid), 1))

# Scatter plot for original data
plt.scatter(x, y, color='blue')

# plot predicted data
plt.plot(X_grid, regressor.predict(X_grid),
         color='green')
plt.title('Random Forest Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
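
plt.show() needs an interactive display. When running as a script or on a headless server, a common alternative is matplotlib's non-interactive Agg backend plus savefig. The sketch below assumes synthetic stand-in data and a hypothetical output file name; it draws the same plot with the object-oriented matplotlib API. The dense 0.01 grid is what makes the forest's piecewise-constant (step-shaped) prediction visible.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic toy data, same layout as Step 3
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 1000 * x.ravel() ** 2
regressor = RandomForestRegressor(n_estimators=100, random_state=0).fit(x, y)

# Dense grid so the step-shaped forest output shows up clearly
X_grid = np.arange(x.min(), x.max(), 0.01).reshape(-1, 1)

fig, ax = plt.subplots()
ax.scatter(x, y, color="blue", label="data")
ax.plot(X_grid, regressor.predict(X_grid), color="green", label="forest")
ax.set_title("Random Forest Regression")
ax.set_xlabel("Position level")
ax.set_ylabel("Salary")
ax.legend()
fig.savefig("random_forest_regression.png", dpi=100)  # hypothetical file name
plt.close(fig)  # free the figure's memory
```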
