ML | Boston Housing Kaggle Challenge with Linear Regression

  • Difficulty Level : Medium
  • Last Updated : 19 Aug, 2021

Boston Housing Data: This dataset was taken from the StatLib library, which is maintained at Carnegie Mellon University. It concerns housing prices in the Boston area. The dataset has 506 instances with 13 features.
The Description of dataset is taken from 

Let’s build a Linear Regression model to predict housing prices.
Importing libraries and the dataset. 


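# Importing Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Importing Data
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import needs an older scikit-learn release.
from sklearn.datasets import load_boston
boston = load_boston()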

Shape of input Boston data and getting feature_names 
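
A minimal sketch of this step. Because `load_boston` is not available in recent scikit-learn releases, the `boston` object below is a hypothetical stand-in (a `SimpleNamespace` filled with zeros) that only mimics the `data` and `feature_names` attributes. With an older scikit-learn, the real `boston` object loaded earlier would be used instead and only the two `print` calls would be needed.

```python
from types import SimpleNamespace

import numpy as np

# Hypothetical stand-in for the object returned by load_boston():
# same shape and feature names, but zero-filled placeholder values.
boston = SimpleNamespace(
    data=np.zeros((506, 13)),
    feature_names=np.array(["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                            "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]),
)

# Shape of the input data: (number of instances, number of features)
print(boston.data.shape)
# Names of the 13 feature columns
print(boston.feature_names)
```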




Converting data from nd-array to dataframe and adding feature names to the data 


data = pd.DataFrame(boston.data)
data.columns = boston.feature_names

Adding ‘Price’ column to the dataset 


# Adding 'Price' (target) column to the data
data['Price'] = boston.target

Description of Boston dataset 
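
A description like this is usually obtained with pandas' `DataFrame.describe()`. The sketch below assumes a small made-up DataFrame (random values under three of the Boston column names) in place of the real `data`, purely to show the call and the shape of its result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Boston DataFrame: made-up values,
# only the column names are borrowed from the real dataset.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((5, 3)), columns=["CRIM", "RM", "Price"])

# Summary statistics per column: count, mean, std, min, quartiles, max
summary = data.describe()
print(summary)
```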



Info of Boston Dataset 
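
The column types and non-null counts can be listed with `DataFrame.info()`. This is a sketch on a zero-filled placeholder DataFrame with 506 rows, not the real data. The `isnull().sum()` line is an extra check (an addition for illustration) that counts missing values per column.

```python
import numpy as np
import pandas as pd

# Placeholder DataFrame with 506 rows; values are not real
data = pd.DataFrame(np.zeros((506, 2)), columns=["RM", "Price"])

# Prints the index range, column names, non-null counts, dtypes and memory usage
data.info()

# Number of missing values in each column
missing = data.isnull().sum()
print(missing)
```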


Getting the input and output data, then splitting it into training and testing datasets. 


# Input Data
x = boston.data
# Output Data
y = boston.target
# Splitting data into training and testing datasets.
# from sklearn.cross_validation import train_test_split
# The cross_validation submodule was deprecated and replaced by model_selection
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2,
                                                random_state=0)
print("xtrain shape : ", xtrain.shape)
print("xtest shape  : ", xtest.shape)
print("ytrain shape : ", ytrain.shape)
print("ytest shape  : ", ytest.shape)
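
With 506 rows and `test_size=0.2`, scikit-learn rounds the test count up: 0.2 × 506 = 101.2, so 102 rows go to the test set and the remaining 404 to the training set. The sketch below checks this on zero-filled placeholder arrays that only share the shape of the Boston data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the Boston data (values are not real)
x = np.zeros((506, 13))
y = np.zeros(506)

xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.2, random_state=0
)

print(xtrain.shape, xtest.shape)  # (404, 13) (102, 13)
print(ytrain.shape, ytest.shape)  # (404,) (102,)
```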

Applying Linear Regression Model to the dataset and predicting the prices. 


# Fitting a multiple linear regression model to the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(xtrain, ytrain)
# Predicting the test set results
y_pred = regressor.predict(xtest)
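
`LinearRegression` fits an ordinary least squares model. To make that concrete, here is a sketch of the same idea using only NumPy's `np.linalg.lstsq`. The data is synthetic, with made-up coefficients chosen for illustration; this is not scikit-learn's internal code and not the Boston data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration):
# y = 3*x0 - 2*x1 + 5 plus a little noise
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.01, size=200)

# Append a column of ones so the intercept is estimated with the weights
X1 = np.column_stack([X, np.ones(len(X))])

# Least squares solution of min ||X1 @ w - y||^2
w, *_ = np.linalg.lstsq(X1, y, rcond=None)
coef, intercept = w[:2], w[2]

# Prediction for a new row: a weighted sum of the features plus the intercept
X_new = np.array([[1.0, 1.0]])
y_new = X_new @ coef + intercept
print(coef, intercept, y_new)
```

The recovered `coef` and `intercept` should land close to the values used to generate the data, which is the same relationship `regressor.coef_` and `regressor.intercept_` have to the training set.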

Plotting Scatter graph to show the prediction results – ‘ytrue’ value vs ‘y_pred’ value 


# Plotting Scatter graph to show the prediction
# results - 'ytrue' value vs 'y_pred' value
plt.scatter(ytest, y_pred, c = 'green')
plt.xlabel("Price: in $1000's")
plt.ylabel("Predicted value")
plt.title("True value vs predicted value : Linear Regression")
plt.show()

Results of Linear Regression, i.e. Mean Squared Error. 


# Results of Linear Regression.
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(ytest, y_pred)
print("Mean Square Error : ", mse)
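
To show what `mean_squared_error` computes, the sketch below works the formula by hand on four made-up prices (not taken from the dataset). The MSE is in squared units of the target, so its square root (RMSE) is often easier to read because it is back in $1000's.

```python
import numpy as np

# Made-up true and predicted prices (in $1000's) for illustration
y_true = np.array([24.0, 21.5, 34.5, 33.0])
y_hat = np.array([25.0, 20.5, 30.5, 35.0])

errors = y_true - y_hat        # [-1, 1, 4, -2]
mse = np.mean(errors ** 2)     # (1 + 1 + 16 + 4) / 4 = 5.5
rmse = np.sqrt(mse)            # back in units of $1000's
print(mse, rmse)
```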

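As per the result, our model is described as only 66.55% accurate. Treat that figure with caution: the mean squared error is an average of squared prediction errors, measured in squared units of the target, and is not a percentage. A score such as the coefficient of determination (R², which `regressor.score(xtest, ytest)` returns) is a more conventional measure of fit. Either way, the prepared model is not very good for predicting housing prices. One can improve the prediction results using other machine learning algorithms and techniques. 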
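
Fit quality for a regression model is commonly summarized by the coefficient of determination, R², which compares the model's squared error with that of always predicting the mean. The sketch below computes it by hand on four made-up prices (an assumption for illustration, not results from this model).

```python
import numpy as np

# Made-up true and predicted prices (in $1000's) for illustration
y_true = np.array([24.0, 21.5, 34.5, 33.0])
y_hat = np.array([25.0, 20.5, 30.5, 35.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2 = 1.0 - ss_res / ss_tot                      # 1.0 would be a perfect fit
print(r2)
```

In scikit-learn, the same quantity is available as `sklearn.metrics.r2_score(ytest, y_pred)`.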
