ML | Boston Housing Kaggle Challenge with Linear Regression

• Difficulty Level : Medium
• Last Updated : 04 Oct, 2021

Boston Housing Data: This dataset was taken from the StatLib library, which is maintained at Carnegie Mellon University. It concerns housing prices in the city of Boston. The dataset has 506 instances with 13 features.
The description of the dataset ships with the data and is available as boston.DESCR once the dataset is loaded.
Let's build a linear regression model that predicts housing prices.
Importing libraries and the dataset.

Python3

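# Importing Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Importing Data
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import requires scikit-learn < 1.2.
from sklearn.datasets import load_boston
boston = load_boston()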
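On scikit-learn 1.2 or newer, where load_boston is no longer available, one possible workaround is to parse the raw StatLib file yourself. The sketch below covers only the reshaping step and rests on an assumed file layout: a 22-line header (skipped before calling the helper), after which each record is split across two physical lines of 11 and 3 whitespace-separated values, giving 13 features followed by the target. The helper name merge_boston_rows and the toy rows are hypothetical; downloading the file is left out.

```python
def merge_boston_rows(lines):
    """Join each pair of physical lines into one record.

    Assumes the header has already been skipped and that every record
    spans two lines holding 11 and 3 whitespace-separated numbers.
    Returns (features, target): 13 features per record plus one target.
    """
    tokens = [line.split() for line in lines if line.strip()]
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of non-empty lines")

    features, target = [], []
    for first, second in zip(tokens[0::2], tokens[1::2]):
        values = [float(v) for v in first + second]
        if len(values) != 14:
            raise ValueError("expected 14 values per record")
        features.append(values[:13])   # the 13 input features
        target.append(values[13])      # the price (target) column
    return features, target


# Toy input in the assumed layout (made-up numbers, not real Boston rows)
toy_lines = [
    "1 2 3 4 5 6 7 8 9 10 11",
    "12 13 14",
    "21 22 23 24 25 26 27 28 29 30 31",
    "32 33 34",
]
toy_features, toy_target = merge_boston_rows(toy_lines)
print(len(toy_features), len(toy_features[0]), toy_target)
```

With the real file, the two returned lists could be wrapped in np.array and used in place of boston.data and boston.target in the steps that follow.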

Getting the shape of the input Boston data and its feature_names

Python3

boston.data.shape

Python3

boston.feature_names

Converting data from nd-array to data frame and adding feature names to the data

Python3

data = pd.DataFrame(boston.data)
data.columns = boston.feature_names
data.head(10)

Adding ‘Price’ column to the dataset

Python3

# Adding 'Price' (target) column to the data
boston.target.shape

Python3

data['Price'] = boston.target
data.head()

Description of Boston dataset

Python3

data.describe()

Info of Boston Dataset

Python3

data.info()

Getting the input and output data and splitting it into training and testing datasets.

Python3

# Input Data
x = boston.data

# Output Data
y = boston.target

# Splitting data into training and testing datasets.
# from sklearn.cross_validation import train_test_split
# The cross_validation submodule was deprecated and replaced by model_selection.
from sklearn.model_selection import train_test_split

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2,
                                                random_state=0)

print("xtrain shape : ", xtrain.shape)
print("xtest shape  : ", xtest.shape)
print("ytrain shape : ", ytrain.shape)
print("ytest shape  : ", ytest.shape)

Applying Linear Regression Model to the dataset and predicting the prices.

Python3

# Fitting a multiple linear regression model to the training data
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(xtrain, ytrain)

# Predicting the test set results
y_pred = regressor.predict(xtest)
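Conceptually, fitting a linear regression means choosing an intercept and one weight per feature so that the sum of squared residuals on the training data is as small as possible. The following is a minimal NumPy sketch of that least-squares fit on toy data. It is not scikit-learn's actual implementation, and the helper names fit_ols and predict_ols as well as the toy data are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares: return (intercept, weights)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient acts as the intercept
    X_design = np.column_stack([np.ones(len(X)), X])
    # Solve min ||X_design @ coef - y||^2 in the least-squares sense
    coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return coef[0], coef[1:]


def predict_ols(X, intercept, weights):
    """Apply the fitted linear model to new inputs."""
    return intercept + np.asarray(X, dtype=float) @ weights


# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise)
X_toy = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]])
y_toy = 1 + 2 * X_toy[:, 0] + 3 * X_toy[:, 1]

intercept, weights = fit_ols(X_toy, y_toy)
toy_pred = predict_ols(X_toy, intercept, weights)
print(intercept, weights)
```

Because the toy targets are noise-free, the recovered intercept and weights match the generating values up to floating-point error; on real data such as xtrain and ytrain the fit only approximates the targets.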

Plotting Scatter graph to show the prediction results – ‘ytrue’ value vs ‘y_pred’ value

Python3

# Plotting Scatter graph to show the prediction
# results - 'ytrue' value vs 'y_pred' value
plt.scatter(ytest, y_pred, c='green')
plt.xlabel("Price: in $1000's")
plt.ylabel("Predicted value")
plt.title("True value vs predicted value : Linear Regression")
plt.show()

Results of Linear Regression i.e. Mean Squared Error.

Python3

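# Results of Linear Regression.
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(ytest, y_pred)
print("Mean Square Error : ", mse)

Judging by this result, the model is not very good at predicting housing prices. Note that the mean squared error is expressed in squared target units (here, squared thousands of dollars) and is not a percentage. A figure such as 66.55% accuracy, which appears to come from subtracting the MSE from 100, is therefore not a meaningful measure; a unitless alternative is the R² score returned by regressor.score(xtest, ytest). One can improve the prediction results using many other possible machine learning algorithms and techniques.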
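To make the error easier to interpret, it can help to report the root mean squared error (in the same units as the price) and the coefficient of determination R² alongside the MSE. Below is a minimal pure-Python sketch of those formulas on toy values. The function name regression_metrics and the toy numbers are hypothetical; in practice sklearn.metrics.mean_squared_error and sklearn.metrics.r2_score compute the same quantities.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and R^2 for two equal-length sequences."""
    n = len(y_true)
    # Residual sum of squares: squared gaps between truth and prediction
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared gaps between truth and its mean
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),       # same units as the target
        "r2": 1.0 - ss_res / ss_tot,  # 1.0 means a perfect fit
    }


# Toy values for illustration only (not Boston results)
y_true_toy = [3.0, 5.0, 7.0, 9.0]
y_pred_toy = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true_toy, y_pred_toy)
print(metrics)
```

Passing list(ytest) and list(y_pred) to the same function would give the corresponding values for the Boston model.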
