Predict Fuel Efficiency Using TensorFlow in Python
Last Updated: 21 Mar, 2024
In this article, we will learn how to build a fuel efficiency prediction model using the TensorFlow API. The dataset contains features such as the number of cylinders, the engine displacement, the horsepower, and the weight of each car, along with the target value, miles per gallon (mpg).
Importing Libraries
- Pandas – This library loads the data into a 2D tabular structure (the data frame) and provides many functions to perform analysis tasks in one go.
- Numpy – Numpy arrays are very fast and can perform large computations in a very short time.
- Matplotlib – This library is used to draw visualizations.
- Sklearn – This library contains multiple modules with pre-implemented functions for tasks ranging from data preprocessing to model development and evaluation.
- Seaborn – This library builds on Matplotlib and is used here to draw the correlation heatmap.
- Tensorflow – This is an open-source library for machine learning and artificial intelligence that provides a range of functions to achieve complex functionality with single lines of code.
Python3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

import tensorflow as tf
from tensorflow import keras
# Import layers from the Keras bundled with TensorFlow
from tensorflow.keras import layers

import warnings
warnings.filterwarnings('ignore')
Python3
df = pd.read_csv('auto-mpg.csv')
df.head()
Output:
Let’s check the shape of the data.
Python3
df.shape
Output:
(398, 9)
Now, check the datatypes of the columns.
Python3
df.info()
Output:
Here we can observe one discrepancy: the horsepower column has the object datatype, whereas it should be numeric.
Exploratory Data Analysis
Based on the df.info() output, we will first deal with the horsepower column and then move on to the analysis.
Python3
df['horsepower'].unique()
Output:
Here we can observe that missing values are recorded as the string ‘?’ instead of as nulls. Because of this, pandas reads the whole column with the object datatype.
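Python3
print(df.shape)
# Keep only the rows with a real horsepower value; .copy() avoids
# modifying a view of the original data frame in later steps
df = df[df['horsepower'] != '?'].copy()
print(df.shape)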
Output:
(398, 9)
(392, 9)
So, there were 6 such rows with a question mark.
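As an alternative sketch of the same cleanup, `pd.to_numeric` with `errors='coerce'` turns every non-numeric entry into NaN, which can then be counted and dropped. The small data frame below is made-up data for illustration, not rows from auto-mpg.csv.

```python
import pandas as pd

# Hypothetical miniature of the problem: a numeric column read as strings,
# with '?' standing in for missing values.
toy = pd.DataFrame({
    'mpg': [18.0, 25.0, 21.0, 30.0],
    'horsepower': ['130', '?', '95', '?'],
})

# Non-numeric entries become NaN instead of raising an error.
toy['horsepower'] = pd.to_numeric(toy['horsepower'], errors='coerce')
n_missing = int(toy['horsepower'].isnull().sum())

# Drop the affected rows and store the column as integers.
clean = toy.dropna(subset=['horsepower']).copy()
clean['horsepower'] = clean['horsepower'].astype(int)

print(n_missing)
print(clean.shape)
```

This variant does not need to know in advance which placeholder string was used, which may be convenient when a column mixes several of them.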
Python3
df['horsepower'] = df['horsepower'].astype(int)
df.isnull().sum()
Output:
mpg 0
cylinders 0
displacement 0
horsepower 0
weight 0
acceleration 0
model year 0
origin 0
car name 0
dtype: int64
Next, count the number of unique values in each column.
Python3
df.nunique()
Output:
mpg 127
cylinders 5
displacement 81
horsepower 93
weight 346
acceleration 95
model year 13
origin 3
car name 301
dtype: int64
Python3
plt.subplots(figsize=(15, 5))
for i, col in enumerate(['cylinders', 'origin']):
    plt.subplot(1, 2, i + 1)
    # Select 'mpg' before averaging so the text column
    # 'car name' is not passed to mean()
    x = df.groupby(col)['mpg'].mean()
    x.plot.bar()
    plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
Output:
Here we can observe that the average mpg is highest for origin 3.
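The group-wise averages behind a bar chart like this can be checked numerically. The sketch below uses made-up values, not the real dataset, and selects the `mpg` column before aggregating so that text columns are never averaged.

```python
import pandas as pd

# Made-up values for illustration only; not taken from auto-mpg.csv.
toy = pd.DataFrame({
    'origin': [1, 1, 2, 2, 3, 3],
    'mpg': [18.0, 22.0, 26.0, 28.0, 30.0, 34.0],
    'car name': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# Average mpg per origin; 'car name' is left out of the aggregation.
mean_mpg = toy.groupby('origin')['mpg'].mean()

# Label of the group with the highest average.
best_origin = mean_mpg.idxmax()

print(mean_mpg)
print(best_origin)
```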
Python3
plt.figure(figsize=(8, 8))
# numeric_only=True skips the text column 'car name'
sb.heatmap(df.corr(numeric_only=True) > 0.9,
           annot=True,
           cbar=False)
plt.show()
Output:
If we remove the displacement feature, the problem of high collinearity (feature pairs with a correlation above 0.9) goes away.
Python3
df.drop('displacement',
        axis=1,
        inplace=True)
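The collinearity check that motivates this step can also be done without a plot, by listing the feature pairs whose correlation exceeds the threshold. The sketch below runs on synthetic columns (`a`, `b`, `c` are hypothetical names) and, unlike the heatmap, uses the absolute correlation so that strong negative relationships are flagged as well.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration: 'b' is nearly a scaled copy of 'a',
# while 'c' is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    'a': a,
    'b': 2.0 * a + rng.normal(scale=0.05, size=200),
    'c': rng.normal(size=200),
})

corr = toy.corr().abs()

# Keep only the upper triangle so each pair is reported once
# and the diagonal (always 1.0) is skipped.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack()
high_pairs = pairs[pairs > 0.9].index.tolist()

print(high_pairs)
```

Dropping one column from each reported pair is one simple way to act on the result; which column to drop remains a judgment call.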
Data Input Pipeline
Python3
from sklearn.model_selection import train_test_split

features = df.drop(['mpg', 'car name'], axis=1)
target = df['mpg'].values

X_train, X_val, Y_train, Y_val = train_test_split(
    features, target, test_size=0.2, random_state=22
)
X_train.shape, X_val.shape
Output:
((313, 6), (79, 6))
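The row counts above follow from the split fraction: 20% of 392 rows is 78.4, which is rounded up to 79 validation rows, leaving 313 for training. The sketch below reproduces that arithmetic with a hand-rolled shuffle-and-cut split. It is an illustration of the idea, not scikit-learn's implementation, and the NumPy seed will not reproduce the same rows as `random_state = 22`.

```python
import math
import numpy as np

n_samples = 392      # rows left after removing the '?' entries
test_size = 0.2

# A fractional test size is rounded up to a whole number of rows.
n_val = math.ceil(test_size * n_samples)
n_train = n_samples - n_val

# Shuffle the row indices with a fixed seed, then cut them into two groups.
rng = np.random.default_rng(22)
indices = rng.permutation(n_samples)
val_idx, train_idx = indices[:n_val], indices[n_val:]

print(n_train, n_val)
```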
Python3
# Let tf.data choose the prefetch buffer size automatically
AUTO = tf.data.AUTOTUNE

train_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_train, Y_train))
    .batch(32)
    .prefetch(AUTO)
)

val_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_val, Y_val))
    .batch(32)
    .prefetch(AUTO)
)
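To see what a batch size of 32 means for these dataset sizes, the plain-Python sketch below computes the size of each batch when the final, smaller batch is kept. The helper `batch_sizes` is a hypothetical name used only for this illustration. The 10 training batches it reports match the `10/10` step counter in the training log.

```python
def batch_sizes(n_rows, batch_size):
    """Return the size of each batch when n_rows are grouped in order,
    keeping the last partial batch."""
    sizes = []
    for start in range(0, n_rows, batch_size):
        sizes.append(min(batch_size, n_rows - start))
    return sizes

train_batches = batch_sizes(313, 32)
val_batches = batch_sizes(79, 32)

print(len(train_batches), train_batches[-1])
print(len(val_batches), val_batches[-1])
```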
Model Architecture
We will implement a model using the Sequential API of Keras, which will contain the following parts:
- Two fully connected (Dense) layers with 256 units each.
- BatchNormalization layers to enable stable and fast training, and a Dropout layer before the final layer to reduce the risk of overfitting.
- A final output layer with a single unit that predicts the mpg value.
Python3
model = keras.Sequential([
    # First hidden layer; the input has 6 features
    layers.Dense(256, activation='relu', input_shape=[6]),
    layers.BatchNormalization(),
    # Second hidden layer
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),
    layers.BatchNormalization(),
    # Output layer: one value, the predicted mpg
    layers.Dense(1, activation='relu')
])
While compiling a model, we provide these three essential parameters:
- optimizer – The method that minimizes the cost function, typically a variant of gradient descent.
- loss – The function the optimizer minimizes; it also lets us monitor whether the model is improving with training.
- metrics – Additional measures computed on the training and validation data to evaluate the model; they are reported but not optimized directly.
Python3
model.compile(
    loss='mae',
    optimizer='adam',
    metrics=['mape']
)
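For reference, `'mae'` is the mean absolute error and `'mape'` is the mean absolute percentage error. The NumPy sketch below shows the basic formulas on made-up numbers. It is a simplified version: it assumes no true value is zero, whereas library implementations usually guard the division.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target (here mpg)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.
    Assumes y_true contains no zeros."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

# Made-up targets and predictions for illustration.
y_true = [20.0, 25.0, 40.0]
y_pred = [22.0, 20.0, 40.0]

mae_value = mae(y_true, y_pred)     # (2 + 5 + 0) / 3
mape_value = mape(y_true, y_pred)   # (10% + 20% + 0%) / 3

print(mae_value, mape_value)
```

Read this way, a `loss` of about 2.7 in the training log means predictions are off by roughly 2.7 mpg on average, and a `mape` near 12 means roughly 12% relative error.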
Let’s print the summary of the model’s architecture:
Python3
model.summary()
Output:
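As a cross-check on the parameter counts that a Keras summary lists, the totals for this architecture can be worked out by hand. The sketch below assumes the standard counts: a Dense layer has one weight per input-output pair plus one bias per output, and a BatchNormalization layer has four values per feature (two trainable, two non-trainable moving statistics). The helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    # One weight per input-output pair, plus one bias per output unit.
    return n_in * n_out + n_out

def batchnorm_params(n_features):
    # Trainable scale and shift, non-trainable moving mean and variance.
    trainable = 2 * n_features
    non_trainable = 2 * n_features
    return trainable, non_trainable

dense_total = (
    dense_params(6, 256)      # first hidden layer
    + dense_params(256, 256)  # second hidden layer
    + dense_params(256, 1)    # output layer
)

bn_trainable, bn_non_trainable = batchnorm_params(256)
# The model has two BatchNormalization layers; Dropout has no parameters.
trainable = dense_total + 2 * bn_trainable
non_trainable = 2 * bn_non_trainable
total = trainable + non_trainable

print(total, trainable, non_trainable)
```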
Model Training
Now we will train our model using the training and validation pipeline.
Python3
history = model.fit(train_ds,
                    epochs=50,
                    validation_data=val_ds)
Output:
Epoch 45/50
10/10 [==============================] - 0s 14ms/step - loss: 2.8792 - mape: 12.5425 - val_loss: 5.3991 - val_mape: 28.6586
Epoch 46/50
10/10 [==============================] - 0s 8ms/step - loss: 2.9184 - mape: 12.7887 - val_loss: 4.1896 - val_mape: 21.4064
Epoch 47/50
10/10 [==============================] - 0s 9ms/step - loss: 2.8153 - mape: 12.3451 - val_loss: 4.3392 - val_mape: 22.3319
Epoch 48/50
10/10 [==============================] - 0s 9ms/step - loss: 2.7146 - mape: 11.7684 - val_loss: 3.6178 - val_mape: 17.7676
Epoch 49/50
10/10 [==============================] - 0s 10ms/step - loss: 2.7631 - mape: 12.1744 - val_loss: 6.4673 - val_mape: 33.2410
Epoch 50/50
10/10 [==============================] - 0s 10ms/step - loss: 2.6819 - mape: 11.8024 - val_loss: 6.0304 - val_mape: 31.6198
Python3
history_df = pd.DataFrame(history.history)
history_df.head()
Output:
Python3
history_df.loc[:, ['loss', 'val_loss']].plot()
history_df.loc[:, ['mape', 'val_mape']].plot()
plt.show()
Output:
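The training error has gone down smoothly, but the validation error behaves differently: over the last six epochs the validation loss swings between about 3.6 and 6.5, while the training loss stays between about 2.7 and 2.9.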
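To make that comparison concrete, the sketch below takes the loss values of the last six epochs from the training log and computes how much each curve moves, along with the epoch that has the lowest validation loss. Only these six logged epochs are used, so the result says nothing about earlier epochs.

```python
# Values copied from epochs 45-50 of the training log.
train_loss = {45: 2.8792, 46: 2.9184, 47: 2.8153,
              48: 2.7146, 49: 2.7631, 50: 2.6819}
val_loss = {45: 5.3991, 46: 4.1896, 47: 4.3392,
            48: 3.6178, 49: 6.4673, 50: 6.0304}

# Range (max - min) of each curve over these epochs.
train_spread = max(train_loss.values()) - min(train_loss.values())
val_spread = max(val_loss.values()) - min(val_loss.values())

# Epoch with the lowest validation loss among the logged ones.
best_epoch = min(val_loss, key=val_loss.get)

print(round(train_spread, 4), round(val_spread, 4), best_epoch)
```

The validation loss varies far more than the training loss, and the final epoch is not the best one. Keras offers an `EarlyStopping` callback that monitors `val_loss` and can restore the best weights; whether it would help here is untested.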