
Predict Fuel Efficiency Using TensorFlow in Python

Last Updated : 21 Mar, 2024

In this article, we will learn how we can build a fuel efficiency prediction model using the TensorFlow API. The dataset we will be using contains features like engine displacement, the number of cylinders in the car, and other relevant features.

Importing Libraries

  • Pandas – This library helps to load the data as a DataFrame (a 2D labeled array) and has multiple functions to perform analysis tasks in one go.
  • Numpy – Numpy arrays are very fast and can perform large computations in a very short time.
  • Matplotlib – This library is used to draw visualizations.
  • Sklearn – This module contains multiple libraries with pre-implemented functions to perform tasks from data preprocessing to model development and evaluation.
  • Seaborn – Built on top of Matplotlib, this library is used to draw statistical visualizations such as the correlation heatmap used below.
  • TensorFlow – This is an open-source library used for Machine Learning and Artificial Intelligence that provides a range of functions to achieve complex functionality with single lines of code.

Python3

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
  
import tensorflow as tf
from tensorflow import keras
from keras import layers
  
import warnings
warnings.filterwarnings('ignore')


Python3

df = pd.read_csv('auto-mpg.csv')
df.head()


Output:

[Image: first five rows of the Auto MPG dataset]

Let’s check the shape of the data.

Python3

df.shape


Output:

(398, 9)

Now, check the datatypes of the columns.

Python3

df.info()


Output:

[Image: df.info() summary showing 398 entries and 9 columns, with horsepower stored as the object dtype]

Here we can observe one discrepancy: the horsepower column is stored with the object datatype, whereas it should be numeric.

Python3

df.describe()


Output:

[Image: summary statistics of the numeric columns]

Exploratory Data Analysis

As observed in the df.info() output, we will first deal with the horsepower column and then move on to the analysis part.

Python3

df['horsepower'].unique()


Output:

[Image: array of unique horsepower values, including the string ‘?’]

Here we can observe that, instead of nulls, the missing values have been recorded as the string ‘?’; because of this, the column was loaded with the object datatype.

Python3

print(df.shape)
df = df[df['horsepower'] != '?']
print(df.shape)


Output:

(398, 9)
(392, 9)

So, there were 6 such rows with a question mark.
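As an aside, the same cleanup can be done at load time by telling pandas to treat ‘?’ as missing; a minimal sketch (re-reading the same auto-mpg.csv):

Python3

# Alternative: parse '?' as NaN while reading, then drop those rows
df_alt = pd.read_csv('auto-mpg.csv', na_values='?')
df_alt = df_alt.dropna(subset=['horsepower'])
print(df_alt.shape)   # (392, 9), matching the filtering above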

Python3

df['horsepower'] = df['horsepower'].astype(int)
df.isnull().sum()


Output:

mpg             0
cylinders       0
displacement    0
horsepower      0
weight          0
acceleration    0
model year      0
origin          0
car name        0
dtype: int64

Python3

df.nunique()


Output:

mpg             127
cylinders         5
displacement     81
horsepower       93
weight          346
acceleration     95
model year       13
origin            3
car name        301
dtype: int64

Python3

plt.subplots(figsize=(15, 5))
for i, col in enumerate(['cylinders', 'origin']):
    plt.subplot(1, 2, i+1)
    x = df.groupby(col)['mpg'].mean()
    x.plot.bar()
    plt.xticks(rotation=0)
plt.tight_layout()
plt.show()


Output:

[Image: bar charts of mean mpg grouped by cylinders and by origin]

Here we can observe that the average mpg values are highest for origin 3.
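We can verify this by printing the grouped averages directly:

Python3

# Average mpg for each origin value; origin 3 has the highest mean
print(df.groupby('origin')['mpg'].mean())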

Python3

plt.figure(figsize=(8, 8))
sb.heatmap(df.corr(numeric_only=True) > 0.9,
           annot=True,
           cbar=False)
plt.show()


Output:

[Image: heatmap highlighting feature pairs with correlation greater than 0.9]

If we remove the displacement feature, the problem of high collinearity will be resolved.
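Before dropping it, we can double-check exactly which feature pairs cross the 0.9 threshold; a minimal sketch:

Python3

# Print every feature pair whose absolute correlation exceeds 0.9
corr = df.corr(numeric_only=True)
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if abs(corr.loc[a, b]) > 0.9:
            print(a, b, round(corr.loc[a, b], 3))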

Python3

df.drop('displacement',
        axis=1,
        inplace=True)


Data Input Pipeline

Python3

from sklearn.model_selection import train_test_split
features = df.drop(['mpg', 'car name'], axis=1)
target = df['mpg'].values
  
X_train, X_val, \
    Y_train, Y_val = train_test_split(features, target,
                                      test_size=0.2,
                                      random_state=22)
X_train.shape, X_val.shape


Output:

((313, 6), (79, 6))
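Note that the features sit on very different scales (weight is in the thousands while cylinders is a single digit). This article feeds the raw values and relies on BatchNormalization inside the network, but standardizing the inputs first is a common alternative; a minimal sketch using sklearn's StandardScaler:

Python3

from sklearn.preprocessing import StandardScaler

# Fit on the training split only, to avoid leaking validation statistics
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)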

Python3

AUTO = tf.data.experimental.AUTOTUNE
  
train_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_train, Y_train))
    .batch(32)
    .prefetch(AUTO)
)
  
val_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_val, Y_val))
    .batch(32)
    .prefetch(AUTO)
)
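A quick sanity check on the pipeline is to pull a single batch and confirm its shapes:

Python3

# Take one batch from the training pipeline; expect (32, 6) and (32,)
for x_batch, y_batch in train_ds.take(1):
    print(x_batch.shape, y_batch.shape)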


Model Architecture

We will implement a model using the Sequential API of Keras which will contain the following parts:

  • We will have two fully connected hidden layers.
  • We have included BatchNormalization layers to enable stable and fast training, and a Dropout layer before the final layer to reduce the chance of overfitting.
  • The final layer is a single-unit output layer; its ReLU activation keeps the predicted mpg values non-negative.

Python3

model = keras.Sequential([
    layers.Dense(256, activation='relu', input_shape=[6]),
    layers.BatchNormalization(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),
    layers.BatchNormalization(),
    layers.Dense(1, activation='relu')
])


While compiling a model we provide these three essential parameters:

  • optimizer – The method that minimizes the loss function by using gradient descent.
  • loss – The loss function by which we monitor whether the model is improving during training.
  • metrics – Additional measures used to evaluate the model on the training and the validation data.

Python3

model.compile(
    loss='mae',
    optimizer='adam',
    metrics=['mape']
)
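For intuition, both choices are simple to compute by hand: MAE is the mean absolute difference between predictions and targets, while MAPE expresses that error as a percentage of the target. A small NumPy sketch with made-up values:

Python3

# Illustrative values only, not taken from the model
y_true = np.array([18.0, 25.0, 32.0])
y_pred = np.array([20.0, 24.0, 30.0])

mae = np.mean(np.abs(y_true - y_pred))                    # ~1.67
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # ~7.12
print(mae, mape)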


Let’s print the summary of the model’s architecture:

Python3

model.summary()


Output:

[Image: model summary listing the layers, output shapes, and parameter counts]

Model Training

Now we will train our model using the training and validation pipeline.

Python3

history = model.fit(train_ds,
                    epochs=50,
                    validation_data=val_ds)


Output:

Epoch 45/50
10/10 [==============================] - 0s 14ms/step - loss: 2.8792 - mape: 12.5425 - val_loss: 5.3991 - val_mape: 28.6586
Epoch 46/50
10/10 [==============================] - 0s 8ms/step - loss: 2.9184 - mape: 12.7887 - val_loss: 4.1896 - val_mape: 21.4064
Epoch 47/50
10/10 [==============================] - 0s 9ms/step - loss: 2.8153 - mape: 12.3451 - val_loss: 4.3392 - val_mape: 22.3319
Epoch 48/50
10/10 [==============================] - 0s 9ms/step - loss: 2.7146 - mape: 11.7684 - val_loss: 3.6178 - val_mape: 17.7676
Epoch 49/50
10/10 [==============================] - 0s 10ms/step - loss: 2.7631 - mape: 12.1744 - val_loss: 6.4673 - val_mape: 33.2410
Epoch 50/50
10/10 [==============================] - 0s 10ms/step - loss: 2.6819 - mape: 11.8024 - val_loss: 6.0304 - val_mape: 31.6198
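Once training finishes, the model can already be used for inference; a minimal sketch on the validation features:

Python3

# Predict mpg for the first few validation samples
preds = model.predict(X_val.to_numpy(dtype='float32'))
print(preds[:5].ravel())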

Python3

history_df = pd.DataFrame(history.history)
history_df.head()


Output:

[Image: first five rows of the training history DataFrame]

Python3

history_df.loc[:, ['loss', 'val_loss']].plot()
history_df.loc[:, ['mape', 'val_mape']].plot()
plt.show()


Output:

[Image: line plots of training and validation loss and MAPE over the epochs]

The training error has gone down smoothly, but the validation error fluctuates from epoch to epoch, which suggests the model may be overfitting.
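One common remedy for such fluctuation is early stopping, which halts training once the validation loss stops improving and restores the best weights. A hedged sketch (the article itself does not use callbacks):

Python3

from keras.callbacks import EarlyStopping

# Stop if val_loss has not improved for 10 epochs; keep the best weights
early_stop = EarlyStopping(monitor='val_loss',
                           patience=10,
                           restore_best_weights=True)
history = model.fit(train_ds,
                    epochs=50,
                    validation_data=val_ds,
                    callbacks=[early_stop])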


