
Predict Fuel Efficiency Using TensorFlow in Python

Last Updated : 21 Mar, 2024

In this article, we will learn how we can build a fuel efficiency prediction model using the TensorFlow API. The dataset we will be using contains features like engine displacement, the number of cylinders in the car, and other relevant features.

Importing Libraries

  • Pandas – This library helps to load the data as a DataFrame (a 2D labeled array) and has multiple functions to perform analysis tasks in one go.
  • Numpy – Numpy arrays are very fast and can perform large computations in a very short time.
  • Matplotlib – This library is used to draw visualizations.
  • Sklearn – This module contains multiple libraries with pre-implemented functions to perform tasks from data preprocessing to model development and evaluation.
  • Seaborn – Built on top of Matplotlib, this library is used to draw statistical visualizations such as the correlation heatmap used below.
  • TensorFlow – This is an open-source library used for Machine Learning and Artificial Intelligence that provides a range of functions to achieve complex functionality with single lines of code.

Python3

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
  
import tensorflow as tf
from tensorflow import keras
from keras import layers
  
import warnings
warnings.filterwarnings('ignore')


Python3

df = pd.read_csv('auto-mpg.csv')
df.head()


Output:

[Image: first five rows of the Auto MPG dataset]

Let’s check the shape of the data.

Python3

df.shape


Output:

(398, 9)

Now, check the datatypes of the columns.

Python3

df.info()


Output:

[Image: df.info() summary showing 398 entries and 9 columns, with horsepower stored as the object dtype]

Here we can observe one discrepancy: the horsepower column is stored with the object datatype, whereas it should be numeric.

Python3

df.describe()


Output:

[Image: summary statistics of the numeric columns]

Exploratory Data Analysis

As observed in the df.info() output, we will first deal with the horsepower column and then move on to the analysis part.

Python3

df['horsepower'].unique()


Output:

[Image: array of unique horsepower values, including the string ‘?’]

Here we can observe that, instead of nulls, the missing values have been recorded as the string ‘?’; because of this, the column was loaded with the object datatype.

Python3

print(df.shape)
df = df[df['horsepower'] != '?']
print(df.shape)


Output:

(398, 9)
(392, 9)

So, there were 6 such rows with a question mark.
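As an aside, the same cleanup can be done at load time by telling pandas to treat ‘?’ as missing; a minimal sketch (re-reading the same auto-mpg.csv):

Python3

# Alternative: parse '?' as NaN while reading, then drop those rows
df_alt = pd.read_csv('auto-mpg.csv', na_values='?')
df_alt = df_alt.dropna(subset=['horsepower'])
print(df_alt.shape)   # (392, 9), matching the filtering above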

Python3

df['horsepower'] = df['horsepower'].astype(int)
df.isnull().sum()


Output:

mpg             0
cylinders       0
displacement    0
horsepower      0
weight          0
acceleration    0
model year      0
origin          0
car name        0
dtype: int64

Python3

df.nunique()


Output:

mpg             127
cylinders         5
displacement     81
horsepower       93
weight          346
acceleration     95
model year       13
origin            3
car name        301
dtype: int64

Python3

plt.subplots(figsize=(15, 5))
for i, col in enumerate(['cylinders', 'origin']):
    plt.subplot(1, 2, i+1)
    x = df.groupby(col)['mpg'].mean()
    x.plot.bar()
    plt.xticks(rotation=0)
plt.tight_layout()
plt.show()


Output:

[Image: bar charts of mean mpg grouped by cylinders and by origin]

Here we can observe that the average mpg values are highest for origin 3.
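We can verify this by printing the grouped averages directly:

Python3

# Average mpg for each origin value; origin 3 has the highest mean
print(df.groupby('origin')['mpg'].mean())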

Python3

plt.figure(figsize=(8, 8))
sb.heatmap(df.corr(numeric_only=True) > 0.9,
           annot=True,
           cbar=False)
plt.show()


Output:

[Image: heatmap highlighting feature pairs with correlation greater than 0.9]

If we remove the displacement feature, the problem of high collinearity will be resolved.
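Before dropping it, we can double-check exactly which feature pairs cross the 0.9 threshold; a minimal sketch:

Python3

# Print every feature pair whose absolute correlation exceeds 0.9
corr = df.corr(numeric_only=True)
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if abs(corr.loc[a, b]) > 0.9:
            print(a, b, round(corr.loc[a, b], 3))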

Python3

df.drop('displacement',
        axis=1,
        inplace=True)


Data Input Pipeline

Python3

from sklearn.model_selection import train_test_split
features = df.drop(['mpg', 'car name'], axis=1)
target = df['mpg'].values
  
X_train, X_val, \
    Y_train, Y_val = train_test_split(features, target,
                                      test_size=0.2,
                                      random_state=22)
X_train.shape, X_val.shape


Output:

((313, 6), (79, 6))
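Note that the features sit on very different scales (weight is in the thousands while cylinders is a single digit). This article feeds the raw values and relies on BatchNormalization inside the network, but standardizing the inputs first is a common alternative; a minimal sketch using sklearn's StandardScaler:

Python3

from sklearn.preprocessing import StandardScaler

# Fit on the training split only, to avoid leaking validation statistics
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)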

Python3

AUTO = tf.data.experimental.AUTOTUNE
  
train_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_train, Y_train))
    .batch(32)
    .prefetch(AUTO)
)
  
val_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_val, Y_val))
    .batch(32)
    .prefetch(AUTO)
)
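A quick sanity check on the pipeline is to pull a single batch and confirm its shapes:

Python3

# Take one batch from the training pipeline; expect (32, 6) and (32,)
for x_batch, y_batch in train_ds.take(1):
    print(x_batch.shape, y_batch.shape)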


Model Architecture

We will implement a model using the Sequential API of Keras which will contain the following parts:

  • We will have two fully connected hidden layers.
  • We have included BatchNormalization layers to enable stable and fast training, and a Dropout layer before the final layer to reduce the chance of overfitting.
  • The final layer is a single-unit output layer; its ReLU activation keeps the predicted mpg values non-negative.

Python3

model = keras.Sequential([
    layers.Dense(256, activation='relu', input_shape=[6]),
    layers.BatchNormalization(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),
    layers.BatchNormalization(),
    layers.Dense(1, activation='relu')
])


While compiling a model we provide these three essential parameters:

  • optimizer – The method that minimizes the loss function by using gradient descent.
  • loss – The loss function by which we monitor whether the model is improving during training.
  • metrics – Additional measures used to evaluate the model on the training and the validation data.

Python3

model.compile(
    loss='mae',
    optimizer='adam',
    metrics=['mape']
)
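For intuition, both choices are simple to compute by hand: MAE is the mean absolute difference between predictions and targets, while MAPE expresses that error as a percentage of the target. A small NumPy sketch with made-up values:

Python3

# Illustrative values only, not taken from the model
y_true = np.array([18.0, 25.0, 32.0])
y_pred = np.array([20.0, 24.0, 30.0])

mae = np.mean(np.abs(y_true - y_pred))                    # ~1.67
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # ~7.12
print(mae, mape)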


Let’s print the summary of the model’s architecture:

Python3

model.summary()


Output:

[Image: model summary listing the layers, output shapes, and parameter counts]

Model Training

Now we will train our model using the training and validation pipeline.

Python3

history = model.fit(train_ds,
                    epochs=50,
                    validation_data=val_ds)


Output:

Epoch 45/50
10/10 [==============================] - 0s 14ms/step - loss: 2.8792 - mape: 12.5425 - val_loss: 5.3991 - val_mape: 28.6586
Epoch 46/50
10/10 [==============================] - 0s 8ms/step - loss: 2.9184 - mape: 12.7887 - val_loss: 4.1896 - val_mape: 21.4064
Epoch 47/50
10/10 [==============================] - 0s 9ms/step - loss: 2.8153 - mape: 12.3451 - val_loss: 4.3392 - val_mape: 22.3319
Epoch 48/50
10/10 [==============================] - 0s 9ms/step - loss: 2.7146 - mape: 11.7684 - val_loss: 3.6178 - val_mape: 17.7676
Epoch 49/50
10/10 [==============================] - 0s 10ms/step - loss: 2.7631 - mape: 12.1744 - val_loss: 6.4673 - val_mape: 33.2410
Epoch 50/50
10/10 [==============================] - 0s 10ms/step - loss: 2.6819 - mape: 11.8024 - val_loss: 6.0304 - val_mape: 31.6198
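Once training finishes, the model can already be used for inference; a minimal sketch on the validation features:

Python3

# Predict mpg for the first few validation samples
preds = model.predict(X_val.to_numpy(dtype='float32'))
print(preds[:5].ravel())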

Python3

history_df = pd.DataFrame(history.history)
history_df.head()


Output:

[Image: first five rows of the training history DataFrame]

Python3

history_df.loc[:, ['loss', 'val_loss']].plot()
history_df.loc[:, ['mape', 'val_mape']].plot()
plt.show()


Output:

[Image: line plots of training and validation loss and MAPE over the epochs]

The training error has gone down smoothly, but the validation error fluctuates from epoch to epoch, which suggests the model may be overfitting.
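One common remedy for such fluctuation is early stopping, which halts training once the validation loss stops improving and restores the best weights. A hedged sketch (the article itself does not use callbacks):

Python3

from keras.callbacks import EarlyStopping

# Stop if val_loss has not improved for 10 epochs; keep the best weights
early_stop = EarlyStopping(monitor='val_loss',
                           patience=10,
                           restore_best_weights=True)
history = model.fit(train_ds,
                    epochs=50,
                    validation_data=val_ds,
                    callbacks=[early_stop])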


