Back Propagation with TensorFlow

Last Updated : 12 Oct, 2023

This article discusses how backpropagation works in TensorFlow, one of the most popular deep learning libraries. Let's first look at what backpropagation is and the ideas behind it.

Backpropagation

Backpropagation is a fundamental technique used in training neural networks: it optimizes the weights and biases of a model based on the error between the predicted output and the actual output. The basic idea is to calculate the gradient of the loss function with respect to each weight and bias in the model. The gradient tells us how much the loss will change if a weight or bias is nudged by a small amount. The goal is to reduce the loss, which is achieved by iteratively updating the weights and biases in the direction opposite to the gradient.
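
For intuition, here is a minimal sketch of this idea in plain Python, before bringing in TensorFlow. The one-parameter "model", the single data point, and the learning rate below are all made up for illustration:

Python3
# Toy model: pred = w * x, with loss L(w) = (w * x - y)^2
x, y = 2.0, 6.0               # one training example; the ideal weight is 3
w = 0.0                       # initial weight
learning_rate = 0.1

for step in range(5):
    pred = w * x                      # forward pass
    loss = (pred - y) ** 2            # squared-error loss
    grad = 2 * (pred - y) * x         # dL/dw, by the chain rule
    w = w - learning_rate * grad      # gradient-descent update
    print(f"step {step}: w = {w:.3f}, loss = {loss:.3f}")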

Backpropagation consists of two phases: the first is a feedforward pass, and the latter is a backward pass, in which the weights and biases are optimized.

Feedforward Pass:

This is the first step in training a neural network: data flows from the input layer, through the hidden layers, to the output layer. Neurons in each layer compute a weighted sum of their inputs and apply an activation function, transforming the data into progressively more abstract, hierarchical features that capture intricate patterns. The process culminates at the output layer, which produces the network's predictions or classifications. A sketch of this weighted-sum-plus-activation computation follows below.
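
As a concrete sketch of the computation each layer performs, here is a single hidden layer implemented with NumPy alone; the layer sizes and random weights are illustrative, not taken from the article's model:

Python3
import numpy as np

x = np.array([0.5, -1.2, 3.0])    # one input sample with 3 features
W = np.random.randn(4, 3)         # weights of a hidden layer with 4 neurons
b = np.zeros(4)                   # biases, one per neuron

z = W @ x + b                     # weighted sum computed by each neuron
a = np.maximum(0, z)              # ReLU activation
print(a)                          # this output feeds the next layer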

Backward Pass:

The backward pass begins after the network has made its predictions. The error between the actual and predicted values is propagated backwards from the output layer to the input layer, and gradients of the loss are computed with respect to the network's weights and biases. These gradients reveal how much each weight and bias contributed to the error, telling the network how to adjust its parameters to reduce it systematically. By repeating this process, the network iteratively fine-tunes its parameters and improves its predictive capability. A minimal single-layer sketch of these gradient computations follows below.
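
To make the gradient computation concrete, here is a minimal sketch for a single linear layer with a squared-error loss, written with NumPy; the shapes and values are illustrative:

Python3
import numpy as np

x = np.array([0.5, -1.2, 3.0])         # input sample
W = np.random.randn(2, 3)              # layer weights
b = np.zeros(2)                        # layer biases
y_true = np.array([1.0, 0.0])          # target output

pred = W @ x + b                       # forward pass
loss = np.sum((pred - y_true) ** 2)    # scalar loss

# Backward pass: apply the chain rule to get gradients
dloss_dpred = 2 * (pred - y_true)      # dL/dpred
dloss_dW = np.outer(dloss_dpred, x)    # dL/dW, same shape as W
dloss_db = dloss_dpred                 # dL/db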

The weights are then updated, and both passes run iteratively until the loss is sufficiently small.

Backpropagation in TensorFlow

TensorFlow is one of the most popular deep learning libraries and makes training deep neural networks efficient. Now let's dive into how backpropagation works in TensorFlow.

In TensorFlow, backpropagation is implemented using automatic differentiation, a technique in which we never write out the gradient formulas by hand. When we define a neural network, TensorFlow automatically builds a computational graph that represents the flow of data through the network. Each node in the graph is a mathematical operation, and TensorFlow knows how to evaluate each operation in the forward pass and differentiate it in the backward pass.

The goal of backpropagation is to optimize the weights and biases of the model to minimize the loss, so we use TensorFlow's automatic differentiation to compute the gradient of the loss function with respect to the weights and biases. When a tf.Variable is created, its trainable argument (which defaults to True) tells TensorFlow to track the variable during training and compute the gradient of the loss with respect to it.
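
Here is a minimal sketch of this automatic differentiation with a single scalar variable; the toy "loss" is arbitrary:

Python3
import tensorflow as tf

w = tf.Variable(2.0, trainable=True)   # TensorFlow tracks this variable

with tf.GradientTape() as tape:
    loss = w * w + 3.0 * w             # toy loss defined in terms of w

grad = tape.gradient(loss, w)          # dloss/dw = 2w + 3
print(grad.numpy())                    # 7.0 at w = 2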

Once we have the gradients, TensorFlow optimizers such as SGD, Adagrad, and Adam can be used to update the weights accordingly, as the following sketch shows.
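
Continuing the same toy example, a sketch of how an optimizer consumes those gradients (the learning rate is arbitrary):

Python3
import tensorflow as tf

w = tf.Variable(2.0, trainable=True)

with tf.GradientTape() as tape:
    loss = w * w + 3.0 * w

grad = tape.gradient(loss, w)                       # 7.0 at w = 2
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
optimizer.apply_gradients([(grad, w)])              # w <- w - 0.1 * grad
print(w.numpy())                                    # 2.0 - 0.7 = 1.3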

Implementing Backpropagation

Installing the libraries

First, install TensorFlow on your system by entering the following command in your terminal:

pip install tensorflow

Importing Libraries

Python3
# importing libraries
import tensorflow as tf
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split


Here, we import the libraries needed to build the model:

  • TensorFlow: the deep learning library we use to define and train the neural network.
  • NumPy: a Python library for numerical computation, with support for large, multi-dimensional arrays and matrices, along with a wide array of mathematical functions.
  • Sklearn: a Python library for machine learning that provides tools for data preprocessing, modelling, and evaluation, and various algorithms for classification, regression, and more.

Loading the dataset

Python3
# Load the Iris dataset
iris = datasets.load_iris()

# Extract the features (X) and target labels (y) from the dataset
# X contains the feature data
X = iris.data

# y contains the target labels
y = iris.target


In this code, we load the Iris dataset and extract the feature matrix X and the target labels y. In general, preparing data also involves cleaning it, removing outliers, and scaling numerical features to a common range. To evaluate the model later, we will split the prepared data into training and testing sets.
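
This article trains on the raw Iris features, which works well enough for this small dataset. If scaling were needed, a typical approach (shown here only as an optional sketch, not used in the rest of the article) is scikit-learn's StandardScaler:

Python3
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # zero mean, unit variance per feature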

Splitting the dataset

Python3
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)


Here, we divide the Iris dataset into a training set (80%) and a testing set (20%) to facilitate the development and evaluation of the model. The random_state argument is set for reproducibility, ensuring that the same split is obtained each time the code is run.

Defining a machine learning model

Python3
# Define the neural network architecture
hidden_layer_size = 32
model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu',
                          input_shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(3, activation='softmax')  # 3 classes for Iris dataset
])

model.summary()


Output:

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense_4 (Dense)             (None, 32)                160

 dense_5 (Dense)             (None, 3)                 99

=================================================================
Total params: 259
Trainable params: 259
Non-trainable params: 0
_________________________________________________________________

Here, we define the model using TensorFlow's Keras API.
The model consists of two layers:

  • A hidden Dense layer with 32 neurons and ReLU activation, which receives the 4 Iris features (4 inputs × 32 neurons + 32 biases = 160 parameters).
  • An output Dense layer with 3 neurons and softmax activation, one neuron per Iris class (32 inputs × 3 neurons + 3 biases = 99 parameters).

Loss function and optimizer

Python3
# Define hyperparameters
learning_rate = 0.01
epochs = 1000

# Define the loss function and optimizer
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate)


Here we define the loss function and optimizer used for the model.

  • Sparse Categorical Crossentropy: a loss function used in classification tasks where the target labels are integers. It calculates the cross-entropy loss between the predicted class probabilities and the true class labels, handling the integer-to-one-hot conversion internally (see the short example after this list).
  • Stochastic Gradient Descent (SGD): an optimization algorithm used for training models. It updates model parameters using gradients computed on randomly sampled subsets of the training data; this randomness can help the model converge faster and escape shallow local minima.
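
To make the integer-label behaviour of the loss concrete, here is a small standalone check; the labels and probabilities are made-up values:

Python3
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

y_true = [0, 2]                          # integer class labels
y_pred = [[0.9, 0.05, 0.05],             # predicted class probabilities
          [0.1, 0.2, 0.7]]

# Mean of -log(0.9) and -log(0.7), roughly 0.231
print(loss_fn(y_true, y_pred).numpy())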

Backpropagation

Now we implement backpropagation for the model defined above, inside a training loop.

Python3
# Training loop

# Iterate through a specified number of training epochs
for epoch in range(epochs):

    # Use TensorFlow's GradientTape to record operations for automatic differentiation
    with tf.GradientTape() as tape:

        # Forward pass: compute predictions by passing training data through the network
        logits = model(X_train)

        # Calculate the loss by comparing predictions with the true training labels
        loss_value = loss_fn(y_train, logits)

    # Backpropagation: compute gradients of the loss with respect to model parameters
    grads = tape.gradient(loss_value, model.trainable_variables)

    # Apply the computed gradients to update the model's parameters using the optimizer
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Print the loss at regular intervals to monitor training progress
    if (epoch + 1) % 100 == 0:
        print(f"Epoch {epoch + 1}/{epochs}, Loss: {loss_value.numpy()}")


Output:

Epoch 100/1000, Loss: 0.7505729794502258
Epoch 200/1000, Loss: 0.5097097754478455
Epoch 300/1000, Loss: 0.4034225344657898
Epoch 400/1000, Loss: 0.34098243713378906
Epoch 500/1000, Loss: 0.29588592052459717
Epoch 600/1000, Loss: 0.2603895366191864
Epoch 700/1000, Loss: 0.23153600096702576
Epoch 800/1000, Loss: 0.20790232717990875
Epoch 900/1000, Loss: 0.18850967288017273
Epoch 1000/1000, Loss: 0.17255887389183044

The code above is the training loop for the neural network. It iterates through the specified number of epochs; in each epoch it computes predictions, calculates the loss, and updates the model parameters using backpropagation and the optimizer. Training progress is monitored by printing the loss every 100 epochs.
Clearly, the loss decreases steadily as the epochs progress. This is the result of backpropagation: the weights of the layers are adjusted towards the desired output, improving accuracy over time.
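
The held-out test split created earlier was not used during training. As a quick sanity check, here is a sketch of how test accuracy could be computed, assuming the model, X_test, and y_test from the previous steps are in scope:

Python3
# Evaluate the trained model on the held-out test set
logits = model(X_test)
predictions = np.argmax(logits.numpy(), axis=1)   # predicted class per sample
accuracy = np.mean(predictions == y_test)         # fraction of correct predictions
print(f"Test accuracy: {accuracy:.3f}")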

Advantages

  • Efficient gradient calculation: TensorFlow's automatic differentiation makes it efficient to compute gradients during backpropagation, which is crucial for optimizing the model's parameters.
  • Flexibility: TensorFlow lets you define and customize complex neural network architectures easily, making it suitable for a wide range of machine learning tasks.
  • GPU acceleration: TensorFlow integrates seamlessly with GPUs, which can significantly speed up the training of neural networks.
  • Deployment: TensorFlow provides tools for converting trained models into formats suitable for deployment on various platforms, including mobile devices and the web.

Disadvantages

  • Increased memory consumption: backpropagation requires storing intermediate values from the forward pass so that gradients can be computed in the backward pass.
  • Computational overhead: using automatic differentiation for very simple functions can introduce significant overhead; for such functions it can be better to compute the gradients by hand.
  • Updates and compatibility: TensorFlow occasionally introduces updates and changes that may require adjustments to existing code, and compatibility with older versions can be a concern for long-term projects.
  • Resource intensive: training deep neural networks with TensorFlow can be resource-intensive, requiring powerful GPUs or TPUs that may not be readily available to everyone.

