Training Loop in TensorFlow

Last Updated : 28 Mar, 2024
Training neural networks is at the core of machine learning, and understanding how to write a training loop from scratch is fundamental for any deep learning practitioner. TensorFlow provides powerful tools for building and training neural networks efficiently. In this article, we walk through the process of constructing a training loop in TensorFlow and explain each step of training the model.

Constructing Training Loop in TensorFlow

A training loop is a repetitive process where the model iteratively learns from the training data to minimize a predefined loss function. Constructing a training loop involves the following steps:

Step 1: Prepare the Dataset

We illustrate this step with a simple example: training a neural network to classify images from the CIFAR-10 dataset. The dataset consists of 50,000 training images and 10,000 testing images, each of size 32×32 pixels with 3 color channels. After loading, the pixel values are normalized to the range [0, 1].

import tensorflow as tf
from tensorflow.keras import datasets

# Load CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to range [0, 1]
train_images, test_images = train_images / 255.0, test_images / 255.0

# Print shape of loaded datasets
print("Shape of training images:", train_images.shape)
print("Shape of training labels:", train_labels.shape)
print("Shape of testing images:", test_images.shape)
print("Shape of testing labels:", test_labels.shape)


Shape of training images: (50000, 32, 32, 3)
Shape of training labels: (50000, 1)
Shape of testing images: (10000, 32, 32, 3)
Shape of testing labels: (10000, 1)

Step 2: Define the Model

We have defined a convolutional neural network (CNN) using TensorFlow’s Keras API. The model consists of three convolutional layers followed by max-pooling layers for downsampling, and two fully connected (dense) layers for classification.

from tensorflow.keras import layers, models

# Define the model
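model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    # Flatten the 4x4x64 feature maps into a 1024-element vector
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    # Output layer: 10 raw scores (logits), one per CIFAR-10 class
    layers.Dense(10)
])

# Print model summary
print("\nModel Summary:")
model.summary()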


Model Summary:
Model: "sequential_1"
Layer (type)                    Output Shape          Param #
conv2d_3 (Conv2D)               (None, 30, 30, 32)    896
max_pooling2d_2 (MaxPooling2D)  (None, 15, 15, 32)    0
conv2d_4 (Conv2D)               (None, 13, 13, 64)    18496
max_pooling2d_3 (MaxPooling2D)  (None, 6, 6, 64)      0
conv2d_5 (Conv2D)               (None, 4, 4, 64)      36928
flatten_1 (Flatten)             (None, 1024)          0
dense_2 (Dense)                 (None, 64)            65600
dense_3 (Dense)                 (None, 10)            650

Total params: 122570 (478.79 KB)
Trainable params: 122570 (478.79 KB)
Non-trainable params: 0 (0.00 Byte)

The model’s summary provides details about each layer, including the layer type, output shape, and number of parameters. It helps understand the flow of data through the network and the complexity of the model.
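The numbers in the summary can be reproduced by hand, which is a useful check when reading any model summary. The sketch below is plain Python rather than TensorFlow code, and it assumes 3×3 convolutions without padding, stride 1, and 2×2 pooling, as in the model above. The helper names (conv_out, pool_out, conv_params, dense_params) are introduced here only for illustration.

```python
# Recompute the output shapes and parameter counts shown in the model summary.

def conv_out(size, kernel=3):
    """Spatial size after an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """kernel*kernel*in_channels weights plus one bias, for each filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, for each unit."""
    return (inputs + 1) * units

size = 32
size = conv_out(size)    # 30
size = pool_out(size)    # 15
size = conv_out(size)    # 13
size = pool_out(size)    # 6
size = conv_out(size)    # 4
flat = size * size * 64  # 1024 values after flattening

param_counts = [
    conv_params(3, 32),      # 896
    conv_params(32, 64),     # 18496
    conv_params(64, 64),     # 36928
    dense_params(flat, 64),  # 65600
    dense_params(64, 10),    # 650
]
total_params = sum(param_counts)
print(flat, param_counts, total_params)
```

Pooling and flatten layers have no weights, which is why they show 0 parameters in the summary.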

Step 3: Define Loss Function and Optimizer

In this step, we define a loss function and an optimizer for training the neural network. We have chosen Sparse Categorical Crossentropy as the loss function and Adam as the optimizer. We also define two metrics: train_loss, which tracks the mean training loss, and train_accuracy, which tracks the accuracy of the model's predictions during training.

# Define loss function and optimizer
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

# Define metrics
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

Step 4: Model Training

Finally, we implement the training loop. It has two parts: a training step that processes a single batch, and a loop that calls this step for every batch in every epoch. Let’s explore the code in detail:

1. Training Step:

  1. We have used the @tf.function decorator to convert the Python function into a TensorFlow graph to improve performance.
  2. Inside the train_step function, a gradient tape (tf.GradientTape) is employed to record operations for automatic differentiation.
  3. Predictions are obtained by passing input images through the model in training mode (training=True).
  4. The loss is computed using the specified loss function (loss_fn) by comparing the predicted labels with the true labels.
  5. Gradients of the loss with respect to the model’s trainable variables are computed using the gradient tape.
  6. The optimizer applies these gradients to update the model’s trainable variables.
  7. Additionally, the train_loss and train_accuracy metrics are updated using the computed loss and predictions, respectively.

2. Training Loop:

  1. The training loop iterates over a fixed number of epochs, where each epoch involves iterating over the entire training dataset in batches.
  2. For each batch, the train_step function is called with input images and corresponding labels.
  3. Batches are sliced from the training dataset (train_images and train_labels) based on the specified batch_size.
  4. After each epoch, training metrics are printed for monitoring the training progress.
  5. Finally, the train_loss and train_accuracy metrics are reset for the next epoch using the reset_state() method (spelled reset_states() in older TensorFlow releases).

# Define training step
@tf.function  # Compile the step into a TensorFlow graph for speed
def train_step(images, labels):
    with tf.GradientTape() as tape:
        # Forward pass in training mode
        predictions = model(images, training=True)
        loss = loss_fn(labels, predictions)
    # Backward pass: gradients of the loss w.r.t. the trainable variables
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    # Update the running metrics with this batch
    train_loss(loss)
    train_accuracy(labels, predictions)

# Training loop
epochs = 10
batch_size = 64
for epoch in range(epochs):
    for batch in range(len(train_images) // batch_size):
        start = batch * batch_size
        end = start + batch_size
        train_step(train_images[start:end], train_labels[start:end])
    # Print metrics
    print(f'Epoch {epoch + 1}, Loss: {train_loss.result()}, Accuracy: {train_accuracy.result() * 100}%')
    # Reset metrics for next epoch
    train_loss.reset_state()
    train_accuracy.reset_state()


Epoch 1, Loss: 1.6167317628860474, Accuracy: 41.04313278198242%
Epoch 2, Loss: 1.233251690864563, Accuracy: 56.099952697753906%
Epoch 3, Loss: 1.0807808637619019, Accuracy: 62.05986022949219%
Epoch 4, Loss: 0.9831880331039429, Accuracy: 65.49295806884766%
Epoch 5, Loss: 0.9078642129898071, Accuracy: 68.04977416992188%
Epoch 6, Loss: 0.8455548882484436, Accuracy: 70.3905258178711%
Epoch 7, Loss: 0.7960028648376465, Accuracy: 71.96102905273438%
Epoch 8, Loss: 0.7521368265151978, Accuracy: 73.61555480957031%
Epoch 9, Loss: 0.713749885559082, Accuracy: 74.93798065185547%
Epoch 10, Loss: 0.6778918504714966, Accuracy: 76.44245910644531%
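
The batch slicing used in the loop can be checked with plain arithmetic. Because the number of batches is computed with integer division, any images left over after the last full batch are not used. The sketch below takes the dataset size and batch size from this article; batch_bounds is a hypothetical helper introduced only for illustration.

```python
def batch_bounds(num_samples, batch_size):
    """Return the (start, end) index pairs that the training loop slices with."""
    return [
        (batch * batch_size, batch * batch_size + batch_size)
        for batch in range(num_samples // batch_size)
    ]

bounds = batch_bounds(50_000, 64)
num_batches = len(bounds)                # full batches per epoch
samples_seen = num_batches * 64          # images actually used per epoch
samples_skipped = 50_000 - samples_seen  # leftover images that form no full batch

print(num_batches, bounds[0], bounds[-1], samples_skipped)
```

With 50,000 images and a batch size of 64, a handful of images at the end of the array are skipped in every epoch. This has little effect here, but it is worth knowing when the dataset is small or the batch size is large.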

Key Components in Model Training using TensorFlow

There are several key components in the training process:

1. Forward Pass

The forward pass refers to the process of passing input data through the neural network to obtain predictions. In the above example, inside the train_step function, the forward pass occurs when the input images are fed into the model using model(images, training=True), which computes the predictions for the given inputs.

2. Loss Computation

After obtaining predictions from the forward pass, the next step is to compute the loss, which quantifies how far the model’s predictions are from the true labels.

The loss function specified in the code (loss_fn) computes the loss between the model’s predictions and the true labels. In this case, SparseCategoricalCrossentropy computes the cross-entropy between the predicted class scores and the true label indices. Because it is created with from_logits=True, it expects raw scores (logits) rather than probabilities and applies the softmax internally.
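
To make the loss concrete, the sketch below computes sparse categorical cross-entropy from logits for a single example in plain Python. It is an illustration of the formula, not TensorFlow's implementation; the helper name sparse_ce_from_logits and the sample logits are made up for this example.

```python
import math

def sparse_ce_from_logits(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    # Subtract the largest logit first for numerical stability.
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]

logits = [2.0, 0.5, -1.0]  # raw scores for 3 hypothetical classes

loss_correct = sparse_ce_from_logits(logits, label=0)  # true class has the largest score
loss_wrong = sparse_ce_from_logits(logits, label=2)    # true class has the smallest score
uniform = sparse_ce_from_logits([0.0] * 10, label=3)   # equal scores for 10 classes

print(loss_correct, loss_wrong, uniform)
```

The loss is small when the true class receives the largest score and grows as that score falls behind the others. A 10-class model that scores every class equally has a loss of ln(10), roughly 2.30, which is a handy sanity check for the first batches of CIFAR-10 training.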

3. Backward Pass (Gradient Calculation)

The backward pass computes the gradients of the loss function with respect to the model parameters. These gradients indicate the direction and magnitude of the parameter updates required to minimize the loss.

Inside the train_step function, a gradient tape is used to record operations for automatic differentiation. During the forward pass, TensorFlow automatically tracks operations involving trainable variables within the gradient tape context. After the loss is computed, gradients of the loss with respect to the model’s trainable variables are calculated using the tape.gradient() method. These gradients represent the sensitivity of the loss to changes in each parameter of the model.
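
The same mechanism can be seen on a much smaller problem. The sketch below uses tf.GradientTape with a single variable and a toy loss chosen for illustration, so the result can be checked by hand.

```python
import tensorflow as tf

# A single trainable parameter and a toy loss: loss = (w - 5)^2
w = tf.Variable(3.0)

with tf.GradientTape() as tape:
    loss = (w - 5.0) ** 2  # operations on w are recorded by the tape

# Analytically, d(loss)/dw = 2 * (w - 5) = 2 * (3 - 5) = -4
grad = tape.gradient(loss, w)
print(float(grad))
```

A negative gradient means that increasing w would reduce the loss, which is the information the optimizer uses in the next step.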

4. Parameter Update

Once the gradients are computed, the optimizer updates the model’s trainable parameters using an optimization algorithm (e.g., Adam, SGD). The optimizer.apply_gradients() method is used to apply the computed gradients to the model’s trainable variables, thereby updating their values to minimize the loss.
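
The idea behind the update can be sketched with plain gradient descent on a one-parameter toy problem. Note that this is deliberately the simplest update rule (w = w - learning_rate * gradient), not the Adam algorithm used in the article's code, and the loss, learning rate, and step count are assumptions chosen for illustration.

```python
def loss_and_grad(w):
    """Toy loss (w - 5)^2 and its analytic gradient 2 * (w - 5)."""
    return (w - 5.0) ** 2, 2.0 * (w - 5.0)

w = 3.0
learning_rate = 0.1
history = []

for step in range(20):
    loss, grad = loss_and_grad(w)
    history.append(loss)
    # Gradient descent update: move the parameter against the gradient
    w = w - learning_rate * grad

print(history[0], history[-1], w)
```

Each update moves w closer to 5, the value that minimizes the toy loss, so the recorded loss shrinks at every step. Optimizers such as Adam follow the same pattern but adapt the step size per parameter.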

These steps are repeated over multiple epochs to train the neural network effectively.


In this article, we’ve walked through the process of constructing a training loop from scratch using TensorFlow. Understanding this process is crucial for building and training neural networks effectively. By mastering this fundamental concept, you’ll have the foundation to tackle more complex deep learning tasks and experiments in the future.

