Training Neural Networks with Validation using PyTorch

Last Updated : 19 Aug, 2021

Neural networks are a biologically inspired programming paradigm that deep learning is built around. Python offers several libraries for building and training neural networks on given data, and PyTorch is one such library, providing utilities that make both steps easy. With neural networks, choosing a good architecture and good hyperparameters is essential. During training, the training loss generally keeps decreasing as long as the learning rate is well chosen. But it’s important that our network performs well not only on the data it was trained on but also on data it has never seen before. One way to measure this is to hold out a validation set and track the network’s performance on it during training. In this article we’ll see how to track the validation loss after each training epoch and save the model weights that give the lowest validation loss.

Installing PyTorch

Installing PyTorch is much like installing any other Python library. We can use pip or conda to install PyTorch:-

pip install torch torchvision

This command will install PyTorch along with torchvision which provides various datasets, models, and transforms for computer vision. To install using conda you can use the following command:-

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
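
Once either command finishes, a quick check like the one below can confirm that the installation works. This is only a minimal sketch: it prints the installed version, reports whether a CUDA-capable GPU is visible, and builds a small tensor.

```python
import torch

# Print the installed version and whether a CUDA-capable GPU is visible.
print("PyTorch version:", torch.__version__)
cuda_available = torch.cuda.is_available()
print("CUDA available:", cuda_available)

# Build a tiny tensor to confirm the library works end to end.
x = torch.ones(2, 3)
print(tuple(x.shape))
```

If the import fails, the package was most likely installed into a different Python environment than the one being run.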

Loading Data

For this tutorial, we are going to use the MNIST dataset provided by the torchvision library. In deep learning we often train neural networks on batches of a fixed size. DataLoader is PyTorch’s data loading utility that creates an iterable over these batches of the dataset. Let’s start by loading our data:-

from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split

transform = transforms.Compose([
    transforms.ToTensor()
])

In the above code, we declared a variable called transform, which converts the raw data into the format we need. Here the transform simply takes each raw image and converts it to a Tensor with pixel values scaled to the range [0, 1]. A Tensor is an n-dimensional array, a generalization of a matrix.

train = datasets.MNIST('', train=True, transform=transform, download=True)
train, valid = random_split(train,[50000,10000])

Here we download the raw data and apply the transform to convert it to Tensors. The train argument tells whether the training or the testing portion of the data is loaded. Finally, we split the 60,000-image training set into 2 subsets of 50,000 and 10,000 data points, which become our training and validation sets.
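
To see what random_split does without downloading MNIST, here is a minimal sketch on a made-up dataset of 10 samples. The toy tensors and the seed are assumptions for illustration only. Passing a seeded generator is optional, but it makes the split reproducible between runs.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for MNIST: 10 fake samples with integer labels.
features = torch.arange(10, dtype=torch.float32).unsqueeze(1)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# A seeded generator makes the split reproducible across runs.
generator = torch.Generator().manual_seed(0)
train_part, valid_part = random_split(dataset, [8, 2], generator=generator)

print(len(train_part), len(valid_part))

# The two subsets are disjoint and together cover every index exactly once.
all_indices = sorted(list(train_part.indices) + list(valid_part.indices))
print(all_indices)
```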

trainloader = DataLoader(train, batch_size=32)
validloader = DataLoader(valid, batch_size=32)

We have now created DataLoaders over these datasets with a batch size of 32. Now that we have the data, let’s create our neural network.
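
The following sketch shows what a DataLoader yields when iterated. It uses zero-filled tensors shaped like MNIST images instead of the real dataset (an assumption made to keep the example small), and checks the number of batches and their shapes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data shaped like MNIST: 100 single-channel 28x28 images.
images = torch.zeros(100, 1, 28, 28)
labels = torch.zeros(100, dtype=torch.long)
loader = DataLoader(TensorDataset(images, labels), batch_size=32)

# 100 samples in batches of 32 give 4 batches; the last one holds only 4.
batch_shapes = [tuple(data.shape) for data, _ in loader]
print(len(loader))
print(batch_shapes[0], batch_shapes[-1])
```

Note that the last batch is smaller whenever the dataset size is not a multiple of the batch size, so the model should not assume a fixed batch dimension.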

Building our Model

There are 2 ways to create neural networks in PyTorch: using the Sequential() container or defining a class. We’ll define a class, since it gives more control over how data flows through the network. The format for creating a neural network as a class is as follows:-

from torch import nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Define the layers here

    def forward(self, x):
        # Define the forward pass here
        return x

So in the __init__() method we define our layers and other variables and in the forward() method we define our forward pass i.e. how data flows through the layers.

import torch
from torch import nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super(Network,self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(x.shape[0], -1)    # Flatten each image, keeping the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = Network()

if torch.cuda.is_available():
    model = model.cuda()

In the above code, we defined a neural network with the following architecture:-

Input Layer: 784 nodes. MNIST images are 28*28 pixels, so each flattened image becomes a vector of 784 values, one per input node.
Hidden Layer 1: 256 nodes
Hidden Layer 2: 128 nodes
Output Layer: 10 nodes, for 10 classes i.e. numbers 0-9

nn.Linear() or Linear Layer is used to apply a linear transformation to the incoming data. If you are familiar with TensorFlow it’s pretty much like the Dense Layer.
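
As a small illustration of that linear transformation, the sketch below builds one layer with the same sizes as fc1 and checks its output against the formula y = x W^T + b computed by hand. The random input is an assumption for demonstration.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(784, 256)

# weight is stored as (out_features, in_features); bias has one entry per output.
print(tuple(layer.weight.shape), tuple(layer.bias.shape))

# A batch of 32 flattened images maps to 32 vectors of length 256.
x = torch.randn(32, 784)
y = layer(x)
print(tuple(y.shape))

# The layer computes y = x @ W^T + b.
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(y, manual, atol=1e-4))
```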

In the forward() method we start off by flattening the image, then pass it through each layer, applying the ReLU activation after each hidden layer. The output layer returns raw scores with no activation. After that, we create our neural network instance, and lastly, we check whether the machine has a GPU and, if it does, transfer our model there for faster computation.
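
For comparison, the same architecture can also be written with the Sequential() container mentioned earlier. The sketch below is one possible equivalent, not the version used in the rest of this article. The name sequential_model and the random dummy batch are assumptions for illustration.

```python
import torch
from torch import nn

# Same layer sizes as the Network class, expressed with nn.Sequential.
sequential_model = nn.Sequential(
    nn.Flatten(),            # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 10),      # raw scores (logits) for the 10 digits
)

# Pick the GPU when one is present, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
sequential_model = sequential_model.to(device)

dummy_batch = torch.randn(32, 1, 28, 28, device=device)
logits = sequential_model(dummy_batch)
print(tuple(logits.shape))
```

The class-based version is more flexible when the forward pass needs branching or multiple inputs, while Sequential is shorter for a plain stack of layers like this one.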

Defining Criterion and Optimizer

Optimizers define how the weights of the neural network are updated. In this tutorial we’ll use the SGD (Stochastic Gradient Descent) optimizer. Optimizers take the model parameters and a learning rate as input arguments. There are various other optimizers you can try, such as Adam and Adagrad.
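
To make the update rule concrete, here is a minimal sketch of a single plain SGD step on one weight, compared with the same update computed by hand as w - lr * grad. The toy loss (w - 3)^2 and the learning rate of 0.1 are assumptions chosen to keep the arithmetic easy to follow.

```python
import torch

# A single weight with a hand-picked loss: loss = (w - 3)^2.
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = ((w - 3.0) ** 2).sum()
loss.backward()                # d(loss)/dw = 2 * (w - 3) = -4

# Plain SGD update: w_new = w - lr * grad = 1 - 0.1 * (-4) = 1.4
expected = (w - 0.1 * w.grad).detach().clone()
optimizer.step()

print(w.item(), expected.item())
```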

The criterion is the loss that we want to minimize, which in this case is CrossEntropyLoss(), a combination of log_softmax() and NLLLoss().

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)
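
The relationship between CrossEntropyLoss(), log_softmax() and NLLLoss() can be checked numerically. The sketch below uses random logits and made-up labels (both assumptions) and compares the one-step and two-step computations.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores for 4 samples, 10 classes
labels = torch.tensor([3, 0, 9, 1])   # hypothetical class indices

combined = nn.CrossEntropyLoss()(logits, labels)
two_step = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(combined, two_step))
```

Because the softmax is already applied inside the loss, the network’s last layer should output raw scores rather than probabilities.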

Training Neural Network with Validation

The training step in PyTorch follows almost the same pattern every time. But before implementing it, let’s learn about the 2 modes of the model object:-

Training Mode: Set by model.train(), it tells the model that it is being trained, so layers such as dropout and batch normalization, which behave differently during training and testing, can act accordingly.
Evaluation Mode: Set by model.eval(), it tells the model that it is being tested.
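
The effect of the two modes is easy to see on a dropout layer in isolation. In this sketch (the layer and input are assumptions for illustration), training mode zeroes roughly half the entries and rescales the rest, while evaluation mode passes the input through unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()
train_out = dropout(x)   # entries are either zeroed or scaled by 1 / (1 - p) = 2.0

dropout.eval()
eval_out = dropout(x)    # identity: nothing is dropped

print(sorted(train_out.unique().tolist()))
print(torch.equal(eval_out, x))
```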

Even though you don’t need these modes here, since our network has no such layers, it’s still good to know about them. Now that we have that clear, let’s understand the training steps:-

Move data to GPU (Optional)
Clear the gradients using optimizer.zero_grad()
Make a forward pass
Calculate the loss
Perform a backward pass using loss.backward() to calculate the gradients
Take optimizer step using optimizer.step() to update the weights

The validation and testing steps are similar, but there you only make a forward pass and calculate the loss. A simple training loop without validation is written like the following:-

from tqdm import tqdm     # Progress bar; install with: pip install tqdm

epochs = 5

for e in range(epochs):
    train_loss = 0.0
    for data, labels in tqdm(trainloader):
        # Transfer Data to GPU if available
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()

        # Clear the gradients
        optimizer.zero_grad()
        # Forward Pass
        output = model(data)
        # Find the Loss
        loss = criterion(output, labels)
        # Calculate gradients
        loss.backward()
        # Update Weights
        optimizer.step()
        # Accumulate the batch loss
        train_loss += loss.item()

    print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(trainloader)}')

If you add the validation loop, it is the same but with only the forward pass and loss calculation. However, the last epoch may not be the one that gives the lowest validation loss. To tackle this, we initialize a minimum validation loss to np.inf and, whenever the current validation loss is lower, save the state dictionary of the model so that we can load it later, like a checkpoint. state_dict is an OrderedDict object that maps each parameter name to its tensor.
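
To get a feel for what a state dictionary contains, the sketch below prints the entries of a small made-up model (tiny_model is a hypothetical example, not the network from this article). Each key is a parameter name and each value is the corresponding tensor.

```python
from torch import nn

# A hypothetical two-layer model, used only to show the dictionary layout.
tiny_model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

state = tiny_model.state_dict()
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
```

The ReLU layer has no parameters, so it contributes no entries.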

import numpy as np
epochs = 5
min_valid_loss = np.inf

for e in range(epochs):
    train_loss = 0.0
    model.train()     # Optional when not using mode-specific layers
    for data, labels in trainloader:
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()

        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    valid_loss = 0.0
    model.eval()     # Optional when not using mode-specific layers
    with torch.no_grad():     # Gradients are not needed for validation
        for data, labels in validloader:
            if torch.cuda.is_available():
                data, labels = data.cuda(), labels.cuda()

            output = model(data)
            loss = criterion(output, labels)
            valid_loss += loss.item()     # Accumulate over all batches

    print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(trainloader)} \t\t Validation Loss: {valid_loss / len(validloader)}')
    if min_valid_loss > valid_loss:
        print(f'Validation Loss Decreased({min_valid_loss:.6f}--->{valid_loss:.6f}) \t Saving The Model')
        min_valid_loss = valid_loss
        # Saving State Dict
        torch.save(model.state_dict(), 'saved_model.pth')

Running the above code prints the training and validation loss after every epoch, along with a message each time the model is saved. The exact loss values will vary from run to run.
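
The loop above tracks validation loss. If accuracy is also of interest, it can be computed from the model’s raw scores by taking the highest-scoring class for each sample. The helper batch_accuracy and the example values below are assumptions for illustration. In the validation loop it would be called on each batch and the results averaged.

```python
import torch

def batch_accuracy(logits, labels):
    """Return the fraction of samples whose highest-scoring class matches the label."""
    predictions = logits.argmax(dim=1)
    return (predictions == labels).float().mean().item()

# Hypothetical logits for 4 samples and 3 classes.
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.8, 0.1],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 1, 1, 2])

# Predictions are 0, 1, 0, 2, so 3 of the 4 samples are correct.
print(batch_accuracy(logits, labels))
```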

Code

import torch
from torch import nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
import numpy as np
 
# Declare transform to convert raw data to tensor
transform = transforms.Compose([
    transforms.ToTensor()
])
 
# Loading Data and splitting it into train and validation data
train = datasets.MNIST('', train=True, transform=transform, download=True)
train, valid = random_split(train, [50000, 10000])
 
# Create Dataloader of the above tensor with batch size = 32
trainloader = DataLoader(train, batch_size=32)
validloader = DataLoader(valid, batch_size=32)
 
# Building Our Model
class Network(nn.Module):
    # Declaring the Architecture
    def __init__(self):
        super(Network,self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)
 
    # Forward Pass
    def forward(self, x):
        x = x.view(x.shape[0],-1)    # Flatten the images
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
 
model = Network()
if torch.cuda.is_available():
    model = model.cuda()
 
# Declaring Criterion and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)
 
# Training with Validation
epochs = 5
min_valid_loss = np.inf
 
for e in range(epochs):
    train_loss = 0.0
    model.train()     # Switch back to training mode after validation
    for data, labels in trainloader:
        # Transfer Data to GPU if available
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()

        # Clear the gradients
        optimizer.zero_grad()
        # Forward Pass
        output = model(data)
        # Find the Loss
        loss = criterion(output, labels)
        # Calculate gradients
        loss.backward()
        # Update Weights
        optimizer.step()
        # Accumulate the batch loss
        train_loss += loss.item()

    valid_loss = 0.0
    model.eval()     # Optional when not using mode-specific layers
    with torch.no_grad():     # Gradients are not needed for validation
        for data, labels in validloader:
            # Transfer Data to GPU if available
            if torch.cuda.is_available():
                data, labels = data.cuda(), labels.cuda()

            # Forward Pass
            output = model(data)
            # Find the Loss
            loss = criterion(output, labels)
            # Accumulate the batch loss
            valid_loss += loss.item()
 
    print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(trainloader)} '
          f'\t\t Validation Loss: {valid_loss / len(validloader)}')

    if min_valid_loss > valid_loss:
        print(f'Validation Loss Decreased({min_valid_loss:.6f}'
              f'--->{valid_loss:.6f}) \t Saving The Model')
        min_valid_loss = valid_loss

        # Saving State Dict
        torch.save(model.state_dict(), 'saved_model.pth')
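
Once training is done, the saved weights can be loaded back into a freshly built model of the same architecture. The sketch below shows the save and load round trip on a small stand-in model written to a temporary directory. The build_model helper and the temporary path are assumptions for illustration. With the code above, you would instead create a Network() instance and load 'saved_model.pth'.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical stand-in for the trained Network, kept small for illustration.
def build_model():
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

trained = build_model()

# Save to a temporary directory here; the tutorial itself writes 'saved_model.pth'.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "saved_model.pth")
    torch.save(trained.state_dict(), path)

    # Rebuild the same architecture, then copy the saved weights into it.
    restored = build_model()
    restored.load_state_dict(torch.load(path, map_location="cpu"))

restored.eval()  # Switch to evaluation mode before running inference

with torch.no_grad():
    sample = torch.randn(1, 1, 28, 28)
    same_output = torch.equal(trained(sample), restored(sample))
print(same_output)
```

Saving only the state dictionary keeps the checkpoint independent of the class definition’s location, but it does mean the architecture must be recreated in code before loading.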