
Adjusting Learning Rate of a Neural Network in PyTorch

Last Updated : 22 Jan, 2021

Learning rate is an important hyperparameter in gradient descent. Its value determines how fast the neural network converges to a minimum. Usually, we choose a learning rate and, depending on the results, change its value to find the optimal LR. If the learning rate is too low, convergence is very slow; if it is too high, each update makes faster progress but may overshoot the minimum, so the loss can oscillate or even diverge. So we usually tune the learning rate to find the best value. But is there a way we can improve this process?
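To make the effect of the learning rate concrete, here is a minimal sketch in plain Python. It assumes a toy objective f(x) = x**2 (not part of the tutorial) and the hypothetical helper name gradient_descent; it only illustrates slow convergence versus overshooting.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimise the toy function f(x) = x**2 and return every iterate."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        grad = 2.0 * x        # derivative of x**2
        x = x - lr * grad     # plain gradient descent update
        xs.append(x)
    return xs

too_low = gradient_descent(lr=0.001)    # barely moves away from x0
reasonable = gradient_descent(lr=0.1)   # shrinks steadily towards 0
too_high = gradient_descent(lr=1.1)     # every step overshoots and |x| grows

print(abs(too_low[-1]), abs(reasonable[-1]), abs(too_high[-1]))
```

With the too-high value, each update jumps past the minimum at 0 and lands further away than it started, which is the overshooting described above.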

Why adjust Learning Rate?

Instead of using a constant learning rate, we can start with a higher value of LR and then decrease it periodically after a certain number of iterations. This way we get faster convergence initially while reducing the chance of overshooting the minimum. To implement this we can use the schedulers in the torch.optim.lr_scheduler module of PyTorch. The format of a training loop is as follows:

epochs = 10
scheduler = <Any scheduler>

for epoch in range(epochs):
    # Training Steps
     
    # Validation Steps
    
    scheduler.step()
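The skeleton above can be filled in as follows. This is only a sketch: the random data, the single linear layer and the StepLR settings (step_size=3, gamma=0.5) are assumptions chosen to keep the example small, not values from the tutorial.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

torch.manual_seed(0)

# Hypothetical toy data and model, only here to make the loop runnable
inputs = torch.randn(64, 4)
targets = torch.randn(64, 1)
model = nn.Linear(4, 1)
criterion = nn.MSELoss()

optimizer = SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=3, gamma=0.5)

epochs = 10
lr_history = []
for epoch in range(epochs):
    # Training step (one full-batch update for brevity)
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()

    # Record the learning rate that was used during this epoch
    lr_history.append(optimizer.param_groups[0]['lr'])

    # Update the learning rate once per epoch, after optimizer.step()
    scheduler.step()

print(lr_history)
```

Calling scheduler.step() after optimizer.step() keeps the order PyTorch expects; the learning rate is halved after every third epoch in this run.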

Commonly used Schedulers in torch.optim.lr_scheduler

PyTorch provides several methods to adjust the learning rate based on the number of epochs. Let’s have a look at a few of them:

  • StepLR: Multiplies the learning rate by gamma every step_size epochs. For example, if lr = 0.1, gamma = 0.1 and step_size = 10, then after 10 epochs lr changes to lr*gamma, in this case 0.01, and after another 10 epochs it becomes 0.001.
# Code format:-
from torch.optim.lr_scheduler import StepLR

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

# Procedure:-
lr = 0.1, gamma = 0.1 and step_size = 10
lr = 0.1               for epoch < 10
lr = 0.01              for epoch >= 10 and epoch < 20
lr = 0.001             for epoch >= 20 and epoch < 30
... and so on
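The StepLR procedure above can be written as a one-line formula. The sketch below uses plain Python and a hypothetical helper name, step_lr; it reproduces the table rather than calling PyTorch.

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Learning rate at a given 0-indexed epoch under a StepLR-style schedule."""
    # The rate is multiplied by gamma once for every completed block of step_size epochs
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 9, 10, 19, 20, 29):
    print(epoch, step_lr(0.1, 0.1, 10, epoch))
```

Because of floating-point rounding, the printed values may differ from 0.01 and 0.001 in the last digits.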
  • MultiStepLR: This is a more customizable version of StepLR in which the lr is multiplied by gamma each time the epoch count reaches one of the given milestones. Here we provide milestones, which are the epochs at which we want to update our learning rate.
# Code format:-
from torch.optim.lr_scheduler import MultiStepLR

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = MultiStepLR(optimizer, milestones=[10,30], gamma=0.1)

# Procedure:-
lr = 0.1, gamma = 0.1 and milestones=[10,30]
lr = 0.1               for epoch < 10
lr = 0.01              for epoch >= 10 and epoch < 30
lr = 0.001             for epoch >= 30
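The MultiStepLR procedure amounts to counting how many milestones have been reached. A plain-Python sketch, assuming sorted milestones and the hypothetical helper name multi_step_lr:

```python
from bisect import bisect_right

def multi_step_lr(base_lr, gamma, milestones, epoch):
    """Learning rate at a 0-indexed epoch under a MultiStepLR-style schedule."""
    # bisect_right counts the milestones that are <= epoch (milestones must be sorted)
    reached = bisect_right(milestones, epoch)
    return base_lr * gamma ** reached

for epoch in (0, 9, 10, 29, 30, 50):
    print(epoch, multi_step_lr(0.1, 0.1, [10, 30], epoch))
```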
  • ExponentialLR: This is an aggressive version of StepLR in which the LR is multiplied by gamma after every epoch. You can think of it as StepLR with step_size = 1.
# Code format:-
from torch.optim.lr_scheduler import ExponentialLR

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = ExponentialLR(optimizer, gamma=0.1)

# Procedure:-
lr = 0.1, gamma = 0.1
lr = 0.1               for epoch = 0
lr = 0.01              for epoch = 1
lr = 0.001             for epoch = 2
... and so on
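The claim that ExponentialLR behaves like StepLR with step_size = 1 can be checked directly. The sketch below assumes a single dummy parameter and a hypothetical helper, lr_trace, which records the learning rate at the start of each epoch.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, StepLR

def lr_trace(make_scheduler, epochs=4):
    """Return the learning rate seen at the start of each epoch."""
    # A single dummy parameter is enough to construct an optimizer
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = SGD([param], lr=0.1)
    scheduler = make_scheduler(optimizer)
    trace = []
    for _ in range(epochs):
        trace.append(optimizer.param_groups[0]['lr'])
        optimizer.step()      # step the optimizer before the scheduler
        scheduler.step()
    return trace

exp_trace = lr_trace(lambda opt: ExponentialLR(opt, gamma=0.1))
step_trace = lr_trace(lambda opt: StepLR(opt, step_size=1, gamma=0.1))

print(exp_trace)
print(step_trace)
```

Both traces start at 0.1 and shrink by a factor of 10 every epoch.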
  • ReduceLROnPlateau: Reduces the learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler monitors a metric (for example the validation loss), and if no improvement is seen for a patience number of epochs, the learning rate is reduced.
# Code format:-
from torch.optim.lr_scheduler import ReduceLROnPlateau

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, 'min', patience = 5)

# In min mode, lr will be reduced when the metric has stopped decreasing. 
# In max mode, lr will be reduced when the metric has stopped increasing. 
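To see ReduceLROnPlateau react to a stalled metric without training anything, we can feed it made-up loss values. Everything in this sketch (the dummy parameter, the fake losses, patience=2 and factor=0.1) is an assumption chosen for illustration.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau

# A single dummy parameter is enough to construct an optimizer
param = torch.nn.Parameter(torch.zeros(1))
optimizer = SGD([param], lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=2)

# Made-up validation losses: improving for three epochs, then stuck at 0.7
fake_losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7]

lr_after_epoch = []
for loss_value in fake_losses:
    optimizer.step()
    scheduler.step(loss_value)   # unlike other schedulers, step() takes the metric
    lr_after_epoch.append(optimizer.param_groups[0]['lr'])

print(lr_after_epoch)
```

In this run the learning rate stays at 0.1 while the number of epochs without improvement is at most patience, and is multiplied by factor once that count exceeds patience.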

Training Neural Networks using Schedulers

For this tutorial we are going to use the MNIST dataset, so we’ll start by loading our data and then define the model. It’s recommended that you already know how to create and train a neural network in PyTorch. Let’s start by loading our data.

from torchvision import datasets,transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.ToTensor()
])

train = datasets.MNIST('',train = True, download = True, transform=transform)
valid = datasets.MNIST('',train = False, download = True, transform=transform)

trainloader = DataLoader(train, batch_size= 32, shuffle=True)
validloader = DataLoader(valid, batch_size= 32, shuffle=True)

Now that we have our dataloaders ready, we can proceed to create our model. A PyTorch model follows this format:

from torch import nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Define Model Here

    def forward(self, x):
        # Define Forward Pass Here
        return x  # placeholder so the skeleton is valid Python

With that clear let’s define our model:-

import torch
from torch import nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net,self).__init__()
        self.fc1 = nn.Linear(28*28,256)
        self.fc2 = nn.Linear(256,128)
        self.out = nn.Linear(128,10)
        self.lr = 0.01
        self.loss = nn.CrossEntropyLoss()
    
    def forward(self,x):
        batch_size, _, _, _ = x.size()
        x = x.view(batch_size,-1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)

model = Net()

# Send the model to GPU if available
if torch.cuda.is_available():
    model = model.cuda()
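Before training, it can help to confirm that the network accepts MNIST-shaped input and returns one score per class. The sketch below repeats the layer sizes from the model above and passes a made-up batch of blank images through it; the dummy batch is an assumption used only for this check.

```python
import torch
from torch import nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.out = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)   # flatten (N, 1, 28, 28) to (N, 784)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)

# Made-up batch of 8 blank "images" with the MNIST shape
dummy_batch = torch.zeros(8, 1, 28, 28)
with torch.no_grad():
    logits = Net()(dummy_batch)

print(logits.shape)
```

The output has one row per image and ten columns, one per digit class, which is the shape CrossEntropyLoss expects together with integer labels.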

Now that we have our model, we can specify our optimizer, loss function and lr_scheduler. We’ll use the SGD optimizer, CrossEntropyLoss as the loss function and ReduceLROnPlateau as the LR scheduler.

from torch.optim import SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau

optimizer = SGD(model.parameters(), lr = 0.1)
loss = nn.CrossEntropyLoss()
scheduler = ReduceLROnPlateau(optimizer, 'min', patience = 5)

Let’s define the training loop. The training loop is pretty much the same as before, except this time we’ll call our scheduler’s step method at the end of each epoch and pass it the validation loss.

from tqdm.notebook import trange

epoch = 25
for e in trange(epoch):
    train_loss, valid_loss = 0.0, 0.0
    
    # Set model to training mode
    model.train()
    for data, label in trainloader:
        if torch.cuda.is_available():
            data, label = data.cuda(), label.cuda()

        optimizer.zero_grad()
        target = model(data)
        train_step_loss = loss(target, label)
        train_step_loss.backward()
        optimizer.step()

        train_loss += train_step_loss.item() * data.size(0)

    # Set model to Evaluation mode
    model.eval()
    # Gradients are not needed for validation
    with torch.no_grad():
        for data, label in validloader:
            if torch.cuda.is_available():
                data, label = data.cuda(), label.cuda()

            target = model(data)
            valid_step_loss = loss(target, label)

            valid_loss += valid_step_loss.item() * data.size(0)
    
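    curr_lr = optimizer.param_groups[0]['lr']

    # Losses were accumulated per sample, so average over the dataset size
    avg_train_loss = train_loss / len(trainloader.dataset)
    avg_valid_loss = valid_loss / len(validloader.dataset)

    print(f'Epoch {e}\t'
          f'Training Loss: {avg_train_loss}\t'
          f'Validation Loss: {avg_valid_loss}\t'
          f'LR: {curr_lr}')
    scheduler.step(avg_valid_loss)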

In the printed output, the LR value drops whenever the validation loss has stopped decreasing for more than patience epochs.

Code:

import torch
from torch import nn
import torch.nn.functional as F
from torchvision import datasets,transforms
from torch.utils.data import DataLoader
from torch.optim import SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau
from tqdm.notebook import trange
  
# LOADING DATA
transform = transforms.Compose([
    transforms.ToTensor()
])
  
train = datasets.MNIST('',train = True, download = True, transform=transform)
valid = datasets.MNIST('',train = False, download = True, transform=transform)
  
trainloader = DataLoader(train, batch_size= 32, shuffle=True)
validloader = DataLoader(valid, batch_size= 32, shuffle=True)
  
# CREATING OUR MODEL
class Net(nn.Module):
    def __init__(self):
        super(Net,self).__init__()
        self.fc1 = nn.Linear(28*28,64)
        self.fc2 = nn.Linear(64,32)
        self.out = nn.Linear(32,10)
        self.lr = 0.01
        self.loss = nn.CrossEntropyLoss()
      
    def forward(self,x):
        batch_size, _, _, _ = x.size()
        x = x.view(batch_size,-1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)
  
model = Net()
  
# Send the model to GPU if available
if torch.cuda.is_available():
    model = model.cuda()
  
# SETTING OPTIMIZER, LOSS AND SCHEDULER
optimizer = SGD(model.parameters(), lr = 0.1)
loss = nn.CrossEntropyLoss()
scheduler = ReduceLROnPlateau(optimizer, 'min', patience = 5)
  
# TRAINING THE NEURAL NETWORK
epoch = 25
for e in trange(epoch):
    train_loss, valid_loss = 0.0, 0.0
      
    # Set model to training mode
    model.train()
    for data, label in trainloader:
        if torch.cuda.is_available():
            data, label = data.cuda(), label.cuda()
  
        optimizer.zero_grad()
        target = model(data)
        train_step_loss = loss(target, label)
        train_step_loss.backward()
        optimizer.step()
  
        train_loss += train_step_loss.item() * data.size(0)
  
    # Set model to Evaluation mode
    model.eval()
    # Gradients are not needed for validation
    with torch.no_grad():
        for data, label in validloader:
            if torch.cuda.is_available():
                data, label = data.cuda(), label.cuda()

            target = model(data)
            valid_step_loss = loss(target, label)

            valid_loss += valid_step_loss.item() * data.size(0)
      
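    curr_lr = optimizer.param_groups[0]['lr']

    # Losses were accumulated per sample, so average over the dataset size
    avg_train_loss = train_loss / len(trainloader.dataset)
    avg_valid_loss = valid_loss / len(validloader.dataset)

    print(f'Epoch {e}\t'
          f'Training Loss: {avg_train_loss}\t'
          f'Validation Loss: {avg_valid_loss}\t'
          f'LR: {curr_lr}')
    scheduler.step(avg_valid_loss)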



