
Understanding PyTorch Learning Rate Scheduling

PyTorch has become a preferred framework for researchers and practitioners building neural networks, thanks to its dynamic computational graph and user-friendly interface. Among the many decisions made during model training, one hyperparameter that demands particular attention is the learning rate. To adjust it effectively as optimization progresses, PyTorch provides a powerful tool: the learning rate scheduler. This article demystifies the PyTorch learning rate scheduler, explaining its syntax and parameters and showing the role it plays in making model training more efficient and effective.

PyTorch Learning Rate Scheduler

PyTorch, an open-source machine learning library, has gained immense popularity for its dynamic computation graph and ease of use. Developed by Facebook’s AI Research lab (FAIR), PyTorch has become a go-to framework for building and training deep learning models. Its flexibility and dynamic nature make it particularly well-suited for research and experimentation, allowing practitioners to iterate swiftly and explore innovative approaches in the ever-evolving field of artificial intelligence.



What is a Learning Rate Scheduler?

At the heart of effective model training lies the learning rate—a hyperparameter crucial for controlling the step size during optimization. PyTorch provides a sophisticated mechanism, known as the learning rate scheduler, to dynamically adjust this hyperparameter as the training progresses. The syntax for incorporating a learning rate scheduler into your PyTorch training pipeline is both intuitive and flexible. At its core, the scheduler is integrated into the optimizer, working hand in hand to regulate the learning rate based on predefined policies. The typical syntax for implementing a learning rate scheduler involves instantiating an optimizer and a scheduler, then stepping through epochs or batches, updating the learning rate accordingly. The versatility of the scheduler is reflected in its ability to accommodate various parameters, allowing practitioners to tailor its behavior to meet specific training requirements.
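
As a minimal sketch of this pattern (illustrative only, and separate from the breast-cancer example developed later in this article), the snippet below attaches a StepLR scheduler to an SGD optimizer and steps both once per epoch. The model, data and hyperparameter values here are placeholders chosen purely for demonstration.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler

# Placeholder model and data, just to make the pattern concrete
demo_model = nn.Linear(10, 1)
demo_inputs, demo_targets = torch.randn(64, 10), torch.randn(64, 1)
demo_criterion = nn.MSELoss()

# 1. Create the optimizer, 2. wrap it with a scheduler
demo_optimizer = optim.SGD(demo_model.parameters(), lr=0.1)
demo_scheduler = lr_scheduler.StepLR(demo_optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    demo_optimizer.zero_grad()
    loss = demo_criterion(demo_model(demo_inputs), demo_targets)
    loss.backward()
    demo_optimizer.step()    # update the weights first...
    demo_scheduler.step()    # ...then let the scheduler adjust the learning rate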

Parameters and their Significance

Every scheduler is built around the optimizer whose learning rate it controls, and each scheduler type adds its own parameters. For StepLR, the scheduler used later in this article, the key parameters are optimizer (the wrapped optimizer), step_size (the number of epochs between successive decays) and gamma (the multiplicative factor applied to the learning rate at each decay, 0.1 by default). An optional last_epoch parameter (default -1) supports resuming training from a saved state. Other schedulers, such as ExponentialLR or ReduceLROnPlateau, expose analogous parameters that determine how quickly, and under what conditions, the learning rate changes.

Need for Learning Rate Scheduler

The importance of learning rate schedulers becomes evident when considering the dynamic nature of model training. As models traverse complex loss landscapes, a fixed learning rate may hinder convergence or cause overshooting. Learning rate schedulers address this challenge by adapting the learning rate based on the model’s performance during training. This adaptability is crucial for avoiding divergence, accelerating convergence, and facilitating the discovery of optimal model parameters.
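
The scheduler in PyTorch that follows this performance-based idea most directly is ReduceLROnPlateau, which lowers the learning rate when a monitored metric stops improving. The sketch below is illustrative only: the toy model and the hard-coded validation losses are stand-ins, not part of the example developed later.

import torch.nn as nn
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler

toy_model = nn.Linear(10, 1)
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.1)

# Halve the learning rate once the monitored loss fails to improve for more than 2 epochs
plateau_scheduler = lr_scheduler.ReduceLROnPlateau(
    toy_optimizer, mode="min", factor=0.5, patience=2)

pretend_val_losses = [1.0, 0.9, 0.85, 0.85, 0.85, 0.85, 0.84, 0.84, 0.84, 0.84]
for epoch, val_loss in enumerate(pretend_val_losses):
    toy_optimizer.step()                # a real loop would do a forward/backward pass before this
    plateau_scheduler.step(val_loss)    # the scheduler reacts to the monitored metric
    print(epoch + 1, toy_optimizer.param_groups[0]["lr"])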



Demonstrating PyTorch Learning Rate Scheduling

Colab link: Learning rate scheduler

Importing libraries

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, TensorDataset

Loading dataset

You can download the dataset from here.

df = pd.read_csv("breast-cancer.csv")
df.head()

Output:

         id diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  \
0    842302         M        17.99         10.38          122.80     1001.0
1    842517         M        20.57         17.77          132.90     1326.0
2  84300903         M        19.69         21.25          130.00     1203.0
3  84348301         M        11.42         20.38           77.58      386.1
4  84358402         M        20.29         14.34          135.10     1297.0

   smoothness_mean  compactness_mean  concavity_mean  concave points_mean  \
0          0.11840           0.27760          0.3001              0.14710
1          0.08474           0.07864          0.0869              0.07017
2          0.10960           0.15990          0.1974              0.12790
3          0.14250           0.28390          0.2414              0.10520
4          0.10030           0.13280          0.1980              0.10430

   ...  radius_worst  texture_worst  perimeter_worst  area_worst  \
0  ...         25.38          17.33           184.60      2019.0
1  ...         24.99          23.41           158.80      1956.0
2  ...         23.57          25.53           152.50      1709.0
3  ...         14.91          26.50            98.87       567.7
4  ...         22.54          16.67           152.20      1575.0

   smoothness_worst  compactness_worst  concavity_worst  concave points_worst  \
0            0.1622             0.6656           0.7119                0.2654
1            0.1238             0.1866           0.2416                0.1860
2            0.1444             0.4245           0.4504                0.2430
3            0.2098             0.8663           0.6869                0.2575
4            0.1374             0.2050           0.4000                0.1625

   symmetry_worst  fractal_dimension_worst
0          0.4601                  0.11890
1          0.2750                  0.08902
2          0.3613                  0.08758
3          0.6638                  0.17300
4          0.2364                  0.07678

[5 rows x 32 columns]

Data extraction and encoding

# Features: drop the target column and the identifier
X = df.drop(["diagnosis", "id"], axis=1)

# Target: encode malignant (M) as 1 and benign (B) as 0
y = df["diagnosis"]
y = y.map({"M": 1, "B": 0})

Train-test split and standardisation

# Hold out 20% of the data for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2, random_state=2)

# Standardise the features: fit on the training set only, then apply to both splits
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)

TensorDataset and DataLoader

# Convert the NumPy arrays to float tensors; reshape targets to (N, 1) to match the model output
X_train_std_tensor = torch.FloatTensor(X_train_std)
Y_train_tensor = torch.FloatTensor(Y_train.values).view(-1, 1)

X_test_std_tensor = torch.FloatTensor(X_test_std)
Y_test_tensor = torch.FloatTensor(Y_test.values).view(-1, 1)

# Wrap the training tensors in a dataset and serve shuffled mini-batches of 32
train_dataset = TensorDataset(X_train_std_tensor, Y_train_tensor)
train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)

Model creation

model = nn.Sequential(
    nn.Linear(30, 64),  # Input layer with 30 features, hidden layer with 64 units
    nn.ReLU(),
    nn.Linear(64, 32),  # Hidden layer with 32 units
    nn.ReLU(),
    nn.Linear(32, 1),   # Output layer with 1 unit (for binary classification)
    nn.Sigmoid()
)

Loss function and optimizer

criterion = nn.BCELoss()                               # binary cross-entropy on sigmoid outputs
optimizer = optim.Adam(model.parameters(), lr=0.001)   # initial learning rate of 0.001

Learning Rate Scheduler

# Halve the learning rate every 20 epochs
scheduler = lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

num_epochs = 50
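
With step_size=20 and gamma=0.5, the learning rate in effect during epoch e (counting from 1) is 0.001 * 0.5 ** ((e - 1) // 20): 0.001 for epochs 1-20, 0.0005 for epochs 21-40 and 0.00025 for epochs 41-50. If you want to verify this without running the full training, a throwaway check such as the one below can be used; the check_* objects are separate from the optimizer and scheduler above, so the real training state is left untouched.

# Throwaway objects: inspect the schedule without touching the real optimizer/scheduler
check_optimizer = optim.Adam(model.parameters(), lr=0.001)
check_scheduler = lr_scheduler.StepLR(check_optimizer, step_size=20, gamma=0.5)

for epoch in range(num_epochs):
    if (epoch + 1) in (1, 20, 21, 40, 41, 50):
        # learning rate in effect during this epoch
        print(f"epoch {epoch + 1}: lr = {check_optimizer.param_groups[0]['lr']}")
    check_optimizer.step()      # optimizer.step() before scheduler.step(), as PyTorch expects
    check_scheduler.step()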

Training Loop

# Training loop
for epoch in range(num_epochs):
    model.train()

    for inputs, targets in train_loader:
        outputs = model(inputs)                # forward pass
        loss = criterion(outputs, targets)     # targets already have shape (batch_size, 1)

        optimizer.zero_grad()                  # clear gradients from the previous batch
        loss.backward()                        # backpropagate
        optimizer.step()                       # update the weights

    # Adjust the learning rate once per epoch
    scheduler.step()

    # Print the loss of the last batch for monitoring
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}')

Output:

Epoch [1/50], Loss: 0.5196633338928223
Epoch [2/50], Loss: 0.29342177510261536
Epoch [3/50], Loss: 0.19762122631072998
Epoch [4/50], Loss: 0.19884507358074188
Epoch [5/50], Loss: 0.028389474377036095
Epoch [6/50], Loss: 0.007852290757000446
Epoch [7/50], Loss: 0.040723469108343124
Epoch [8/50], Loss: 0.04233770817518234
Epoch [9/50], Loss: 0.2953278720378876
Epoch [10/50], Loss: 0.020912442356348038

Evaluation metrics

model.eval()
with torch.no_grad():
    test_outputs = model(X_test_std_tensor)
    test_predictions = (test_outputs >= 0.5).float()  # Convert probabilities to binary predictions
 
    # Evaluation metrics (you can use appropriate metrics based on your problem)
    accuracy = (test_predictions == Y_test_tensor).float().mean().item()
    print(f'Test Accuracy: {accuracy}')

Output:

Test Accuracy: 0.9561403393745422

The provided test accuracy of approximately 95.6% suggests that the trained neural network model performs well on the test set.
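
Accuracy alone can hide class-specific errors on a dataset like this one, where benign cases outnumber malignant ones. As an optional extension (not part of the original walkthrough), the same predictions can be summarised with scikit-learn's classification_report, reusing the tensors computed in the evaluation step above.

from sklearn.metrics import classification_report, confusion_matrix

# Flatten the (N, 1) tensors into 1-D NumPy arrays of 0/1 labels
y_true = Y_test_tensor.numpy().ravel()
y_pred = test_predictions.numpy().ravel()

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["benign", "malignant"]))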

Applications of PyTorch learning rate schedulers

The applications of PyTorch learning rate schedulers are multifaceted. They play a pivotal role in fine-tuning models for specific tasks, improving convergence speed, and aiding in the exploration of diverse hyperparameter spaces. Learning rate schedulers find particular relevance in scenarios where the loss landscape is non-uniform, and traditional fixed learning rates prove suboptimal. Applications range from image classification and object detection to natural language processing, where the ability to dynamically adjust the learning rate can be a game-changer in achieving superior model performance.

