How to implement neural networks in PyTorch?

Last Updated : 08 Mar, 2024

Neural networks can be created and trained in Python with the help of the well-known open-source PyTorch framework. This tutorial shows how to build a basic neural network in PyTorch and use it to classify handwritten digits from the MNIST dataset.

Modern artificial intelligence relies on neural networks, which give machines the ability to learn from data and make decisions in a way loosely analogous to human judgment. As computational models that learn from examples, neural networks can handle tasks such as regression, classification and generation. In the rest of this tutorial, we build a simple feedforward network in PyTorch and use it to classify handwritten digits from the MNIST dataset.

How to Create a Neural Network in PyTorch?

PyTorch offers two primary ways to build neural networks: subclassing the nn.Module class, or using the nn.Sequential container. By subclassing nn.Module and implementing the __init__ and forward methods, you can construct your own custom network: __init__ defines the network's layers and parameters, while forward specifies how input is passed through those layers and returned as output. Alternatively, nn.Sequential lets you build a network by supplying a list of layers; the layers are applied in the given order and connected automatically. Together with the modules and utilities described below, these make implementing a neural network in Python straightforward.
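
For illustration, here is a minimal sketch (with arbitrary layer sizes, unrelated to the MNIST example below) of the same two-layer network written both ways:

Python

import torch
import torch.nn as nn
import torch.nn.functional as F

# Option 1: subclass nn.Module and implement __init__ and forward
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)  # 4 input features, 8 hidden units (arbitrary sizes for illustration)
        self.fc2 = nn.Linear(8, 2)  # 8 hidden units, 2 output classes

    def forward(self, x):
        x = F.relu(self.fc1(x))     # hidden layer followed by a ReLU non-linearity
        return self.fc2(x)          # output logits

# Option 2: the equivalent network built with the nn.Sequential container
tiny_net = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

x = torch.randn(3, 4)      # a batch of 3 samples with 4 features each
print(TinyNet()(x).shape)  # torch.Size([3, 2])
print(tiny_net(x).shape)   # torch.Size([3, 2])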

The main steps are as follows:

  • Import the required modules, including torch, torch.nn and torch.optim.
  • Define the data, i.e. the input features and the target labels. You can build your own tensors or use PyTorch's built-in datasets.
  • Define the architecture of the neural network: the number and kind of layers, the activation functions and the output size. You can use PyTorch's predefined layers, such as torch.nn.Linear, torch.nn.Conv2d or torch.nn.LSTM, or subclass torch.nn.Module to construct your own custom layers.
  • Specify the loss function (torch.nn.MSELoss, torch.nn.CrossEntropyLoss, torch.nn.BCELoss, etc.). The loss function measures how closely the network's output matches the target.
  • Specify the optimizer (torch.optim.SGD, torch.optim.Adam or torch.optim.RMSprop). Using the gradients and the learning rate, the optimizer updates the network's weights.
  • Train the network by looping over the data, running the forward and backward passes and applying the optimizer. You can monitor training progress by printing the loss or additional metrics such as accuracy or precision.
  • Test the network on fresh data, such as a validation or test set, to assess its performance. You can also save and load the network's state with torch.save and torch.load, as sketched below.
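
For example, saving and restoring a model's learned weights might look like the following minimal sketch (the file name here is arbitrary, and Net refers to the model class defined later in this tutorial):

Python

# Save only the learned parameters (the usual recommendation)
torch.save(model.state_dict(), 'model_weights.pth')

# Later: recreate the same architecture and load the saved weights back in
model = Net()
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()  # switch to evaluation mode before running inference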

Implementing Feedforward Neural Network for MNIST

For a better understanding, let's walk through creating a neural network in PyTorch. Please be aware that this is only a brief example that you can expand and adapt to suit your needs, not a comprehensive solution. In this example, handwritten digits from the MNIST dataset are classified using a simple feedforward neural network.

  • In this example we define a straightforward feedforward neural network with two fully connected layers. A layer is said to be fully connected when every input unit is linked to every output unit through a weight matrix and a bias vector.
  • The first layer receives the flattened image (28×28 pixels) as input and produces 512 features. The second layer takes those 512 features as input and produces 10 outputs, one for each class (the digits 0 through 9).
  • We use the nn.Linear class to create the fully connected layers and store them as attributes of the network object. We also apply the ReLU activation function to the first layer's output using F.relu, which gives the network some non-linearity and helps it learn complex patterns.
  • The forward method simply flattens the input image, then applies the first layer, the ReLU function and the second layer. The network's output is a tensor of ten logits, one per class.

Step 1: Import the necessary libraries

Python




# Import the necessary libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np


Step 2 : Define the hyperparameters and transformation

The provided code defines hyperparameters and a transformation to apply to images in a machine learning context. Hyperparameters, including batch_size, num_epochs, and learning_rate, are initialized to control the training process. Additionally, a transformation pipeline, transform, is defined to preprocess input images. This pipeline employs two sequential transformations: transforms.ToTensor() converts the images into PyTorch tensors, a requisite format for neural network computations, while transforms.Normalize() standardizes the pixel values by subtracting the mean (0.1307) and dividing by the standard deviation (0.3081).

Python




# Define the hyperparameters
batch_size = 64 # The number of samples per batch
num_epochs = 10 # The number of times to iterate over the whole dataset
learning_rate = 0.01 # The learning rate for the optimizer
 
# Define the transformation to apply to the images
transform = transforms.Compose([
    transforms.ToTensor(), # Convert the images to tensors
    transforms.Normalize((0.1307,), (0.3081,)) # Normalize the pixel values with mean and std
])
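
As a quick sanity check (an illustrative sketch using a randomly generated stand-in image, not part of the original tutorial), you can apply this transform to a single PIL image and inspect the result:

Python

from PIL import Image
import numpy as np

fake_digit = Image.fromarray(np.random.randint(0, 256, (28, 28), dtype=np.uint8))  # stand-in for one MNIST image
x = transform(fake_digit)
print(x.shape)           # torch.Size([1, 28, 28]) -- a single-channel image tensor
print(x.min(), x.max())  # values are shifted and scaled by the normalization, so no longer in [0, 1]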


Step 3 : Load and prepare the dataset

The provided code loads the MNIST dataset from the web, consisting of handwritten digit images and their corresponding labels. It initializes two datasets: train_dataset for training data and test_dataset for testing data. Both datasets are configured with transformations defined earlier, enabling image tensor conversion and pixel value normalization. Subsequently, data loaders, train_loader and test_loader, are created to facilitate batching and shuffling of data during training and testing phases, respectively.

Python




# Load the MNIST dataset from the web
train_dataset = datasets.MNIST(root='.', train=True, download=True, transform=transform) # The training set
test_dataset = datasets.MNIST(root='.', train=False, download=True, transform=transform) # The test set
 
# Create the data loaders for batching and shuffling the data
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True) # The training loader
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False) # The test loader


Output:

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 78077039.54it/s]
Extracting ./MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 65021843.17it/s]
Extracting ./MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 22545472.73it/s]
Extracting ./MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 12298598.30it/s]
Extracting ./MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST/raw
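
Before moving on, it can be helpful to peek at a single batch from the training loader (a small illustrative check, not part of the original code):

Python

images, targets = next(iter(train_loader))
print(images.shape)   # torch.Size([64, 1, 28, 28]) -- a batch of 64 single-channel 28x28 images
print(targets.shape)  # torch.Size([64]) -- one integer label (0-9) per image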

Step 4 : Define the neural network model

We define a simple neural network class Net using PyTorch’s nn.Module. The network consists of two fully connected layers (fc1 and fc2). Here’s a breakdown of the code:

  • __init__(self): This is the constructor method where the network architecture is defined. It initializes two fully connected layers using nn.Linear. The first layer (fc1) takes an input of size 28*28 (the 28×28-pixel input images flattened into a vector) and outputs 512 features. The second layer (fc2) takes the 512 features from the first layer as input and outputs 10 values, one for each of the 10 digit classes.
  • forward(self, x): This method defines the forward pass of the network. It takes an input tensor x (representing an image batch) and performs the following operations:
    • Flattens the input tensor into a vector using x.view(-1, 28*28).
    • Passes the flattened input through the first fully connected layer (fc1) and applies the ReLU activation function using F.relu(self.fc1(x)).
    • Passes the output of the first layer through the second fully connected layer (fc2) to get the final output logits.

Python




# Define the neural network model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # The network has two fully connected layers
        self.fc1 = nn.Linear(28*28, 512) # The first layer takes the flattened image as input and outputs 512 features
        self.fc2 = nn.Linear(512, 10) # The second layer takes the 512 features as input and outputs 10 classes
 
    def forward(self, x):
        # The forward pass of the network
        x = x.view(-1, 28*28) # Flatten the image into a vector
        x = F.relu(self.fc1(x)) # Apply the ReLU activation function to the first layer
        x = self.fc2(x) # Apply the second layer
        return x # Return the output logits
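
As a quick shape check (an illustrative sketch, not part of the original tutorial), you can pass a dummy batch through an untrained instance of the network:

Python

dummy = torch.randn(5, 1, 28, 28)  # a fake batch of 5 images
logits = Net()(dummy)              # forward pass through a freshly initialized network
print(logits.shape)                # torch.Size([5, 10]) -- ten logits (one per class) for each image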


Step 5 : Define the loss function, the optimizer and an instance of the model

The provided code segment creates an instance of the model, moves it to the available device (either CPU or GPU), and defines the loss function along with the optimizer. It also defines a small helper function that computes classification accuracy from the output logits and the true labels.

Python




# Create an instance of the model and move it to the device (CPU or GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # Get the device
model = Net().to(device) # Move the model to the device
print(model) # Print the model summary
 
# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss() # The cross entropy loss for multi-class classification
optimizer = optim.SGD(model.parameters(), lr=learning_rate) # The stochastic gradient descent optimizer
 
# Define a function to calculate the accuracy of the model
def accuracy(outputs, labels):
    # The accuracy is the fraction of correct predictions
    _, preds = torch.max(outputs, 1) # Get the predicted classes from the output logits
    return torch.sum(preds == labels).item() / len(labels) # Return the ratio of correct predictions


Output:

Net(
  (fc1): Linear(in_features=784, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=10, bias=True)
)
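
To make the accuracy helper concrete, here is a tiny worked example with made-up logits (illustrative values only):

Python

example_logits = torch.tensor([[2.0, 0.1, -1.0],
                               [0.3, 1.5,  0.2]])  # logits for 2 samples over 3 classes
example_labels = torch.tensor([0, 1])              # the true classes
print(accuracy(example_logits, example_labels))    # 1.0 -- both argmax predictions match the labels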

Step 6 : Define the training and test loop

  1. train(model, device, train_loader, criterion, optimizer, epoch): This function trains the model for one epoch on the training data. It sets the model to training mode, loops over batches of data from the train_loader, moves the inputs and labels to the specified device, performs a forward pass through the model to get the output logits, calculates the loss using the specified criterion, performs a backward pass to compute the gradients, and updates the model parameters using the specified optimizer. Every 200 batches it prints the average loss and accuracy over those batches.
  2. test(model, device, test_loader, criterion): This function evaluates the model on the test data. It sets the model to evaluation mode, loops over batches of data from the test_loader, moves the inputs and labels to the specified device, performs a forward pass through the model to get the output logits, calculates the loss using the specified criterion, and finally prints the average loss and accuracy over all batches.

Python




# Define the training loop
def train(model, device, train_loader, criterion, optimizer, epoch):
    # Set the model to training mode
    model.train()
    # Initialize the running loss and accuracy
    running_loss = 0.0
    running_acc = 0.0
    # Loop over the batches of data
    for i, (inputs, labels) in enumerate(train_loader):
        # Move the inputs and labels to the device
        inputs = inputs.to(device)
        labels = labels.to(device)
        # Zero the parameter gradients
        optimizer.zero_grad()
        # Forward pass
        outputs = model(inputs) # Get the output logits from the model
        loss = criterion(outputs, labels) # Calculate the loss
        # Backward pass and optimize
        loss.backward() # Compute the gradients
        optimizer.step() # Update the parameters
        # Print the statistics
        running_loss += loss.item() # Accumulate the loss
        running_acc += accuracy(outputs, labels) # Accumulate the accuracy
        if (i + 1) % 200 == 0: # Print every 200 batches
            print(f'Epoch {epoch}, Batch {i + 1}, Loss: {running_loss / 200:.4f}, Accuracy: {running_acc / 200:.4f}')
            running_loss = 0.0
            running_acc = 0.0
 
# Define the test loop
def test(model, device, test_loader, criterion):
    # Set the model to evaluation mode
    model.eval()
    # Initialize the loss and accuracy
    test_loss = 0.0
    test_acc = 0.0
    # Loop over the batches of data
    with torch.no_grad(): # No need to track the gradients
        for inputs, labels in test_loader:
            # Move the inputs and labels to the device
            inputs = inputs.to(device)
            labels = labels.to(device)
            # Forward pass
            outputs = model(inputs) # Get the output logits from the model
            loss = criterion(outputs, labels) # Calculate the loss
            # Print the statistics
            test_loss += loss.item() # Accumulate the loss
            test_acc += accuracy(outputs, labels) # Accumulate the accuracy
    # Print the average loss and accuracy
    print(f'Test Loss: {test_loss / len(test_loader):.4f}, Test Accuracy: {test_acc / len(test_loader):.4f}')


Step 7 : Train and test the model and visualize some sample images and predictions

This code segment trains and tests the model for the specified number of epochs and then visualizes some sample images along with their predictions.

Python




# Train and test the model for the specified number of epochs
for epoch in range(1, num_epochs + 1):
    train(model, device, train_loader, criterion, optimizer, epoch) # Train the model
    test(model, device, test_loader, criterion) # Test the model
 
# Visualize some sample images and predictions
samples, labels = next(iter(test_loader)) # Get a batch of test data
samples = samples.to(device) # Move the samples to the device
outputs = model(samples) # Get the output logits from the model
_, preds = torch.max(outputs, 1) # Get the predicted classes from the output logits
samples = samples.cpu().numpy() # Move the samples back to CPU and convert to numpy array
fig, axes = plt.subplots(3, 3, figsize=(8, 8)) # Create a 3x3 grid of subplots
for i, ax in enumerate(axes.ravel()):
    ax.imshow(samples[i].squeeze(), cmap='gray') # Plot the image
    ax.set_title(f'Label: {labels[i]}, Prediction: {preds[i]}') # Set the title
    ax.axis('off') # Hide the axes
plt.tight_layout() # Adjust the spacing
plt.show() # Show the plot


Output:

Epoch 1, Batch 200, Loss: 1.1144, Accuracy: 0.7486
Epoch 1, Batch 400, Loss: 0.4952, Accuracy: 0.8739
Epoch 1, Batch 600, Loss: 0.3917, Accuracy: 0.8903
Epoch 1, Batch 800, Loss: 0.3515, Accuracy: 0.9042
Test Loss: 0.3018, Test Accuracy: 0.9155
Epoch 2, Batch 200, Loss: 0.3067, Accuracy: 0.9123
Epoch 2, Batch 400, Loss: 0.2929, Accuracy: 0.9168
Epoch 2, Batch 600, Loss: 0.2878, Accuracy: 0.9185
Epoch 2, Batch 800, Loss: 0.2735, Accuracy: 0.9210
Test Loss: 0.2471, Test Accuracy: 0.9314
Epoch 3, Batch 200, Loss: 0.2580, Accuracy: 0.9256
Epoch 3, Batch 400, Loss: 0.2442, Accuracy: 0.9301
Epoch 3, Batch 600, Loss: 0.2354, Accuracy: 0.9338
Epoch 3, Batch 800, Loss: 0.2281, Accuracy: 0.9359
Test Loss: 0.2130, Test Accuracy: 0.9403
Epoch 4, Batch 200, Loss: 0.2149, Accuracy: 0.9403
Epoch 4, Batch 400, Loss: 0.2055, Accuracy: 0.9441
Epoch 4, Batch 600, Loss: 0.2050, Accuracy: 0.9395
Epoch 4, Batch 800, Loss: 0.2018, Accuracy: 0.9425
Test Loss: 0.1860, Test Accuracy: 0.9465
Epoch 5, Batch 200, Loss: 0.1925, Accuracy: 0.9464
Epoch 5, Batch 400, Loss: 0.1850, Accuracy: 0.9473
Epoch 5, Batch 600, Loss: 0.1813, Accuracy: 0.9481
Epoch 5, Batch 800, Loss: 0.1753, Accuracy: 0.9503
Test Loss: 0.1691, Test Accuracy: 0.9517
Epoch 6, Batch 200, Loss: 0.1719, Accuracy: 0.9521
Epoch 6, Batch 400, Loss: 0.1599, Accuracy: 0.9557
Epoch 6, Batch 600, Loss: 0.1627, Accuracy: 0.9521
Epoch 6, Batch 800, Loss: 0.1567, Accuracy: 0.9562
Test Loss: 0.1549, Test Accuracy: 0.9547
Epoch 7, Batch 200, Loss: 0.1441, Accuracy: 0.9620
Epoch 7, Batch 400, Loss: 0.1474, Accuracy: 0.9587
Epoch 7, Batch 600, Loss: 0.1447, Accuracy: 0.9601
Epoch 7, Batch 800, Loss: 0.1426, Accuracy: 0.9580
Test Loss: 0.1404, Test Accuracy: 0.9602
Epoch 8, Batch 200, Loss: 0.1360, Accuracy: 0.9627
Epoch 8, Batch 400, Loss: 0.1359, Accuracy: 0.9620
Epoch 8, Batch 600, Loss: 0.1304, Accuracy: 0.9631
Epoch 8, Batch 800, Loss: 0.1322, Accuracy: 0.9634
Test Loss: 0.1308, Test Accuracy: 0.9624
Epoch 9, Batch 200, Loss: 0.1152, Accuracy: 0.9690
Epoch 9, Batch 400, Loss: 0.1188, Accuracy: 0.9674
Epoch 9, Batch 600, Loss: 0.1303, Accuracy: 0.9637
Epoch 9, Batch 800, Loss: 0.1236, Accuracy: 0.9645
Test Loss: 0.1234, Test Accuracy: 0.9633
Epoch 10, Batch 200, Loss: 0.1112, Accuracy: 0.9679
Epoch 10, Batch 400, Loss: 0.1120, Accuracy: 0.9707
Epoch 10, Batch 600, Loss: 0.1158, Accuracy: 0.9681
Epoch 10, Batch 800, Loss: 0.1138, Accuracy: 0.9688
Test Loss: 0.1145, Test Accuracy: 0.9665

Output: a 3×3 grid of sample test images, each titled with its true label and the model's predicted digit.

Conclusion

In this post we built a basic neural network in PyTorch and used it to classify handwritten digits from the MNIST dataset. Along the way we saw how to use the nn.Module class, the nn.Sequential container, a loss function, an optimizer and data loaders to build, train and test a network. PyTorch is a powerful and flexible framework, and you can use it to create and experiment with many other neural network models; the PyTorch website offers plenty of additional tutorials and documentation.


