
Start learning PyTorch for Beginners

Last Updated : 27 Feb, 2024

Machine Learning helps us extract meaningful insights from data, and modern models can even mimic aspects of how the human brain works. This is done using neural networks, which consist of interconnected layers of nodes. Data is passed forward through these layers, the model learns from it, and it then predicts outputs for new data.

PyTorch helps us create and train such neural networks, which learn from data much like our brains do.

What is PyTorch?

PyTorch is an open-source machine learning library for Python developed by Facebook’s AI Research Lab (FAIR). It is widely used for building deep learning models and conducting research in various fields like computer vision, natural language processing, and reinforcement learning. One of the key features of PyTorch is its dynamic computational graph, which allows for more flexible and intuitive model construction compared to static graph frameworks. PyTorch also offers seamless integration with other popular libraries like NumPy, making it easier to work with tensors and multidimensional arrays.

Why use PyTorch?

  • It supports tensor computation: A tensor is an n-dimensional array, similar to a NumPy array, that holds the data. We can perform arbitrary numeric computations on these arrays using PyTorch's APIs.
  • It provides Dynamic Graph Computation: This feature allows us to define computational graphs dynamically at runtime, which makes PyTorch more flexible than the static computation graph approach, where the graph structure is fixed and defined before execution.
  • It provides Automatic Differentiation: The Autograd package automatically computes the gradients that are crucial for training a model with optimization algorithms, so we can perform operations on tensors without calculating gradients by hand.
  • It has native Python support: PyTorch integrates naturally with existing Python workflows and libraries, which is a major reason it is popular in the machine learning and data science communities.
  • It has its own production path: PyTorch provides TorchScript, a high-performance environment for serializing and executing PyTorch models. You can compile PyTorch models into a portable intermediate representation (IR) format and then deploy them on various platforms and devices without requiring the original Python code, as shown in the sketch below.
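To make the last point concrete, here is a minimal, illustrative sketch of exporting a model with TorchScript. The TinyModel class and the file name tiny_model.pt are placeholders chosen only for this example.

Python

import torch
import torch.nn as nn

# A small placeholder model used only to illustrate TorchScript export
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

model = TinyModel()

# Compile the model into TorchScript's portable intermediate representation
scripted_model = torch.jit.script(model)

# The scripted model can be saved and later loaded without the original Python class
scripted_model.save("tiny_model.pt")
loaded = torch.jit.load("tiny_model.pt")
print(loaded(torch.randn(1, 4)))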

How to install PyTorch?

To install PyTorch, you can use pip, the standard package manager for Python, with the following command:

pip3 install torch torchvision torchaudio
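Once the installation finishes, you can quickly verify it from a Python shell. The short snippet below simply prints the installed version and reports whether a CUDA-capable GPU is visible.

Python

import torch

print(torch.__version__)           # Installed PyTorch version
print(torch.cuda.is_available())   # True if a CUDA-capable GPU is detected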

PyTorch Basics

PyTorch Tensors: Creation, Manipulation, and Operations

The basic building blocks of PyTorch are tensors, data structures similar to NumPy arrays. Like arrays and matrices, they are used to encode the inputs and outputs of a model as well as the model's parameters. The main difference from NumPy arrays is that tensors can run on GPUs and other hardware accelerators.

tensor_name = torch.tensor([value_1, value_2, ..., value_n])

where torch.tensor() is the method used to create tensors.

The code snippet below shows how tensors are created and manipulated through operations. In this example, we create tensor1, tensor2, and tensor3 and perform element-wise and matrix operations on them.

Python




import torch
 
# Create a tensor from a list
tensor1 = torch.tensor([1, 2, 3])
print("Tensor from list:", tensor1)
 
# Create a tensor of zeros with shape (2, 3)
tensor2 = torch.zeros(2, 3)
print("Tensor of zeros:", tensor2)
 
# Create a random tensor with shape (3, 2)
tensor3 = torch.rand(3, 2)
print("Random tensor:", tensor3)
 
# Performing operations on Tensors
 
# Addition
result_add = tensor1 + tensor2
print("Addition result:", result_add)
 
 
# Multiplication
result_mul = tensor2 * 5
print("Multiplication result:", result_mul)
 
 
# Matrix multiplication
result_matmul = torch.matmul(tensor2, tensor3)
print("Matrix multiplication result:", result_matmul)


Output:

Tensor from list: tensor([1, 2, 3])
Tensor of zeros: tensor([[0., 0., 0.],
        [0., 0., 0.]])
Random tensor: tensor([[0.9161, 0.3915],
        [0.7185, 0.7726],
        [0.4831, 0.0832]])
Addition result: tensor([[1., 2., 3.],
        [1., 2., 3.]])
Multiplication result: tensor([[0., 0., 0.],
        [0., 0., 0.]])
Matrix multiplication result: tensor([[0., 0.],
        [0., 0.]])

Autograd: Automatic Differentiation in PyTorch

Now, let us shift our focus to Autograd, one of the most important topics in PyTorch basics. PyTorch's Autograd module computes gradients automatically, so we do not need to calculate them explicitly. A gradient represents the rate of change of a function with respect to its parameters; during training, it tells us how to adjust the parameters to reduce the difference between predicted outputs and actual labels.

Let us take an example. Suppose we create two tensors, 'x' and 'y', perform some computation on them, and store the result in 'z'. We can then call the backward() method on z to compute the gradients of z with respect to x and y, as shown in the code snippet below.

Python




# Define tensors with requires_grad=True to track computation history
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
 
# Perform a computation
z = x**2 + y**3
print("Output tensor z:", z)
 
# Compute gradients
z.backward()
print("Gradient of x:", x.grad)
print("Gradient of y:", y.grad)


Output:

Output tensor z: tensor(31., grad_fn=<AddBackward0>)
Gradient of x: tensor(4.)
Gradient of y: tensor(27.)

Neural Networks in PyTorch

Basics of nn.Module and nn.Parameter

nn.Module is the base class in PyTorch for all neural network modules. It holds the trainable parameters and defines the forward method for performing forward-pass computations. It is also responsible for parameter management, submodule management, serialization, and loading.

On the other hand, nn.Parameter is a subclass of torch.Tensor used to hold trainable parameters. nn.Parameter tensors are defined as attributes within an nn.Module subclass; they behave like regular tensors but are automatically registered as model parameters and tracked by PyTorch's Autograd system.
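The following is a minimal, illustrative sketch of how an nn.Parameter defined inside an nn.Module subclass is automatically registered as a trainable parameter. The ScaleLayer class here is a made-up example, not part of PyTorch.

Python

import torch
import torch.nn as nn

# A minimal custom module with a manually defined nn.Parameter (illustrative example)
class ScaleLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # Wrapping the tensor in nn.Parameter registers it as a trainable parameter
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return x * self.scale

layer = ScaleLayer()

# The parameter shows up in named_parameters() and will receive gradients
for name, param in layer.named_parameters():
    print(name, param.requires_grad)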

Building Neural Network using PyTorch

Let's create a simple neural network model using the Iris dataset, a popular dataset for classification tasks. The Iris dataset contains measurements of iris flowers (sepal length, sepal width, petal length, and petal width) along with the corresponding species (setosa, versicolor, or virginica).

Defining Neural Network Architecture

  • We first load the Iris dataset and split it into training and testing sets.
  • After this, we define a simple neural network architecture. The input layer (fc1) applies a linear transformation (nn.Linear) to map the input features to the hidden layer.
  • Then, we apply the ReLU activation function (nn.ReLU) to introduce non-linearity, and a second linear layer (fc2) maps the hidden representation to the output classes.

Python




import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
 
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
 
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
 
# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
 
# Define the neural network architecture
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # Input layer
        self.relu = nn.ReLU()                          # Activation function
        self.fc2 = nn.Linear(hidden_size, output_size) # Output layer
         
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x


Training a Neural Network on a Simple Dataset

After defining the architecture, we have to train the model. In the following code snippet, we set a random seed for reproducibility, and define the input, hidden, and output sizes of the neural network architecture. After this, we instantiate the neural network model, define the loss function (CrossEntropyLoss) and optimizer (Adam), convert the training data to PyTorch tensors, and train the model for a fixed number of epochs.

During training, we perform forward pass computations to obtain predicted outputs and calculate the loss between predicted and actual labels. Also, we have to update model parameters using the optimizer.

Python




# Set random seed for reproducibility
torch.manual_seed(42)
 
 
# Define the input size, hidden size, and output size of the neural network
input_size = X.shape[1]
hidden_size = 10
output_size = len(iris.target_names)
 
 
# Instantiate the neural network
model = SimpleNN(input_size, hidden_size, output_size)
 
 
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
 
 
# Convert data to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.LongTensor(y_train)
 
 
# Train the model
num_epochs = 100
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    # Print the loss every 10 epochs
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')


Evaluating the Trained Model

Now that the model has been trained, we evaluate it on the test dataset. For this, we convert the test data from NumPy arrays into PyTorch tensors using torch.FloatTensor() and torch.LongTensor(). Then, we pass the test input X_test_tensor through the trained model to obtain predictions. Finally, the predicted labels are compared with the actual labels y_test_tensor to calculate the accuracy.

Python




# Evaluate the model
with torch.no_grad():
    X_test_tensor = torch.FloatTensor(X_test)
    y_test_tensor = torch.LongTensor(y_test)
    outputs = model(X_test_tensor)
    _, predicted = torch.max(outputs, 1)
    accuracy = (predicted == y_test_tensor).sum().item() / len(y_test_tensor)
    print(f'Accuracy on the test set: {accuracy:.2f}')


Output:

Epoch [10/100], Loss: 0.7783
Epoch [20/100], Loss: 0.5399
Epoch [30/100], Loss: 0.3921
Epoch [40/100], Loss: 0.2934
Epoch [50/100], Loss: 0.2166
Epoch [60/100], Loss: 0.1639
Epoch [70/100], Loss: 0.1284
Epoch [80/100], Loss: 0.1050
Epoch [90/100], Loss: 0.0902
Epoch [100/100], Loss: 0.0800

Working with Data in PyTorch

Machine learning development involves working with data, so efficient data handling techniques are crucial when learning PyTorch. In this section, we will look at data loading and preprocessing.

Loading Data: Using DataLoader and Dataset

The DataLoader and Dataset classes in PyTorch are the main components for loading and iterating over datasets. The Dataset class acts as the interface for custom datasets; to create one, you implement the __len__ and __getitem__ methods.

On the other hand, DataLoader iterates over the dataset and fetches batches of samples, which can then be moved to the appropriate device (CPU or GPU) for the model to process. This is shown in the code snippet below.

Python




import torch
from torch.utils.data import DataLoader, Dataset
 
 
# Custom Dataset class
class CustomDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data
        self.targets = targets
 
    def __len__(self):
        return len(self.data)
 
    def __getitem__(self, idx):
        return self.data[idx], self.targets[idx]
 
 
# Example data
data = torch.randn(100, 3, 32, 32)  # Example image data
targets = torch.randint(0, 10, (100,))  # Example target labels
 
 
# Create custom dataset
custom_dataset = CustomDataset(data, targets)
 
 
# Create DataLoader
batch_size = 32
shuffle = True
num_workers = 4
data_loader = DataLoader(custom_dataset, batch_size=batch_size,
                         shuffle=shuffle, num_workers=num_workers)
 
 
# Iterate over batches
for batch_idx, (inputs, targets) in enumerate(data_loader):
    print(
        f"Batch {batch_idx+1}: Inputs shape: {inputs.shape}, Targets shape: {targets.shape}")


Output:

Batch 1: Inputs shape: torch.Size([32, 3, 32, 32]), Targets shape: torch.Size([32])
Batch 2: Inputs shape: torch.Size([32, 3, 32, 32]), Targets shape: torch.Size([32])
Batch 3: Inputs shape: torch.Size([32, 3, 32, 32]), Targets shape: torch.Size([32])
Batch 4: Inputs shape: torch.Size([4, 3, 32, 32]), Targets shape: torch.Size([4])

Preprocessing Data: Transformations and Normalization

Preprocessing means bringing the data into a standard format so that it can be fed to the model. The two main techniques are transformation and normalization. Transformations include operations such as resizing, cropping, rotating, and flipping images.

Normalization, on the other hand, scales the data so that it has zero mean and unit variance, which stabilizes the training process and improves convergence. Data preprocessing is demonstrated in the following code snippet.

Python




import torchvision.transforms as transforms
 
# Define transformations
transform = transforms.Compose([
    transforms.Resize(256),              # Resize images to 256x256
    transforms.RandomCrop(224),          # Randomly crop images to 224x224
    transforms.RandomHorizontalFlip(),   # Randomly flip images horizontally
    transforms.ToTensor(),               # Convert images to PyTorch tensors
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[
                         0.229, 0.224, 0.225])  # Normalize images
])
 
# Example of applying transformations to image
example_image = transforms.ToPILImage()(
    torch.randn(3, 256, 256))  # Example image tensor
transformed_image = transform(example_image)
 
 
print("Transformed image shape:", transformed_image.shape)


Output:

Transformed image shape: torch.Size([3, 224, 224])

Handling Custom Datasets

  • Handling a custom dataset means creating a dataset class for data with a specific structure and format.
  • For this, we create a custom dataset class that inherits from the torch.utils.data.Dataset class. Mainly, the __len__ and __getitem__ methods are used to handle the custom dataset.
  • The __len__ method returns the total number of samples in the dataset, and the __getitem__ method fetches a sample and its corresponding target. This is shown in the following code snippet.

Python




import torch
from torch.utils.data import Dataset, DataLoader
 
 
# Define custom dataset class by subclassing torch.utils.data.Dataset
class CustomDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data
        self.targets = targets
         
    def __len__(self):
        # Return the total number of samples in the dataset
        return len(self.data)
       
    def __getitem__(self, index):
        # Retrieve and return a sample and its corresponding target based on the given index
        sample = self.data[index]
        target = self.targets[index]
        return sample, target
 
 
# Example data and targets
data = torch.tensor([[1, 2], [3, 4], [5, 6], [7, 8]])
targets = torch.tensor([0, 1, 0, 1])
 
# Create instance of the custom dataset
custom_dataset = CustomDataset(data, targets)
 
# Create a data loader to iterate over the dataset in batches
batch_size = 2
data_loader = DataLoader(custom_dataset, batch_size=batch_size, shuffle=True)
 
# Iterate over the data loader to access batches of data
for batch_idx, (samples, targets) in enumerate(data_loader):
    print(f"Batch {batch_idx}:")
    print("Samples:", samples)
    print("Targets:", targets)


Output:

Batch 0:
Samples: tensor([[5, 6],
        [3, 4]])
Targets: tensor([0, 1])
Batch 1:
Samples: tensor([[1, 2],
        [7, 8]])
Targets: tensor([0, 1])

Intermediate Topics in PyTorch

Having covered the basics of PyTorch, let us now discuss some intermediate topics. These will help you gain mastery over PyTorch and build more advanced machine learning models.

Optimizers

Optimizers are algorithms that aim to minimize the loss function. We use them to update a model's parameters, generally the weights and biases of the neural network, during the training process, which in turn improves the model's accuracy. Some commonly used optimizers in PyTorch are listed below, followed by a short sketch of how they are created:

  • Stochastic Gradient Descent (SGD): It updates parameters in the direction opposite to the gradient of the loss function with respect to the parameters.
  • Adam: It is based on the adaptive learning rate optimization that computes adaptive learning rates for each parameter. It combines the advantages of AdaGrad and RMSProp.
  • Adagrad: This algorithm adapts the learning rate of each parameter based on the historical gradients.
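All of these optimizers live in torch.optim and share the same interface. The snippet below is a minimal sketch; the linear model and the learning rates are placeholder values chosen only for illustration.

Python

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)  # Placeholder model for illustration

# Each optimizer takes the model's parameters and a learning rate
sgd_optimizer = optim.SGD(model.parameters(), lr=0.01)
adam_optimizer = optim.Adam(model.parameters(), lr=0.001)
adagrad_optimizer = optim.Adagrad(model.parameters(), lr=0.01)

# The training-step pattern is the same for all of them:
# optimizer.zero_grad() -> loss.backward() -> optimizer.step()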

Loss Functions

A loss function quantifies the difference between a model's predicted outputs and the actual target labels. It has to be minimized so that the model can produce accurate outputs. The most commonly used loss functions are listed below, followed by a short usage sketch:

  • Mean Squared Error (MSE): It calculates the average squared difference between predicted and actual values.
  • Cross-Entropy Loss: This measures the dissimilarity between the predicted probability distribution and the actual distribution of class labels.
  • Binary Cross-Entropy Loss: This function is the special case of the cross-entropy loss used for binary classification tasks.
  • Categorical Cross-Entropy Loss: This loss function calculates the cross-entropy loss between the predicted class probabilities and the one-hot encoded target labels.
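The snippet below is a small sketch of how these losses are created and called in PyTorch; the tensors hold made-up example values. Note that nn.CrossEntropyLoss covers the categorical case directly and expects raw logits with integer class labels, while nn.BCELoss expects probabilities in [0, 1].

Python

import torch
import torch.nn as nn

# Regression: mean squared error
mse = nn.MSELoss()
pred = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])
print("MSE loss:", mse(pred, target))

# Multi-class classification: cross-entropy (raw logits + integer labels)
ce = nn.CrossEntropyLoss()
logits = torch.randn(4, 3)              # 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 0])
print("Cross-entropy loss:", ce(logits, labels))

# Binary classification: binary cross-entropy on probabilities
bce = nn.BCELoss()
probs = torch.tensor([0.8, 0.2, 0.6])
binary_labels = torch.tensor([1.0, 0.0, 1.0])
print("Binary cross-entropy loss:", bce(probs, binary_labels))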

Validation and Testing

Validation and testing play an important role in evaluating how well a trained model generalizes to unseen data. Two important terms in this context are overfitting and underfitting.

Overfitting

Overfitting is the condition in which the model performs well on the training data but poorly on unseen data (validation or test data). The model has excessively high variance, meaning it is overly sensitive to small changes in the training data. This usually arises when the model is too complex. To prevent it, we can use techniques such as regularization (e.g., L1 or L2 regularization), dropout, early stopping, or more training data; two of these are sketched below.
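Two of these techniques translate directly into PyTorch code: dropout via the nn.Dropout layer, and L2 regularization via the optimizer's weight_decay argument. The model below is a small placeholder used only for illustration.

Python

import torch.nn as nn
import torch.optim as optim

# Dropout randomly zeroes activations during training to reduce overfitting
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # Drop 50% of hidden activations during training
    nn.Linear(16, 3),
)

# L2 regularization is applied through the optimizer's weight_decay argument
optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)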

Underfitting

Underfitting is the condition in which the model performs poorly on both the training data and unseen data. The model has excessively high bias, meaning it is unable to capture the underlying patterns in the data. This arises when the model is too simple or too little training data is provided. Adding more layers or neurons, providing more relevant features, or training for more epochs can help prevent underfitting.

Model Evaluation

Model evaluation involves testing the performance of a trained model on unseen data to understand how well it generalizes. It typically covers metrics such as accuracy, precision, recall, and F1 score for classification tasks, or mean squared error (MSE) for regression tasks. The main goal is to select the model that performs best on unseen data.
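Since scikit-learn is already used earlier in this article, its metrics module offers a convenient way to compute these scores. The sketch below assumes the predicted tensor and y_test array from the evaluation snippet above are still available.

Python

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# 'predicted' and 'y_test' come from the evaluation snippet shown earlier
y_pred = predicted.numpy()

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average='macro'))
print("Recall   :", recall_score(y_test, y_pred, average='macro'))
print("F1 score :", f1_score(y_test, y_pred, average='macro'))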

Conclusion

Learning PyTorch as a beginner can be a little tricky, as it involves many complex mathematical concepts. In a nutshell, PyTorch is a powerful tool for deep learning. The model above was built on the Iris dataset, a simple classification task that is a good starting point for learning PyTorch basics. When you work with more complex datasets, you will have to tune the model and its parameters to obtain accurate outputs. You should now have enough information to get started with PyTorch.

Frequently Asked Questions

Q. What is MNIST Dataset?

The MNIST dataset is a widely used dataset consisting of 28×28 pixel grayscale images of handwritten digits (0 through 9), along with their corresponding labels. It is commonly used for training and testing machine learning models on image classification tasks.
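If you want to try MNIST yourself, torchvision can download it for you. The snippet below is a minimal sketch that stores the data under a local ./data directory (a placeholder path) and converts the images to tensors.

Python

import torchvision
import torchvision.transforms as transforms

# Download the MNIST training set and convert the images to tensors
mnist_train = torchvision.datasets.MNIST(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor()
)

image, label = mnist_train[0]
print(image.shape, label)   # torch.Size([1, 28, 28]) and the corresponding digit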

Q. What is the use of Automatic Differentiation in PyTorch?

The gradient of the loss function tells us how the loss changes as the parameters are adjusted. Automatic differentiation allows us to compute this gradient without any explicit manual calculation.

Q. What is an optimizer in the Neural Network?

We use an optimizer to update the parameters (weights and biases) of a neural network during training, so that the loss function is minimized and the model's performance improves. Examples include stochastic gradient descent (SGD), Adam, and RMSprop.

Q. How to define the Loss Function in Neural Network?

The loss function measures the difference between the predicted outputs of a neural network and the true labels of the training data. In PyTorch, loss functions are typically defined using modules from torch.nn, such as nn.CrossEntropyLoss for classification tasks and nn.MSELoss for regression tasks.


