Initialize weights in PyTorch

Last Updated : 09 Feb, 2023

When building a neural network, the layers must be initialized with some initial weights, which are then optimized as training progresses. The method used to initialize the weights affects how quickly the model reaches a good solution and whether it runs into the problem of vanishing or exploding gradients. In this article, we look at how to initialize weights effectively using the PyTorch machine learning framework.

Why initialize weights?

Initializing the weights of a neural network is a vital step in the training process, as appropriate weight initialization strongly influences both the convergence and the final performance of the network. If all the weights are initialized to the same value, every neuron in a layer receives the same gradient and is updated in the same way, so the network cannot learn distinct features and tends to converge to a suboptimal solution regardless of the optimization algorithm being used.

Weights that are initialized to large values can lead to vanishing or exploding gradients, depending on the activation function being used, which can cause the model to converge slowly or not at all. Weights that are initialized to small random values usually train more efficiently, since the activations and gradients stay in a well-behaved range early in training. Different initialization methods suit different types of problems and model architectures.
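
A minimal sketch (the layer sizes and the constant value below are arbitrary choices for illustration) makes the problem with identical weights concrete: every hidden unit receives exactly the same gradient, so a gradient step cannot make the units differ from one another.

Python3

import torch

# Hypothetical two-layer network whose weights all start at the same constant
hidden = torch.nn.Linear(3, 2)
output = torch.nn.Linear(2, 1)
torch.nn.init.constant_(hidden.weight, 0.5)
torch.nn.init.constant_(output.weight, 0.5)
torch.nn.init.zeros_(hidden.bias)
torch.nn.init.zeros_(output.bias)

# A small random batch
x = torch.randn(4, 3)
loss = output(torch.tanh(hidden(x))).sum()
loss.backward()

# Both rows of the gradient are identical, so both hidden units
# receive the same update and never learn different features
print(hidden.weight.grad)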

Using the nn.init Module for Weights Initialization

The PyTorch torch.nn.init module is the conventional way to initialize weights in a neural network; it provides a variety of weight initialization methods, such as:

  1. Uniform initialization
  2. Xavier initialization
  3. Kaiming initialization
  4. Zeros initialization
  5. Ones initialization
  6. Normal initialization

An example implementation of each method is provided below:

Uniform Initialization

Using a uniform distribution to initialize the weights can help prevent the ‘vanishing gradient’ problem, as the distribution has a finite range and the weights are distributed evenly across that range. However, this method can suffer from the ‘exploding gradient’ problem if the range is too large.

Python3

import torch
  
# Initializing a linear layer with 2
# input features and 3 output features
linear_layer = torch.nn.Linear(2, 3)
  
# Initializing the weights with a uniform distribution
torch.nn.init.uniform_(linear_layer.weight)
  
# Displaying the initialized weights
print(linear_layer.weight)


Output:

Parameter containing:
tensor([[-0.1768, -0.4942],
       [ 0.0756, -0.0967],
       [-0.3923,  0.3283]], requires_grad=True)
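
The lower and upper bounds of the distribution can also be passed explicitly, which is useful for keeping the weights small; the bounds -0.5 and 0.5 below are just an illustrative choice.

Python3

import torch

linear_layer = torch.nn.Linear(2, 3)

# Passing explicit bounds keeps the weights in a small, symmetric range
# (the bounds -0.5 and 0.5 are an arbitrary example)
torch.nn.init.uniform_(linear_layer.weight, a=-0.5, b=0.5)

print(linear_layer.weight)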

Xavier Initialization

Using Xavier initialization can help prevent the ‘vanishing gradient’ problem, as it scales the weights such that the variance of the outputs of each layer is the same as the variance of the inputs.

Python3

import torch
  
# Initializing a linear layer with
# 2 input features and 3 output features
linear_layer = torch.nn.Linear(2, 3)
  
# Initializing the weights with the Xavier initialization method
torch.nn.init.xavier_uniform_(linear_layer.weight)
  
# Displaying the initialized weights
print(linear_layer.weight)


Output:

Parameter containing:
tensor([[ 0.4442, -0.3890],
        [-0.2876, -0.3379],
        [-0.5261,  0.5227]], requires_grad=True)
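
For reference, xavier_uniform_ draws weights from U(-bound, bound) with bound = gain * sqrt(6 / (fan_in + fan_out)). A small check of that relationship, using the recommended gain for tanh as an example, might look like this:

Python3

import math
import torch

linear_layer = torch.nn.Linear(2, 3)

# The recommended gain for tanh activations
gain = torch.nn.init.calculate_gain('tanh')
torch.nn.init.xavier_uniform_(linear_layer.weight, gain=gain)

# fan_in = 2 input features, fan_out = 3 output features
bound = gain * math.sqrt(6.0 / (2 + 3))
print(bound, bool(linear_layer.weight.abs().max() <= bound))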

Kaiming Initialization

Using Kaiming (He) initialization can help prevent the ‘vanishing gradient’ problem when ReLU-family activations are used, as it scales the weights so that the variance of the outputs matches the variance of the inputs while taking the nonlinearity of the activation function into account.

Python3

import torch
  
# Initializing a linear layer with
# 2 input features and 3 output features
linear_layer = torch.nn.Linear(2, 3)
  
# Initializing the weights with the Kaiming initialization method
torch.nn.init.kaiming_uniform_(linear_layer.weight,
                               a=0, mode="fan_in",
                               nonlinearity="relu")
  
# Displaying the initialized weights
print(linear_layer.weight)


Output:

Parameter containing:
tensor([[ 0.0582,  0.4701],
        [ 0.4982,  0.5452],
        [-0.0384,  0.5999]], requires_grad=True)
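
The nn.init module also provides kaiming_normal_, the normal-distribution counterpart, which is commonly used for convolutional layers followed by ReLU activations; the layer shape below is just an example:

Python3

import torch

# A hypothetical convolutional layer
conv = torch.nn.Conv2d(3, 16, kernel_size=3)

# fan_out mode preserves the variance of the gradients in the backward pass
torch.nn.init.kaiming_normal_(conv.weight, mode="fan_out", nonlinearity="relu")
torch.nn.init.zeros_(conv.bias)

print(conv.weight.shape, conv.weight.std())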

Zeros and Ones Initialization

Initializing the weights to zeros makes all of the neurons in a layer behave identically: they receive the same gradient and are updated in the same way, so the network cannot learn distinct features. Zero weights also block the flow of gradients through the network, which leads to the ‘vanishing gradient’ problem.

Python3

import torch
  
# Initializing a linear layer with
# 2 input features and 3 output features
linear_layer = torch.nn.Linear(2, 3)
  
# Initializing the weights with the 
# zeros initialization method
torch.nn.init.zeros_(linear_layer.weight)
  
# Displaying the initialized weights
print(linear_layer.weight)


Output:

Parameter containing:
tensor([[0., 0.],
        [0., 0.],
        [0., 0.]], requires_grad=True)
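
While zeros are a poor choice for the weights themselves, they are a common and reasonable choice for the bias terms, typically combined with one of the schemes above for the weights:

Python3

import torch

linear_layer = torch.nn.Linear(2, 3)

# Kaiming-initialized weights with zero biases -- a common combination
torch.nn.init.kaiming_uniform_(linear_layer.weight, nonlinearity="relu")
torch.nn.init.zeros_(linear_layer.bias)

print(linear_layer.bias)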

Initializing the weights to ones suffers from the same symmetry problem, since all of the weights are updated in the same direction. Because the weights are relatively large, it can also lead to the ‘exploding gradient’ problem.

Python3

import torch
  
# Initializing a linear layer with
# 2 input features and 3 output features
linear_layer = torch.nn.Linear(2, 3)
  
# Initializing the weights with the
# ones initialization method
torch.nn.init.ones_(linear_layer.weight)
  
# Displaying the initialized weights
print(linear_layer.weight)


Output:

Parameter containing:
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]], requires_grad=True)

Normal Initialization

Using a normal distribution to initialize the weights can help prevent the ‘exploding gradient’ problem, as most of the weights are concentrated close to the mean and large values are unlikely (the distribution is unbounded, but samples far from the mean are rare). It must be noted that the neural network’s performance is not determined by the weights alone; the learning rate, the optimization algorithm and the other hyperparameters also play a crucial role in the efficiency of the neural network.

Python3

import torch
  
# Initializing a linear layer with
# 2 input features and 3 output features
linear_layer = torch.nn.Linear(2, 3)
  
# Initializing the weights with the 
# normal initialization method
torch.nn.init.normal_(linear_layer.weight,
                      mean=0, std=1)
  
# Displaying the initialized weights
print(linear_layer.weight)


Output:

Parameter containing:
tensor([[-0.1759,  0.5192],
        [-0.5621, -0.3871],
        [-0.6071,  0.3538]], requires_grad=True)
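
Because the normal distribution is unbounded, it can still occasionally produce large weights. If that is a concern, trunc_normal_ re-draws any sample that falls outside a given interval; the mean, standard deviation and bounds below are illustrative values only:

Python3

import torch

linear_layer = torch.nn.Linear(2, 3)

# Truncated normal: values outside [a, b] are re-drawn
torch.nn.init.trunc_normal_(linear_layer.weight,
                            mean=0.0, std=0.02,
                            a=-0.04, b=0.04)

print(linear_layer.weight)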

Applying a Custom Function for Weights Initialization

An alternative method is to create a customized function to initialize the weights, which can be applied to a layer (or to every submodule of a model) using the apply method.

Python3

import torch
  
# User defined function to initialize the weights
def custom_weights(m):
    # Only initialize modules that actually have a weight parameter
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.uniform_(m.weight,
                               -0.5, 0.5)
  
# Initializing a linear layer with
# 2 input features and 3 output features
linear_layer = torch.nn.Linear(2, 3)
  
# Applying the user defined function to the layer
linear_layer.apply(custom_weights)
  
# Displaying the initialized weights
print(linear_layer.weight)


Output:

Parameter containing:
tensor([[ 0.4341, -0.3424],
        [ 0.2095,  0.1782],
        [-0.4244,  0.1719]], requires_grad=True)
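
Because apply recurses over every submodule, the same idea scales to a whole model: a single function can initialize every layer at once. A minimal sketch (the model below is an arbitrary example) might look like this:

Python3

import torch

# A small example model
model = torch.nn.Sequential(
    torch.nn.Linear(2, 4),
    torch.nn.ReLU(),
    torch.nn.Linear(4, 3),
)

def init_weights(m):
    # apply() also visits the Sequential container and the ReLU,
    # so only initialize modules that actually have weights
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight)
        torch.nn.init.zeros_(m.bias)

model.apply(init_weights)
print(model[0].weight)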
        

Using a User-defined Layer Class for Weights Initialization

Another method is to create a user-defined class that inherits from torch.nn.Module and to override its constructor so that the weights are initialized with custom values.

Python3

import torch
  
# User defined Layer
class MyLayer(torch.nn.Module):
  
    # Overriding the constructor
    def __init__(self, independent, dependent):
        # Calling the super-class' constructor
        super(MyLayer, self).__init__()
        self.linear = torch.nn.Linear(independent,
                                      dependent)
        torch.nn.init.uniform_(self.linear.weight,
                               -0.5, 0.5)
  
    def forward(self, x):
        return self.linear(x)
  
  
# Initializing a custom layer with
# 2 input features and 3 output features
linear_layer = MyLayer(2, 3)
  
# Displaying the initialized weights
print(linear_layer.linear.weight)


Output:

Parameter containing:
tensor([[-0.1566,  0.2461],
        [-0.3361, -0.0551],
        [ 0.4607,  0.3077]], requires_grad=True)
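
Once defined, the custom layer is used like any other module. Continuing the example above, a quick usage check might look like this:

Python3

import torch

# Forward pass through the MyLayer instance created above
x = torch.randn(4, 2)         # a small random batch with 2 features
print(linear_layer(x).shape)  # torch.Size([4, 3])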

In conclusion, initializing the weights of a neural network model is an important step in the training process, as it can have a significant impact on the model’s performance. PyTorch provides several built-in initialization methods, including uniform, normal, Xavier, Kaiming, ones, and zeros. Each of these methods has its own advantages and disadvantages, and the choice of method will depend on the specific problem and model architecture being used. It is important to choose an initialization method that is suitable for the problem at hand, as it can help prevent vanishing or exploding gradient problems and improve the convergence speed and final accuracy of the model.


