Apply a 2D Convolution Operation in PyTorch

Last Updated : 21 Mar, 2023

A 2D Convolution operation is a widely used operation in computer vision and deep learning. It is a mathematical operation that applies a filter to an image, producing a filtered output (also called a feature map). In this article, we will look at how to apply a 2D Convolution operation in PyTorch.

PyTorch provides a convenient and efficient way to apply 2D Convolution operations. It provides functions for performing operations on tensors (PyTorch’s implementation of arrays), and it also provides functions for building deep learning models.

Convolutions are a fundamental concept in computer vision and image processing. They are mathematical operations that take an input signal (such as an image) and produce a transformed output signal that highlights certain features of the input. Convolutional neural networks (ConvNets or CNNs) are deep learning models that are built using convolutions as a core component. 

In the context of PyTorch, the meaning of 1D, 2D, and 3D convolutions is determined by the dimensionality of the input data that the convolution is applied to:

  • 1D convolutions are applied to 1D input signals such as 1D arrays, sequences, or time series. The convolution kernel (or filter) slides along the input signal and performs element-wise multiplication and accumulation at each position to produce the output signal.
  • 2D convolutions are applied to 2D input signals such as grayscale or color images. The convolution kernel slides over the 2D input array, performs element-wise multiplication and accumulation at each position, and produces a 2D output signal.
  • 3D convolutions are applied to 3D input signals such as video or volumetric data. The convolution kernel slides over the 3D input array, performs element-wise multiplication and accumulation at each position, and produces a 3D output signal.
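The difference between the three cases is easiest to see from tensor shapes. The following is a minimal sketch, assuming single-channel random inputs of size 16 along each spatial axis (the sizes and variable names are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

# Random inputs in the (N, C, ...) layout PyTorch expects
signal = torch.randn(1, 1, 16)          # batch, channels, length
image = torch.randn(1, 1, 16, 16)       # batch, channels, height, width
volume = torch.randn(1, 1, 16, 16, 16)  # batch, channels, depth, height, width

# Same kernel size, one layer per dimensionality
out_1d = nn.Conv1d(1, 1, kernel_size=3)(signal)
out_2d = nn.Conv2d(1, 1, kernel_size=3)(image)
out_3d = nn.Conv3d(1, 1, kernel_size=3)(volume)

print(out_1d.shape)  # torch.Size([1, 1, 14])
print(out_2d.shape)  # torch.Size([1, 1, 14, 14])
print(out_3d.shape)  # torch.Size([1, 1, 14, 14, 14])
```

With a kernel of size 3, no padding, and stride 1, every spatial axis shrinks from 16 to 14.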

A convolution operation is a mathematical operation that is widely used in image processing and computer vision. It involves applying a convolution kernel, also known as a filter, to an image. The filter acts as a sliding window over the image, computing the dot product of its values with the underlying image pixels at each step.

Mathematically, a convolution operation can be represented as:

(f * g)(t) = \int_{-\infty}^{\infty} f(\tau) g(t-\tau) d\tau

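where f and g are functions representing the input signal and the filter respectively, and * denotes the convolution operator. This is the continuous, one-dimensional form; for images, the integral becomes a discrete sum over the two spatial dimensions.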
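One practical detail: the definition above flips the filter (the g(t-\tau) term), whereas PyTorch's convolution layers compute a cross-correlation, which slides the filter without flipping it. The sketch below illustrates the difference with a small asymmetric kernel; the values are made up for illustration, and the flipped variant is only a valid-region approximation of the textbook operation:

```python
import torch
import torch.nn.functional as F

# 4x4 input with values 0..15 and an asymmetric 2x2 kernel
image = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
kernel = torch.tensor([[1., 2.],
                       [3., 4.]]).reshape(1, 1, 2, 2)

# What F.conv2d computes: cross-correlation (kernel used as-is)
cross_corr = F.conv2d(image, kernel)

# Textbook convolution: flip the kernel along both spatial axes first
true_conv = F.conv2d(image, torch.flip(kernel, dims=(-2, -1)))

print(cross_corr[0, 0, 0, 0].item())  # 34.0
print(true_conv[0, 0, 0, 0].item())   # 16.0
```

For a symmetric kernel the two results coincide, and for learned filters the distinction does not matter in practice because the network simply learns the flipped weights.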

2D convolution in PyTorch 

Syntax of Conv2d():

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

The number of output channels, Cout, equals the number of filters used in the convolutional layer. The parameters are:

in_channels (int) – Number of channels in the input image.

out_channels (int) – Number of channels produced by the convolution.

kernel_size (int or tuple) – Size of the convolving kernel.

bias (bool, optional) – If True, adds a learnable bias to the output. Default: True.

stride (int or tuple, optional) – Stride of the cross-correlation, given as a single number or a tuple. Default: 1.

padding (int, tuple or str, optional) – Amount of padding applied to the input. It can be either a string {'valid', 'same'} or an int or tuple of ints giving the amount of implicit padding applied on both sides. Default: 0.

dilation (int or tuple, optional) – Spacing between the kernel points; also known as the à trous algorithm. With dilation=2, for example, a 3×3 kernel covers a 5×5 region of the input. Default: 1.

groups (int, optional) – Controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. Default: 1.
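To see how these parameters affect the result, the sketch below builds several layers that differ in a single argument and compares output and weight shapes. The input size (1 sample, 4 channels, 8×8) and the dictionary names are assumptions chosen for illustration; padding='same' requires a reasonably recent PyTorch release:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 4, 8, 8)  # hypothetical batch: 1 sample, 4 channels, 8x8

layers = {
    "default": nn.Conv2d(4, 8, kernel_size=3),
    "stride=2": nn.Conv2d(4, 8, kernel_size=3, stride=2),
    "padding=same": nn.Conv2d(4, 8, kernel_size=3, padding="same"),
    "dilation=2": nn.Conv2d(4, 8, kernel_size=3, dilation=2),
    "groups=4": nn.Conv2d(4, 8, kernel_size=3, groups=4),
}

# Output shape of each layer and the shape of its weight tensor
shapes = {name: tuple(layer(x).shape) for name, layer in layers.items()}
weight_shapes = {name: tuple(layer.weight.shape) for name, layer in layers.items()}

for name in layers:
    print(f"{name:13s} output={shapes[name]} weight={weight_shapes[name]}")
```

Note that groups does not change the output shape; it changes the weight shape from (8, 4, 3, 3) to (8, 1, 3, 3), because each filter then sees only in_channels / groups input channels.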

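For 2D convolution in PyTorch, the value of output channel j for sample i in the batch is computed with the formula:

\text{Output}(N_i, C_{\text{out}_j}) = \text{bias}_j + \sum_{k = 0}^{C_{\text{in}} - 1} \text{kernel}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)

Here \star is the 2D cross-correlation operator, N_i is the i-th sample in the batch, and the sum runs over all input channels k.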

The input shape refers to the dimensions of a batch of data samples passed to the layer. The shape is defined as (N, Cin, Hin, Win), where:

  • N is the batch size or number of samples in the batch
  • Cin is the number of channels in the input data
  • Hin is the height of the input data
  • Win is the width of the input data


The output shape refers to the dimensions of the output from a convolutional layer. The shape is defined as (N, Cout, Hout, Wout), where:

  • N is the batch size or number of samples in the batch
  • Cout is the number of channels in the output data
  • Hout is the height of the output data
  • Wout is the width of the output data

The output height and width can be determined mathematically from the kernel size, stride, and padding of the convolutional layer (assuming the default dilation of 1). The formula for Hout is:

h_{out} = \left\lfloor\frac{h_{in} - \text{kernel\_size[0]} + 2 * \text{padding[0]}}{\text{stride[0]}} + 1 \right\rfloor

Similarly, the formula for Wout is:

w_{out} = \left\lfloor\frac{w_{in} - \text{kernel\_size[1]} + 2 * \text{padding[1]}}{\text{stride[1]}} + 1 \right\rfloor
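The formulas above assume a dilation of 1. The PyTorch documentation gives a more general form that also accounts for dilation; the sketch below implements it in a helper (the name conv_output_size is hypothetical) and compares it against the shapes actually produced by nn.Conv2d for a few arbitrary configurations:

```python
import torch
import torch.nn as nn

def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis (hypothetical helper)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Arbitrary configurations chosen for illustration
configs = [
    dict(kernel_size=3, stride=1, padding=0, dilation=1),
    dict(kernel_size=3, stride=2, padding=1, dilation=1),
    dict(kernel_size=5, stride=2, padding=2, dilation=2),
]

x = torch.randn(1, 1, 32, 32)
results = []
for cfg in configs:
    expected = conv_output_size(32, **cfg)
    actual = nn.Conv2d(1, 1, **cfg)(x).shape[-1]
    results.append((expected, actual))
    print(cfg, "formula:", expected, "layer:", actual)
```

With dilation=1 the general expression reduces to the formula given above, since dilation * (kernel_size - 1) + 1 is then just kernel_size.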

Let's work through an example. Here we define a 4×4 input image, a 3×3 kernel, and a single bias value.

First, we find the output shape using the formula above.

Python3

import numpy as np
import torch

# Define the filter (a 3x3 sharpening kernel)
kernel = torch.tensor(
    [[0, -1, 0],
     [-1, 5, -1],
     [0, -1, 0]], dtype=torch.float32)

# Define the bias
bias = torch.tensor([5], dtype=torch.float32)

# Define the input image
image = torch.tensor(
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]], dtype=torch.float32)

def Output_shape(image, kernel, padding, stride):
    """Return (h_out, w_out) for a convolution with dilation 1."""
    h, w = image.shape[-2], image.shape[-1]
    k_h, k_w = kernel.shape[-2], kernel.shape[-1]

    # Padding is added on both sides, so it enlarges the effective input
    h_out = (h - k_h + 2 * padding) // stride[0] + 1
    w_out = (w - k_w + 2 * padding) // stride[1] + 1
    return h_out, w_out

print(Output_shape(image, kernel, padding=0, stride=(1, 1)))

                    

Output:

(2, 2)

Let's apply the convolution operation using the formula:

\text{Output}(N_i, C_{\text{out}_j}) = \text{bias}_j + \sum_{k = 0}^{C_{\text{in}} - 1} \text{kernel}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)

Python3

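output_shape = Output_shape(image, kernel, padding=0, stride=(1, 1))
output = np.zeros(output_shape)
k_h, k_w = kernel.shape

for i in range(output_shape[0]):
    for j in range(output_shape[1]):
        # Take the window under the kernel, multiply element-wise,
        # sum the products, and add the bias
        window = image[i:i + k_h, j:j + k_w]
        output[i, j] = (torch.tensordot(window, kernel, dims=2) + bias).item()

output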

                    

Output:

array([[11., 12.],
       [15., 16.]])
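The manual loop above can be cross-checked against PyTorch's functional API. This is a minimal sketch that redefines the same kernel, bias, and image so it runs on its own, and reshapes them to the 4D layouts that torch.nn.functional.conv2d expects:

```python
import torch
import torch.nn.functional as F

kernel = torch.tensor(
    [[0, -1, 0],
     [-1, 5, -1],
     [0, -1, 0]], dtype=torch.float32)
bias = torch.tensor([5], dtype=torch.float32)
image = torch.arange(1, 17, dtype=torch.float32).reshape(4, 4)

# conv2d expects input (N, C_in, H, W) and weight (C_out, C_in, kH, kW)
reference = F.conv2d(image.reshape(1, 1, 4, 4),
                     kernel.reshape(1, 1, 3, 3),
                     bias=bias)

print(reference.squeeze())
# tensor([[11., 12.],
#         [15., 16.]])
```

The values match the result of the explicit loop, which confirms that the sliding-window computation is what the library performs.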

Example 1:

We'll start by creating a 2D convolution operation that applies a filter to an image.

The code defines the filter as a 3×3 tensor and the input image as a 4×4 tensor, then reshapes both to the 4D layouts that nn.Conv2d expects: (out_channels, in_channels, height, width) for the kernel and (N, C, H, W) for the image. nn.Conv2d creates the 2D convolution layer; we specify the number of input and output channels and the size of the kernel.

We then set the filter through the conv.weight parameter and the bias through conv.bias, and apply the operation to the input image by calling conv. The resulting output is a tensor that represents the filtered image.

Python3

import torch
import torch.nn as nn

# Define the filter and reshape it to (out_channels, in_channels, kH, kW)
kernel = torch.tensor(
    [[0, -1, 0],
     [-1, 5, -1],
     [0, -1, 0]], dtype=torch.float32)
kernel = kernel.reshape(1, 1, 3, 3)

# Define the bias (one value per output channel)
bias = torch.tensor([5], dtype=torch.float32)

# Define the input image and reshape it to (N, C, H, W)
image = torch.tensor(
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]], dtype=torch.float32)
image = image.reshape(1, 1, 4, 4)

# Define the convolution operation
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=True)

# Replace the randomly initialized weight and bias with our own values
conv.weight = nn.Parameter(kernel)
conv.bias = nn.Parameter(bias)

# Apply the convolution operation
output = conv(image)

# Print the output
print('Output Shape :', output.shape)
print('Output \n', output)

                    

Output:

Output Shape : torch.Size([1, 1, 2, 2])
Output 
 tensor([[[[11., 12.],
          [15., 16.]]]], grad_fn=<ConvolutionBackward0>)

As you can see, the 2D Convolution operation has produced a filtered output. In this case, the output is a tensor with shape (1, 1, 2, 2).
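Assigning a new nn.Parameter is one way to set the filter. An alternative, sketched below under the same inputs, is to copy the values into the existing parameters in place inside torch.no_grad(), which keeps the original parameter objects. Treat this as an illustrative variant rather than the recommended approach:

```python
import torch
import torch.nn as nn

kernel = torch.tensor(
    [[0, -1, 0],
     [-1, 5, -1],
     [0, -1, 0]], dtype=torch.float32).reshape(1, 1, 3, 3)
image = torch.arange(1, 17, dtype=torch.float32).reshape(1, 1, 4, 4)

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=True)

# Overwrite the randomly initialized parameters without tracking gradients
with torch.no_grad():
    conv.weight.copy_(kernel)
    conv.bias.fill_(5.0)
    output = conv(image)

print(output)
# tensor([[[[11., 12.],
#           [15., 16.]]]])
```

Because the forward pass also runs inside torch.no_grad(), the printed tensor carries no grad_fn.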

Example 2:

Now let's apply a convolution to a real image.

Python3

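import torch
import torch.nn as nn
import torchvision.transforms as T
from PIL import Image

# Read the image file and make sure it has 3 (RGB) channels
image = Image.open('pawan.jpeg').convert('RGB')

# Convert the input image to a torch tensor with values in [0, 1]
Input = T.ToTensor()(image)

# Add a batch dimension to make the tensor 4D: (N, C, H, W)
Input = Input.unsqueeze(0)
print('Input Tensor :', Input.shape)

# Define the convolution operation (weight and bias are randomly initialized)
conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=2, bias=True)

output = conv(Input)
print('Output Tensor :', output.shape)

# Remove the batch dimension, detach from the autograd graph, and
# clamp to [0, 1] so the values map cleanly to pixel intensities
Out_img = output.squeeze(0).detach().clamp(0, 1)

# Convert the tensor to a PIL image
Out_img = T.ToPILImage()(Out_img)
Out_img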

                    

Output:

Input Tensor : torch.Size([1, 3, 460, 460])
Output Tensor : torch.Size([1, 3, 229, 229])

Output Image

In this case, the pixel values of the output image change on every run because we have not set the weight and bias, so the layer uses randomly initialized values. The output shape, however, is the same every time.
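If reproducible values are needed, one option is to seed PyTorch's random number generator before creating the layer. The sketch below uses a hypothetical helper, make_conv, and a small random tensor as a stand-in for the image:

```python
import torch
import torch.nn as nn

def make_conv(seed):
    """Create the layer after seeding the RNG (hypothetical helper)."""
    torch.manual_seed(seed)
    return nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=2, bias=True)

conv_a = make_conv(0)
conv_b = make_conv(0)  # same seed, so the same initial weights
conv_c = make_conv(1)  # different seed, so different initial weights

x = torch.rand(1, 3, 32, 32)  # stand-in for an image tensor

print(torch.equal(conv_a.weight, conv_b.weight))  # True
print(torch.equal(conv_a.weight, conv_c.weight))  # False
print(torch.allclose(conv_a(x), conv_b(x)))       # True
```

Layers created with the same seed start from identical parameters and therefore produce the same output for the same input.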

We can also calculate the output shape using the formula. The result is rounded down (floor), so the fractional part is discarded:

\begin{aligned} h_{out} & =\left\lfloor\frac{h_{in} - \text{kernel\_size[0]} + 2 * \text{padding[0]}}{\text{stride[0]}} + 1\right\rfloor \\ &= \left\lfloor\frac{460 - 3 + 2 * 0}{2} + 1\right\rfloor \\ &= \left\lfloor\frac{457}{2} + 1\right\rfloor \\ &= \left\lfloor 228.5 + 1 \right\rfloor \\ &= \left\lfloor 229.5 \right\rfloor \\ &= 229 \end{aligned}

\begin{aligned} w_{out} & =\left\lfloor\frac{w_{in} - \text{kernel\_size[1]} + 2 * \text{padding[1]}}{\text{stride[1]}} + 1\right\rfloor \\ &= \left\lfloor\frac{460 - 3 + 2 * 0}{2} + 1\right\rfloor \\ &= \left\lfloor\frac{457}{2} + 1\right\rfloor \\ &= \left\lfloor 228.5 + 1 \right\rfloor \\ &= \left\lfloor 229.5 \right\rfloor \\ &= 229 \end{aligned}
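The rounding step means that neighbouring input sizes can map to the same output size. As a small sketch, the snippet below runs the same layer configuration on random tensors of side 459, 460, and 461 (stand-ins for real images) and records the resulting side length:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=2, bias=True)

sizes = {}
for side in (459, 460, 461):
    x = torch.rand(1, 3, side, side)  # random stand-in for an image
    sizes[side] = conv(x).shape[-1]
    print(side, "->", sizes[side], "formula:", (side - 3) // 2 + 1)
```

Sides 459 and 460 both yield 229: with stride 2, the last row and column of the 460×460 input do not fit a full kernel window and are dropped.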

In this article, we looked at how to apply a 2D convolution operation in PyTorch. We defined a filter and an input image, created a 2D convolution operation using PyTorch's nn.Conv2d, set the filter for the operation, and applied it to the input image to produce a filtered output.

By using PyTorch’s convenient and efficient functions for performing 2D Convolution operations, we can easily build deep learning models that incorporate this important operation.


