Let us begin this article with a basic question: “Why are padding and strided convolutions required?”
Assume we have an image with dimensions n x n. If it is convolved with an f x f filter, the output image has dimensions (n - f + 1) x (n - f + 1).
Consider a 6 x 6 image as shown in the figure below. It is to be convolved with a 3 x 3 filter. At each position, the filter is multiplied element-wise with the image patch beneath it, and the products are summed to give one output pixel. The output is therefore a 4 x 4 image.
Figure 1: Image obtained after convolution of 6×6 image with a 3×3 filter and s=1
Figure 2: 6 x 6 image
Figure 3: 3 x 3 filter
Figure 4: Element-wise multiplication
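The sliding-window procedure can be sketched in a few lines of NumPy. This is only an illustration: the function name `convolve_valid`, the ramp-valued image, and the vertical-edge filter are assumptions made for the example, not values taken from the figures. As is usual in deep learning, the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def convolve_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    n = image.shape[0]
    f = kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            # Element-wise product of the filter and the patch under it,
            # summed into a single output pixel.
            window = image[i:i + f, j:j + f]
            out[i, j] = np.sum(window * kernel)
    return out

# Hypothetical 6 x 6 image and 3 x 3 filter, chosen only for illustration.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

result = convolve_valid(image, kernel)
print(result.shape)  # (4, 4), i.e. (6 - 3 + 1) x (6 - 3 + 1)
```

The explicit double loop is slow but mirrors the description above step by step; a real network would rely on an optimized library routine instead.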
However, this convolution has two downsides:
- Each time the convolutional filter is applied, the image shrinks, i.e. the output image has smaller dimensions than the input image, which may lead to information loss.
- Pixels at the corners of the image are used in only one output value, whereas pixels in the middle are used in many, so information near the edges of the image is largely lost.
Padding, i.e. adding a border of extra pixels around the image, is required to avoid these problems. Also, a very large input image sometimes has to be convolved with an f x f filter, which may be computationally very expensive. In this situation, strides are used: the filter moves several pixels at a time instead of one. That is why padding and strides are among the most basic building blocks of Convolutional Neural Networks.
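The uneven use of pixels can be checked by counting how many filter positions cover each pixel. The sketch below assumes a stride of 1 and zero padding; the helper name `coverage_counts` is hypothetical and introduced only for this example.

```python
import numpy as np

def coverage_counts(n, f, p=0):
    """Count how many f x f windows (stride 1) cover each original pixel."""
    size = n + 2 * p                      # side length after padding
    counts = np.zeros((size, size), dtype=int)
    out = size - f + 1                    # number of window positions per axis
    for i in range(out):
        for j in range(out):
            counts[i:i + f, j:j + f] += 1
    # Drop the padded border so only the original n x n pixels remain.
    return counts[p:size - p, p:size - p]

no_pad = coverage_counts(6, 3, p=0)
padded = coverage_counts(6, 3, p=1)

print(no_pad[0, 0], no_pad[2, 2])  # corner pixel vs. a middle pixel
print(padded[0, 0], padded[2, 2])  # same pixels with a 1-pixel border
```

Without padding, the corner pixel of a 6 x 6 image is covered by a single 3 x 3 window while a middle pixel is covered by nine. With a border of one pixel the corner is covered four times, and the output keeps the 6 x 6 size of the input.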
Dimensions of output image:
Let an n x n image be convolved with an f x f filter. Assume a padding border of p pixels and a stride s; then the output image has dimensions (⌊(n + 2p - f)/s⌋ + 1) x (⌊(n + 2p - f)/s⌋ + 1), where ⌊ ⌋ denotes rounding down to the nearest integer.
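As a quick check, the standard output-size rule, floor((n + 2p - f) / s) + 1 per side, can be written as a small helper. The function name `conv_output_size` is an assumption made for this sketch.

```python
def conv_output_size(n, f, p=0, s=1):
    """Side length of the output for an n x n image, f x f filter,
    padding p and stride s: floor((n + 2p - f) / s) + 1."""
    if s < 1:
        raise ValueError("stride must be at least 1")
    if n + 2 * p < f:
        raise ValueError("filter is larger than the padded image")
    return (n + 2 * p - f) // s + 1

print(conv_output_size(6, 3))        # no padding, stride 1
print(conv_output_size(6, 3, p=1))   # padding of 1 keeps the input size
print(conv_output_size(6, 3, s=2))   # stride of 2 shrinks the output
```

For the 6 x 6 image and 3 x 3 filter used in this article, these calls give 4, 6, and 2 respectively.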
The stride should be chosen so that comparatively few computations are required while information loss is kept to a minimum.
Figure 5: Image obtained after convolution of 6×6 image with a 3×3 filter and a stride of 2
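A strided convolution can be sketched by moving the window s pixels at a time. The code below is a minimal illustration assuming zero padding and a square image and filter; the name `convolve_strided` and the example values are hypothetical, not taken from the figure.

```python
import numpy as np

def convolve_strided(image, kernel, p=0, s=1):
    """Convolve a square image with a square kernel using padding p and stride s."""
    # Zero padding of p pixels on every side.
    padded = np.pad(image, p, mode="constant")
    n = padded.shape[0]
    f = kernel.shape[0]
    out_size = (n - f) // s + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            r, c = i * s, j * s           # top-left corner of the current window
            out[i, j] = np.sum(padded[r:r + f, c:c + f] * kernel)
    return out

# Hypothetical 6 x 6 image and 3 x 3 filter, chosen only for illustration.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

strided = convolve_strided(image, kernel, s=2)
print(strided.shape)  # (2, 2)
```

With a stride of 2 the filter is evaluated at only 4 positions instead of 16, which is where the computational saving comes from.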