CNN | Introduction to Padding

Problems with Simple Convolution Layers

  • For a grayscale (n x n) image and an (f x f) filter/kernel, the output of a convolution operation has dimensions (n – f + 1) x (n – f + 1).
    For example, convolving an (8 x 8) image with a (3 x 3) filter yields a (6 x 6) output. Thus, the image shrinks every time a convolution operation is performed. This places an upper limit on the number of such operations that can be applied before the image shrinks to nothing, preventing us from building deeper networks. (A sketch demonstrating this follows the list.)
  • Also, the pixels on the corners and the edges are used much less often than those in the middle.
    For example, with a (3 x 3) filter, a corner pixel A is touched in just one convolution operation and an edge pixel B in three, while an interior pixel C is touched in nine. In general, pixels in the middle are used more often than pixels on corners and edges. Consequently, the information on the borders of an image is not preserved as well as the information in the middle.
  • To overcome these problems, we use padding.
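
To make the shrinkage and the uneven pixel usage concrete, here is a minimal NumPy sketch; the helper conv2d_valid and the usage-counting loop are illustrative, not part of any particular library:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' convolution: slide the kernel over the image with no
    padding, so an (n x n) input shrinks to (n - f + 1) x (n - f + 1).
    As is conventional in CNNs, the kernel is not flipped."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

image = np.random.rand(8, 8)              # (8 x 8) grayscale image
kernel = np.ones((3, 3))                  # (3 x 3) filter
print(conv2d_valid(image, kernel).shape)  # -> (6, 6): the image has shrunk

# Count how many sliding windows touch each pixel:
usage = np.zeros((8, 8))
for i in range(6):
    for j in range(6):
        usage[i:i + 3, j:j + 3] += 1
print(usage[0, 0], usage[0, 3], usage[4, 4])  # 1.0 3.0 9.0 (corner, edge, middle)
```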

Padding Input Images

Padding is simply the process of adding layers of zeros around the border of our input images so as to avoid the problems mentioned above. With p layers of padding, an (n x n) image becomes an (n + 2p) x (n + 2p) image, and the output of a convolution operation then has dimensions (n + 2p – f + 1) x (n + 2p – f + 1).
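
As a quick sketch of the idea, assuming NumPy's np.pad and the same (8 x 8) image and (3 x 3) filter as above:

```python
import numpy as np

image = np.random.rand(8, 8)

# Add p = 1 layer of zeros on every side, turning the (n x n) input
# into an (n + 2p) x (n + 2p) one.
padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)
print(padded.shape)  # -> (10, 10)

# A (3 x 3) 'valid' convolution on the padded image now produces an
# (n + 2p - f + 1) x (n + 2p - f + 1) = (8 x 8) output: no shrinkage,
# and the original border pixels are no longer on the border.
```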