Convolutional neural networks (CNNs) are highly effective at image classification and recognition tasks. A CNN learns features of the training images through the filters applied at each layer, and the features learned vary significantly from layer to layer. Initial layers have been observed to capture predominantly low-level features such as edges, orientation, and colours. As the network gets deeper, it captures high-level features that help differentiate between classes of images.
To understand how a convolutional neural network learns the spatial dependencies in an image, the features captured at each layer can be visualized as follows.
To visualize the features at each layer, the Keras Model class is used. It allows a model to have multiple outputs: it maps a given list of input tensors to a list of output tensors.
tf.keras.Model()
Arguments:
inputs: A single input or a list of inputs, each an object of the keras.Input class.
outputs: An output or a list of outputs.
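As a minimal sketch of the multiple-output behaviour described above, the toy model below returns both a hidden layer's output and the final prediction. The layer sizes and the names `hidden` and `prediction` are illustrative assumptions, not part of the cats-and-dogs model.

```python
import numpy as np
import tensorflow as tf

# Toy network; the sizes (8 inputs, 4 hidden units) are assumptions for illustration.
inputs = tf.keras.Input(shape=(8,))
hidden = tf.keras.layers.Dense(4, activation="relu", name="hidden")(inputs)
prediction = tf.keras.layers.Dense(1, activation="sigmoid", name="prediction")(hidden)

# Passing a list as `outputs` makes the model return one array per listed tensor.
multi_output_model = tf.keras.Model(inputs=inputs, outputs=[hidden, prediction])

batch = np.random.rand(2, 8).astype("float32")
hidden_out, prediction_out = multi_output_model.predict(batch, verbose=0)
print(hidden_out.shape, prediction_out.shape)  # (2, 4) (2, 1)
```

The same idea is what makes intermediate activations accessible: any layer output in the graph can be listed in `outputs`.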
Considering a dataset of cat and dog images, we build a convolutional neural network and add a classifier on top of it to recognize a given image as either a cat or a dog.
Training and validation images are loaded into a data generator.
The class mode is set to 'binary' and the batch size to 20. The target size of each image is fixed at (150, 150).
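The text does not show which generator was used, so the following is only a sketch that assumes the legacy Keras `ImageDataGenerator` with `flow_from_directory`. The `rescale` factor, the helper name `make_generator`, and the temporary folder of random placeholder images are all assumptions made so the example runs on its own; only the class mode, batch size, and target size come from the text.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf
from PIL import Image


def make_generator(directory):
    """Yield batches of 20 images resized to 150x150 with binary labels."""
    # Assumption: pixel values are rescaled to [0, 1]; the text does not say so.
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)
    return datagen.flow_from_directory(
        directory,
        target_size=(150, 150),
        batch_size=20,
        class_mode="binary",
    )


# Placeholder data: one sub-folder per class, filled with random images.
with tempfile.TemporaryDirectory() as root:
    for class_name in ("cats", "dogs"):
        os.makedirs(os.path.join(root, class_name))
        for index in range(3):
            pixels = np.random.randint(0, 256, size=(200, 180, 3), dtype=np.uint8)
            Image.fromarray(pixels).save(os.path.join(root, class_name, f"{index}.jpg"))

    generator = make_generator(root)
    images, labels = next(generator)

print(images.shape, labels.shape)
```

With a real dataset, `make_generator` would be called once on the training directory and once on the validation directory.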
Step 2: Architecture of the model
A combination of two-dimensional convolutional layers and max-pooling layers is added, and a dense classifier is added on top of them. The final Dense layer uses the sigmoid activation function because this is a two-class classification problem.
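One way to write such a model is sketched below. The filter counts and the Dense sizes follow the model summary in this article. The 3x3 kernels and 2x2 pooling windows are inferred from its output shapes and parameter counts. The ReLU activations in the hidden layers are an assumption, since the text only names the final sigmoid.

```python
import tensorflow as tf

# Assumptions: 3x3 kernels and 2x2 pooling (inferred from the summary),
# ReLU in the hidden layers (not stated in the text).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(150, 150, 3)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    # Single sigmoid unit: the output is the probability of one of the two classes.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.summary()
total_params = model.count_params()
```

Layer names printed by `model.summary()` depend on how many layers were created earlier in the session, so they may not match the names shown in this article.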
Output: Model Summary
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 148, 148, 32)      896
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 74, 74, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 72, 72, 64)        18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 36, 36, 64)        0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 34, 34, 128)       73856
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 17, 17, 128)       0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 15, 15, 128)       147584
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 128)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 6272)              0
_________________________________________________________________
dense_1 (Dense)              (None, 512)               3211776
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 513
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
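The shapes and parameter counts in the summary can be checked by hand. The sketch below assumes 3x3 kernels with no padding and stride 1, and 2x2 max-pooling; these settings are not stated in the text but they reproduce the numbers above. The helper names are hypothetical.

```python
def conv_output(size, kernel=3):
    """Spatial size after an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Spatial size after non-overlapping max-pooling (remainder dropped)."""
    return size // pool


def conv_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(n_inputs, units):
    """Weights plus one bias per unit."""
    return (n_inputs + 1) * units


size, channels, total = 150, 3, 0
for filters in (32, 64, 128, 128):
    total += conv_params(channels, filters)
    size = pool_output(conv_output(size))
    channels = filters

flattened = size * size * channels
total += dense_params(flattened, 512) + dense_params(512, 1)

print(size, flattened, total)  # 7 6272 3453121
```

Max-pooling layers contribute no parameters, which is why only the convolutional and dense layers appear in the total.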
Step 3: Compiling and training the model on cats and dogs dataset
Loss function: Binary cross-entropy
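A compile-and-fit call with this loss might look like the sketch below. The optimizer (`rmsprop`), the accuracy metric, and the single epoch are assumptions, since the text only names the loss function. To keep the example short and runnable, it uses a much smaller stand-in network and random placeholder data instead of the cats-and-dogs generator.

```python
import numpy as np
import tensorflow as tf

# Small stand-in network (assumption); the real model is the one from Step 2.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(150, 150, 3)),
    tf.keras.layers.Conv2D(8, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Binary cross-entropy matches the single sigmoid output.
# The optimizer and metric are assumptions, not taken from the text.
model.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=["accuracy"])

# Random placeholder batch: 20 images with 0/1 labels.
images = np.random.rand(20, 150, 150, 3).astype("float32")
labels = np.random.randint(0, 2, size=(20,)).astype("float32")

history = model.fit(images, labels, batch_size=20, epochs=1, verbose=0)
print(history.history["loss"])
```

With real data, the training generator would be passed to `fit` in place of the arrays, together with the validation generator as `validation_data`.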
Step 4: Visualizing intermediate activations (Output of each layer)
Consider an image that was not used for training, i.e., one from the test data, and store its path in a variable ‘image_path’.
Code: Using Keras Model class to get outputs of each layer
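The original listing is not reproduced here, so what follows is a sketch of one way to do this with `tf.keras.Model`. The helpers `build_cnn` and `load_image_tensor` are hypothetical names. The network is an untrained stand-in with the layer layout from the model summary, and a random tensor replaces the test image so the example runs without a file on disk.

```python
import numpy as np
import tensorflow as tf


def build_cnn():
    """Untrained stand-in with the layer layout from the model summary."""
    stack = [tf.keras.Input(shape=(150, 150, 3))]
    for filters in (32, 64, 128, 128):
        stack.append(tf.keras.layers.Conv2D(filters, (3, 3), activation="relu"))
        stack.append(tf.keras.layers.MaxPooling2D((2, 2)))
    stack.append(tf.keras.layers.Flatten())
    stack.append(tf.keras.layers.Dense(512, activation="relu"))
    stack.append(tf.keras.layers.Dense(1, activation="sigmoid"))
    return tf.keras.Sequential(stack)


def load_image_tensor(image_path):
    """Load an image as a (1, 150, 150, 3) float tensor scaled to [0, 1]."""
    img = tf.keras.utils.load_img(image_path, target_size=(150, 150))
    tensor = tf.keras.utils.img_to_array(img)
    return np.expand_dims(tensor, axis=0) / 255.0


model = build_cnn()

# Keep only the eight convolution and pooling layers.
feature_layers = [
    layer for layer in model.layers
    if isinstance(layer, (tf.keras.layers.Conv2D, tf.keras.layers.MaxPooling2D))
]
layer_names = [layer.name for layer in feature_layers]

# One model, eight outputs: the activation of every selected layer.
activation_model = tf.keras.Model(
    inputs=model.inputs,
    outputs=[layer.output for layer in feature_layers],
)

# Random stand-in for load_image_tensor(image_path).
img_tensor = np.random.rand(1, 150, 150, 3).astype("float32")
activations = activation_model.predict(img_tensor, verbose=0)

first_layer_activation = activations[0]
print(first_layer_activation.shape)  # (1, 148, 148, 32)
```

A single channel can then be displayed with matplotlib, for example `plt.matshow(first_layer_activation[0, :, :, 5], cmap="viridis")` for the sixth channel. The colour map is a free choice.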
First layer activation shape: (1, 148, 148, 32)
[Figure: Sixth channel of first layer activation]
[Figure: Fifteenth channel of first layer activation]
As already discussed, initial layers identify low-level features. The sixth channel identifies edges in the image, whereas the fifteenth channel identifies the colour of the eyes.
Code: The names of the eight layers in our model
Layer names: ['conv2d_1', 'max_pooling2d_1', 'conv2d_2', 'max_pooling2d_2', 'conv2d_3', 'max_pooling2d_3', 'conv2d_4', 'max_pooling2d_4']
Layer 6: max_pooling2d_3
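A per-layer figure such as the one for max_pooling2d_3 can be produced by tiling all channels of a layer's activation into one grid. The sketch below is one way to do that, not necessarily the original code. The helper name `tile_channels`, the 16 images per row, and the per-channel min-max scaling are assumptions, and a random array stands in for the real activation.

```python
import numpy as np


def tile_channels(layer_activation, images_per_row=16):
    """Arrange the channels of a (1, H, W, C) activation in a 2-D grid."""
    _, height, width, n_channels = layer_activation.shape
    n_rows = n_channels // images_per_row
    grid = np.zeros((n_rows * height, images_per_row * width), dtype="float32")

    for index in range(n_rows * images_per_row):
        channel = layer_activation[0, :, :, index].astype("float32")
        # Scale each channel to [0, 1] so faint and strong channels are both visible.
        spread = channel.max() - channel.min()
        if spread > 0:
            channel = (channel - channel.min()) / spread
        else:
            channel = np.zeros_like(channel)
        row, col = divmod(index, images_per_row)
        grid[row * height:(row + 1) * height,
             col * width:(col + 1) * width] = channel
    return grid


# Random stand-in with the shape of the max_pooling2d_3 activation.
activation = np.random.rand(1, 17, 17, 128).astype("float32")
grid = tile_channels(activation)
print(grid.shape)  # (136, 272)
```

The grid can be shown with `plt.imshow(grid, aspect="auto", cmap="viridis")`, using the layer name as the title. Repeating this for each of the eight activations gives one figure per layer.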
Initial layers are more interpretable and retain the majority of the features in the input image. As the layers get deeper, the features become less interpretable and more abstract: they capture features specific to the class and leave behind the general features of the image.