Inception V1 (or GoogLeNet) was the state-of-the-art architecture at ILSVRC 2014. It produced the record-lowest error on the ImageNet classification dataset, but there are some points on which improvements can be made to increase accuracy and decrease the complexity of the model.
Problems with the Inception V1 architecture:
Inception V1 sometimes uses large convolutions such as 5×5, which can cause the input dimensions to decrease by a large margin. This costs the network some accuracy, because a neural network is susceptible to information loss if the input dimensions decrease too drastically.
Furthermore, bigger convolutions such as 5×5 are much more computationally expensive than 3×3 ones. We can go further in terms of factorization: a 3×3 convolution can be divided into an asymmetric 1×3 convolution followed by a 3×1 convolution. This is equivalent to sliding a two-layer network with the same receptive field as a 3×3 convolution, but 33% cheaper. This factorization does not work well for early layers, where the input dimensions are large; it only helps on m×m feature maps with m between 12 and 20.

According to the Inception V1 paper, the auxiliary classifiers improve the convergence of the network. The authors argued that they help reduce the effect of the vanishing-gradient problem in deep networks by pushing useful gradients to the earlier layers (to reduce the loss). However, the authors of this paper found that the auxiliary classifiers did not improve convergence much early in training.
Architectural Changes in Inception V2:
In the Inception V2 architecture, the 5×5 convolution is replaced by two stacked 3×3 convolutions. This decreases computational cost and thus increases speed, because a 5×5 convolution is about 2.78 times more expensive than a 3×3 convolution. Using two 3×3 layers instead of one 5×5 layer therefore improves the performance of the architecture.
This architecture also factorizes n×n convolutions into 1×n and n×1 convolutions. As discussed above, a 3×3 convolution can be converted into a 1×3 convolution followed by a 3×1 convolution, which is 33% cheaper in terms of computational complexity than the 3×3 convolution.
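The savings quoted above can be checked with simple arithmetic, counting multiplies per output position for a single input/output channel pair:

```python
# Per-position multiply counts for one input/output channel pair
cost_5x5 = 5 * 5                    # one 5x5 convolution
cost_3x3 = 3 * 3                    # one 3x3 convolution
cost_two_3x3 = 2 * cost_3x3         # two stacked 3x3s (same receptive field as 5x5)
cost_asym = (1 * 3) + (3 * 1)       # 1x3 followed by 3x1 (same receptive field as 3x3)

print(cost_5x5 / cost_3x3)          # ≈ 2.78: a 5x5 conv vs a single 3x3
print(1 - cost_two_3x3 / cost_5x5)  # 0.28: two 3x3s are ~28% cheaper than one 5x5
print(1 - cost_asym / cost_3x3)     # ≈ 0.33: the asymmetric pair is ~33% cheaper than a 3x3
```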
To deal with the problem of the representational bottleneck, the filter banks of the module were expanded (made wider) instead of making the module deeper. This prevents the loss of information that deeper stacking would cause.
Architectural Changes in Inception V3:
Inception V3 is similar to Inception V2 and contains all of its features, with the following changes/additions:
- Use of the RMSprop optimizer.
- Batch normalization in the fully connected layer of the auxiliary classifier.
- Use of factorized 7×7 convolutions.
- Label smoothing regularization: a method to regularize the classifier by estimating the effect of label dropout during training. It prevents the classifier from predicting a class too confidently. Adding label smoothing improves the error rate by 0.2%.
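Label smoothing mixes the one-hot target with a uniform distribution over the classes. A minimal sketch (the function name `smooth_labels` and the smoothing factor 0.1 are illustrative; the paper uses ε = 0.1 as well):

```python
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing: mix the one-hot target with a uniform distribution.

    Each target becomes (1 - epsilon) * one_hot + epsilon / num_classes,
    so the correct class keeps most of the mass but never a full 1.0.
    """
    num_classes = one_hot.shape[-1]
    return one_hot * (1.0 - epsilon) + epsilon / num_classes

y = np.array([[0.0, 1.0, 0.0, 0.0]])
print(smooth_labels(y))  # [[0.025 0.925 0.025 0.025]]
```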
Below are the layer-by-layer details of Inception V2:
The above architecture takes an image input of size (299, 299, 3). Note that in the architecture above, figures 5, 6, and 7 refer to figures 1, 2, and 3 in this article.
In this section we will look at an implementation of Inception V3. We will use the Keras applications API to load the model, and the Cats vs Dogs dataset for this implementation.
Code: Importing the required module.
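The original code for this step is not shown; a plausible set of imports for the steps below, assuming a TensorFlow 2.x / Keras setup:

```python
# Standard utilities for file handling, plus TensorFlow/Keras for the model.
import os
import shutil
import random

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, Model
from tensorflow.keras.optimizers import RMSprop
```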
Code: Creating directories in order to prepare for the dataset
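The exact directory layout used in the original is not shown; a minimal sketch, assuming a hypothetical `cats_and_dogs/` root with train/validation splits and per-class subfolders:

```python
import os

# Hypothetical layout: train and validation splits, each with cats/ and dogs/
base_dir = "cats_and_dogs"  # assumed root directory; adjust as needed
for split in ("train", "validation"):
    for label in ("cats", "dogs"):
        os.makedirs(os.path.join(base_dir, split, label), exist_ok=True)
```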
Code: Storing the dataset in the directories created above and plot some sample images.
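A sketch of the split-and-copy logic, assuming the raw Kaggle Cats vs Dogs images sit in `Cat/` and `Dog/` subfolders of a source directory (the `split_dataset` name, the `PetImages` path, and the 90/10 split are assumptions, not from the original):

```python
import os
import random
import shutil

def split_dataset(source_dir, base_dir, split_ratio=0.9):
    """Copy images into train/validation folders from Cat/ and Dog/ sources."""
    for label in ("Cat", "Dog"):
        src = os.path.join(source_dir, label)
        # Skip zero-byte files, which the raw Cats vs Dogs archive contains
        files = [f for f in os.listdir(src)
                 if os.path.getsize(os.path.join(src, f)) > 0]
        random.shuffle(files)
        cut = int(len(files) * split_ratio)
        for split, subset in (("train", files[:cut]), ("validation", files[cut:])):
            dest = os.path.join(base_dir, split, label.lower() + "s")
            os.makedirs(dest, exist_ok=True)
            for f in subset:
                shutil.copyfile(os.path.join(src, f), os.path.join(dest, f))

# Example (assumes raw images in "PetImages/Cat" and "PetImages/Dog"):
# split_dataset("PetImages", "cats_and_dogs")
```

A few of the copied training images can then be displayed with `matplotlib.pyplot.imshow`.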
Code: Data augmentation to increase the number of samples in the dataset.
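A sketch using Keras's `ImageDataGenerator`; the specific augmentation parameters below are illustrative choices, not taken from the original:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augment only the training images; validation images are just rescaled.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

# Generators that stream batches from the directories created earlier:
# train_generator = train_datagen.flow_from_directory(
#     "cats_and_dogs/train", target_size=(299, 299),
#     batch_size=32, class_mode="binary")
# validation_generator = val_datagen.flow_from_directory(
#     "cats_and_dogs/validation", target_size=(299, 299),
#     batch_size=32, class_mode="binary")
```

`target_size=(299, 299)` matches the (299, 299, 3) input that Inception V3 expects.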
Code: Defining the base model using the Inception API imported above, and a callback function for training.
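A sketch of loading the base model via `keras.applications` and defining a simple early-stopping callback (the callback's name and its 0.99 accuracy threshold are assumptions; `weights=None` avoids a download here, while `weights="imagenet"` would load pretrained weights):

```python
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3

# Load InceptionV3 without its top classifier.
base_model = InceptionV3(input_shape=(299, 299, 3),
                         include_top=False,
                         weights=None)  # or weights="imagenet" for pretrained
for layer in base_model.layers:
    layer.trainable = False  # freeze the convolutional base

# Illustrative callback: stop once training accuracy exceeds a threshold.
class StopAtHighAccuracy(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        if logs and logs.get("accuracy", 0.0) > 0.99:
            self.model.stop_training = True
```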
In this step, we train our model. Before training, we need to change the last layer so that it predicts only one output, and choose an optimizer for training. Here we use RMSprop with a learning rate of 0.0001. We also add a dropout of 0.2 after the last fully connected layer. After that, we train the model for up to 100 epochs.
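The step above can be sketched as follows. The dropout of 0.2, the single sigmoid output, and RMSprop with learning rate 0.0001 come from the text; the pooling layer and the 1024-unit hidden width are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications.inception_v3 import InceptionV3

base_model = InceptionV3(input_shape=(299, 299, 3),
                         include_top=False, weights=None)
base_model.trainable = False

# New head: pool, a dense layer (width assumed), dropout 0.2, one sigmoid unit.
x = layers.GlobalAveragePooling2D()(base_model.output)
x = layers.Dense(1024, activation="relu")(x)
x = layers.Dropout(0.2)(x)
output = layers.Dense(1, activation="sigmoid")(x)

model = Model(base_model.input, output)
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001),
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Train for up to 100 epochs on generators built in the earlier steps:
# history = model.fit(train_generator,
#                     validation_data=validation_generator,
#                     epochs=100)
```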
Code: Plot the training and validation accuracy along with training and validation loss.
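A sketch of the plotting step, assuming `history` is the `History` object returned by `model.fit` (the `plot_history` helper name is an assumption):

```python
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot training/validation accuracy and loss from a Keras History object."""
    acc = history.history["accuracy"]
    val_acc = history.history["val_accuracy"]
    loss = history.history["loss"]
    val_loss = history.history["val_loss"]
    epochs = range(1, len(acc) + 1)

    plt.figure(figsize=(10, 4))
    plt.subplot(1, 2, 1)
    plt.plot(epochs, acc, label="training accuracy")
    plt.plot(epochs, val_acc, label="validation accuracy")
    plt.legend()
    plt.subplot(1, 2, 2)
    plt.plot(epochs, loss, label="training loss")
    plt.plot(epochs, val_loss, label="validation loss")
    plt.legend()
    plt.show()

# plot_history(history)  # history comes from model.fit in the previous step
```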
The best-performing Inception V3 architecture reported a top-5 error of just 5.6% and a top-1 error of 21.2% for a single crop on the ILSVRC 2012 classification challenge, which was the new state of the art. On multiple crops (144 crops) it reported top-5 and top-1 error rates of 4.2% and 18.77% on the ILSVRC 2012 classification benchmark.
An ensemble of Inception V3 architectures reported a top-5 error rate of 3.46% on the ILSVRC 2012 validation set (3.58% on the ILSVRC 2012 test set).