ML | Transfer Learning with Convolutional Neural Networks

Transfer learning, as a general term, refers to reusing the knowledge learned from one task for another. For convolutional neural networks (CNNs) specifically, many image features are common to a variety of datasets (e.g. lines and edges are seen in almost every image). For this reason, and because large datasets and heavy computational resources are hard to come by, CNNs (especially large architectures) are very rarely trained completely from scratch.
A common pretraining dataset is ImageNet, consisting of 1.2 million images. The actual model used varies from task to task (many times, people just choose what performs best on the ImageNet challenge), but the ResNet50 model is used in this article. The pre-trained model can often be found through whatever library is being used which, in this case, is Keras.

ResNet Introduction
ResNet was initially designed as a method to solve the vanishing gradient problem. This is a problem where backpropagated gradients become extremely small as they’re multiplied over and over again, limiting the depth of a neural network. The ResNet architecture attempts to solve that by employing skip connections, that is, shortcuts that allow data to skip past layers.
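
The effect of a skip connection on the gradient can be illustrated with a toy scalar model. This is only a sketch under simplifying assumptions: each layer is reduced to a single local derivative (0.1 here, an arbitrary value), a plain stack multiplies those derivatives together, and a residual block of the form y = x + f(x) contributes a factor of (1 + f'(x)) instead. The function names are made up for this illustration.

```python
def plain_chain_gradient(local_derivatives):
    """Gradient through stacked layers y = f(x): product of the f'(x) terms."""
    grad = 1.0
    for d in local_derivatives:
        grad *= d
    return grad


def residual_chain_gradient(local_derivatives):
    """Gradient through stacked blocks y = x + f(x): product of (1 + f'(x))."""
    grad = 1.0
    for d in local_derivatives:
        grad *= 1.0 + d
    return grad


# assumed toy setting: 20 layers, each with a local derivative of 0.1
local_derivatives = [0.1] * 20

plain = plain_chain_gradient(local_derivatives)
residual = residual_chain_gradient(local_derivatives)

print("plain stack:    %.3e" % plain)
print("residual stack: %.3e" % residual)
```

In the plain stack the gradient shrinks towards zero, while the identity path in the residual stack keeps it from vanishing. Real networks are of course not scalar chains, so this only conveys the intuition.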

The model consists of a series of convolutional layers with skip connections, then average pooling, then an output fully connected (dense) layer. For transfer learning, we only want the convolutional layers, as those contain the features we’re interested in, so we omit the pooling and fully connected output layers when importing the model. Finally, because we’re removing the output layers, we need to replace them with our own series of layers.



Problem Statement
To show the process of transfer learning, I’ll be using the Caltech-101 dataset, an image dataset with 101 categories and about 40-800 images per category.

Data Processing

First download and extract the dataset here. Make sure to remove the “BACKGROUND_Google” folder after extraction.

Code : To properly evaluate, we need to split the data into training and testing sets as well. Here, we need to split within each category to ensure proper representation in the test set.

import os
import math

# fraction of each category moved into the test set
TEST_SPLIT = 0.2
# fraction of the remaining images held out for validation
VALIDATION_SPLIT = 0.2

# directory that stores the test data
os.mkdir("caltech_test")

for cat in os.listdir("101_ObjectCategories/"):
  # new category folder for the test images
  os.mkdir("caltech_test/"+cat)
  # all image filenames in this category
  imgs = os.listdir("101_ObjectCategories/"+cat)
  # number of images to move into the test set
  split = math.floor(len(imgs)*TEST_SPLIT)
  test_imgs = imgs[:split]
  # move the test portion out of the training folder
  for t_img in test_imgs:
    os.rename("101_ObjectCategories/"+cat+"/"+t_img,
              "caltech_test/"+cat+"/"+t_img)


Output:

The above code creates the following file structure:

101_ObjectCategories/
-- accordion
-- airplanes
-- anchor
-- ...
caltech_test/
-- accordion
-- airplanes
-- anchor
-- ...
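
The per-category split behind this layout can also be sketched on plain lists of file names, without touching the disk. The variant below is an assumption rather than the article's method: it shuffles each category with a fixed seed before taking the test portion, since `os.listdir` returns names in arbitrary order. The helper `split_category` and the file names are hypothetical.

```python
import math
import random


def split_category(filenames, test_split=0.2, seed=0):
    """Return (train, test) lists for one category, shuffled reproducibly."""
    # sort first so the result does not depend on directory listing order
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)
    n_test = math.floor(len(shuffled) * test_split)
    return shuffled[n_test:], shuffled[:n_test]


# hypothetical category with 40 images
files = ["image_%04d.jpg" % i for i in range(1, 41)]
train, test = split_category(files)

print(len(train), "training images,", len(test), "test images")
```

Because the split is applied to each category separately, every class keeps roughly the same train/test ratio, however many images it has.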

In this layout, 101_ObjectCategories/ contains the training images and caltech_test/ contains the test images. Each subfolder holds the images belonging to that category. To input the data, we’re going to use Keras’s ImageDataGenerator class, which makes it easy to process image data and also has options for augmentation.

# make sure to match original model's preprocessing function
from keras.applications.resnet50 import preprocess_input
from keras.preprocessing.image import ImageDataGenerator

# training and validation images come from the same directory;
# validation_split reserves 20% of them for validation
train_gen = ImageDataGenerator(
        validation_split = 0.2,
        preprocessing_function = preprocess_input)
train_flow = train_gen.flow_from_directory("101_ObjectCategories/",
                                           target_size =(256, 256),
                                           batch_size = 32,
                                           subset ="training")

valid_flow = train_gen.flow_from_directory("101_ObjectCategories/",
                                           target_size =(256, 256),
                                           batch_size = 32,
                                           subset ="validation")

# test images were moved to their own directory earlier
test_gen = ImageDataGenerator(
        preprocessing_function = preprocess_input)
test_flow = test_gen.flow_from_directory("caltech_test",
                                         target_size =(256, 256),
                                         batch_size = 32)


The above code takes the file path of the image directory and creates an object for data generation.
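
To see why the preprocessing function has to match the original model, it helps to look at what it does to a single pixel. As far as the Keras documentation describes it, the ResNet50 `preprocess_input` converts RGB to BGR and zero-centers each channel with respect to ImageNet, without scaling. The sketch below mimics that on one pixel; the helper name is made up, and the real function operates on whole image arrays.

```python
# ImageNet per-channel means in BGR order, as used by this style of preprocessing
IMAGENET_BGR_MEANS = (103.939, 116.779, 123.68)


def caffe_style_preprocess_pixel(rgb):
    """Reorder one (R, G, B) pixel to BGR and subtract the channel means."""
    r, g, b = rgb
    bgr = (b, g, r)
    return tuple(value - mean for value, mean in zip(bgr, IMAGENET_BGR_MEANS))


# a pixel equal to the ImageNet mean colour maps to (approximately) zero
mean_pixel = caffe_style_preprocess_pixel((123.68, 116.779, 103.939))
# a white pixel stays on the 0-255 scale, only shifted
white_pixel = caffe_style_preprocess_pixel((255.0, 255.0, 255.0))

print(mean_pixel)
print(white_pixel)
```

Feeding the network images scaled to, say, the 0 to 1 range instead would put the inputs in a range the pre-trained weights have never seen.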

Model Building
Code : To add the base pretrained model.

from keras.applications.resnet50 import ResNet50
from keras.layers import GlobalAveragePooling2D, Dense
from keras.layers import BatchNormalization, Dropout
from keras.models import Model

# by default, the loaded model will include the original CNN
# classifier designed for the ImageNet dataset.
# since we want to reuse this model for a different problem,
# we need to omit the original fully connected layers and
# replace them with our own. setting include_top = False will
# load the model without the fully connected layer

# load resnet model, with pretrained imagenet weights.
res = ResNet50(weights ='imagenet', include_top = False,
               input_shape =(256, 256, 3))


This dataset is relatively small at around 5628 images after splitting, with most categories having only 50 images, so fine-tuning the convolutional layers may result in overfitting. Our new dataset is also fairly similar to the ImageNet dataset, so we can be reasonably confident that many of the pre-trained weights already encode useful features. We can therefore freeze the trained convolutional layers so that they aren’t changed while we train the rest of the classifier. If you have a small dataset that is significantly different from the original, fine-tuning may still cause overfitting, but the later layers would not contain the right features. In that case, you could again freeze the convolutional layers but use only the output from earlier layers, as those contain more general features. With a large dataset, overfitting is much less of a concern, so you can often fine-tune the entire network.

# freeze the pretrained convolutional base so that its ImageNet
# weights are not updated while the new classifier is trained
for layer in res.layers:
  layer.trainable = False


Now, we can add the rest of the classifier. This takes the output from the pre-trained convolutional layers and inputs it into a separate classifier that gets trained on the new dataset.
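
A common first step in such a classifier is global average pooling, which turns the stack of feature maps coming out of the convolutional layers into a single vector. The sketch below shows the operation on a tiny nested-list "feature map" in pure Python. The 2 x 2 x 2 input is a made-up example (the real ResNet50 output has 2048 channels), and the helper name is hypothetical.

```python
def global_average_pool(feature_map):
    """Average a (rows, cols, channels) nested list over rows and cols."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    channels = len(feature_map[0][0])
    sums = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    # one averaged value per channel
    return [s / (rows * cols) for s in sums]


# toy feature map: 2 rows, 2 columns, 2 channels
fmap = [
    [[1.0, 10.0], [2.0, 20.0]],
    [[3.0, 30.0], [4.0, 40.0]],
]

pooled = global_average_pool(fmap)
print(pooled)  # [2.5, 25.0]
```

Each channel collapses to one number, so the output length equals the number of channels regardless of the spatial size.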



# get the output from the loaded model
x = res.output 
  
# averages each feature map over its spatial dimensions (rows,
# columns), leaving one value per channel. This reshapes the data
# into a 1D vector, the input shape expected by Dense layers
# (e.g. (8, 8, 2048) -> (2048)).
x = GlobalAveragePooling2D()(x) 
  
# subtracts batch mean and divides by batch standard deviation
# to reduce shift in input distributions between layers. 
x = BatchNormalization()(x) 
  
# dropout allows layers to be less dependent on
# certain features, reducing overfitting
x = Dropout(0.5)(x) 
x = Dense(512, activation ='relu')(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
  
# output classification layer, we have 101 classes, 
# so we need 101 output neurons
x = Dense(101, activation ='softmax')(x) 
  
# create the model, setting input / output
model = Model(res.input, x) 
  
# compile the model - we're training using the Adam Optimizer
# and Categorical Cross Entropy as the loss function
model.compile(optimizer ='adam',
              loss ='categorical_crossentropy',
              metrics =['accuracy'])
  
# structure of our model
model.summary() 
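
The parameter counts that a model summary reports for the new classifier layers can be checked by hand. The sketch below assumes the defaults: Dense layers with a bias term, and BatchNormalization layers with a learned scale and offset plus non-trainable moving mean and variance. The 2048 input features come from the shape noted in the code comment above; the helper names are made up, and the parameters of the ResNet50 base are not counted.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units


def batchnorm_params(n_features):
    """Return (trainable, non_trainable) parameter counts."""
    trainable = 2 * n_features      # scale (gamma) and offset (beta)
    non_trainable = 2 * n_features  # moving mean and moving variance
    return trainable, non_trainable


FEATURES = 2048  # channels left after global average pooling
HIDDEN = 512     # units in the hidden Dense layer
CLASSES = 101    # output classes

bn1_train, bn1_fixed = batchnorm_params(FEATURES)
dense1 = dense_params(FEATURES, HIDDEN)
bn2_train, bn2_fixed = batchnorm_params(HIDDEN)
dense2 = dense_params(HIDDEN, CLASSES)

head_trainable = bn1_train + dense1 + bn2_train + dense2
head_total = head_trainable + bn1_fixed + bn2_fixed

print("trainable parameters in the new layers:", head_trainable)
print("all parameters in the new layers:      ", head_total)
```

With the convolutional base frozen, only these roughly 1.1 million parameters are updated during training, which is part of why each epoch is fast.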


Code : Train the model

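# recent Keras versions accept generators directly in fit();
# older versions used model.fit_generator with the same arguments
model.fit(train_flow, epochs = 5, validation_data = valid_flow)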


Output:

Epoch 1/5
176/176 [==============================] - 27s 156ms/step - loss: 1.6601 - acc: 0.6338 - val_loss: 0.3799 - val_acc: 0.8922
Epoch 2/5
176/176 [==============================] - 19s 107ms/step - loss: 0.4637 - acc: 0.8696 - val_loss: 0.2841 - val_acc: 0.9225
Epoch 3/5
176/176 [==============================] - 19s 107ms/step - loss: 0.2777 - acc: 0.9211 - val_loss: 0.2714 - val_acc: 0.9225
Epoch 4/5
176/176 [==============================] - 19s 107ms/step - loss: 0.2223 - acc: 0.9327 - val_loss: 0.2419 - val_acc: 0.9284
Epoch 5/5
176/176 [==============================] - 19s 106ms/step - loss: 0.1784 - acc: 0.9461 - val_loss: 0.2499 - val_acc: 0.9239
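
The 176 steps shown for each epoch follow from the number of training images and the batch size. A quick check, assuming the roughly 5628 training images mentioned earlier and the batch size of 32 used in the generators (the helper name is made up):

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    # the last batch may be smaller than batch_size, hence the ceiling
    return math.ceil(num_images / batch_size)


train_steps = steps_per_epoch(5628, 32)
print("steps per epoch:", train_steps)  # 176
```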

Code: To evaluate the test set

result = model.evaluate(test_flow)
  
print('The model achieved a loss of %.2f and '
      'accuracy of %.2f%%.' % (result[0], result[1]*100))


Output:

53/53 [==============================] - 5s 95ms/step
The model achieved a loss of 0.23 and accuracy of 92.80%.
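
To put the loss value in context, categorical cross-entropy for one sample is the negative log of the probability the model assigns to the true class. The sketch below compares the reported loss of 0.23 with the loss of a model that guesses uniformly over 101 classes. Since the reported loss is an average over samples, the "typical probability" here is a geometric mean and only a rough guide. The helper name is made up.

```python
import math


def categorical_cross_entropy(probabilities, true_index):
    """Loss for one sample: -log of the probability given to the true class."""
    return -math.log(probabilities[true_index])


NUM_CLASSES = 101

# a model that spreads its probability evenly over all classes
uniform = [1.0 / NUM_CLASSES] * NUM_CLASSES
uniform_loss = categorical_cross_entropy(uniform, true_index=0)

# probability on the true class that corresponds to a loss of 0.23
reported_loss = 0.23
typical_true_class_prob = math.exp(-reported_loss)

print("loss of a uniform guess:        %.3f" % uniform_loss)
print("true-class probability at 0.23: %.3f" % typical_true_class_prob)
```

A uniform guess gives a loss of about 4.6, so 0.23 means the model is, on average, placing most of its probability on the correct class.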

For a 101-class dataset, we have achieved 92.8% accuracy after only 5 epochs. For perspective, the original ResNet was trained on an ~1 million image dataset, for 120 epochs.
There are a couple of things that could be improved upon. For one, looking at the discrepancy between validation loss and training loss in the last epoch, you can see that the model is starting to overfit. One way to solve this is to add image augmentation. Simple image augmentation can be easily implemented with the ImageDataGenerator class. You could also play around with adding/removing layers or changing hyperparameters such as the dropout or the size of the Dense layer.
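
The overfitting signal can be read directly from the training log. The sketch below copies the loss values from the output shown earlier, computes the gap between validation and training loss for each epoch, and finds the epoch with the lowest validation loss. Note that the training loss is measured with dropout active and averaged over the epoch, so a negative gap in early epochs is expected; this is a rough diagnostic, not a formal test.

```python
# (epoch, training loss, validation loss) copied from the training log
history = [
    (1, 1.6601, 0.3799),
    (2, 0.4637, 0.2841),
    (3, 0.2777, 0.2714),
    (4, 0.2223, 0.2419),
    (5, 0.1784, 0.2499),
]

# positive gap: the model does better on training data than on validation data
gaps = {epoch: val - train for epoch, train, val in history}

# epoch with the lowest validation loss
best_epoch = min(history, key=lambda record: record[2])[0]

for epoch, train, val in history:
    print("epoch %d: train %.4f, val %.4f, gap %+.4f"
          % (epoch, train, val, gaps[epoch]))
print("lowest validation loss at epoch", best_epoch)
```

The gap turns positive at epoch 4 and widens at epoch 5, where validation loss rises while training loss keeps falling. Stopping at the epoch with the lowest validation loss is one simple response to this.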
Run this code here with Google Colab’s free GPU compute resources.



