Transfer Learning with Fine-tuning

Last Updated : 18 Feb, 2024

Transfer learning is a machine learning approach that involves utilizing pre-trained models to address specific tasks. The knowledge acquired from one task is utilized to enhance the performance on a related task. In natural language processing, transfer learning techniques have revolutionized the way language models are developed. For example, knowledge gained from learning positive or negative sentiments of the SST (Stanford Sentiment Treebank) dataset could be applied when trying to recognize sentiment in product reviews. The article explores the fundamentals of transfer learning and demonstrates how to apply transfer learning by fine-tuning.

What is Transfer Learning?

The basic idea of transfer learning is that the features learned by a pre-trained model on a large dataset generalize and remain useful for other tasks, even if the new task uses a different dataset. Popular pre-trained resources for NLP include the word embeddings Word2Vec, GloVe and FastText, and language models such as GPT.

Working of Transfer Learning

The process typically involves taking a pre-trained model, removing its last layers, and replacing them with new layers. The initial layers of the pre-trained model, which capture general features, are either frozen or fine-tuned with a small learning rate to preserve the learned representations. The newly added layers are then trained on the new dataset specific to the target task.
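
The procedure described above can be sketched in PyTorch. This is a minimal illustration, not a recipe for any particular model: the small network below is a hypothetical stand-in for a pre-trained model, the layer sizes are arbitrary, and the base is frozen outright rather than updated with a small learning rate.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pre-trained network: feature layers
# followed by an output layer for the original (source) task.
pretrained = nn.Sequential(
    nn.Linear(16, 8),   # general-purpose feature layer
    nn.ReLU(),
    nn.Linear(8, 5),    # original head: 5 classes in the source task
)

# 1. Drop the last layer and keep the rest as the base.
base = nn.Sequential(*list(pretrained.children())[:-1])

# 2. Freeze the base so its learned weights are preserved.
for param in base.parameters():
    param.requires_grad = False

# 3. Attach a new head for the target task (2 classes here).
new_head = nn.Linear(8, 2)
model = nn.Sequential(base, new_head)

# Only parameters that still require gradients go to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

logits = model(torch.randn(4, 16))  # batch of 4 dummy inputs
n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in model.parameters())
print(tuple(logits.shape), n_trainable, n_total)
```

Only the new head is updated during training here, which is why the trainable parameter count is a small fraction of the total.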

Benefits of using Transfer Learning

Reduces the amount of training time required for a new task.
The knowledge learned from the pre-training dataset generalizes to tasks in different but related domains.
Small datasets are prone to overfitting; starting from already-learned features helps to mitigate this issue.
Building a model from scratch is computationally expensive, and transfer learning reduces the compute required.

Fine Tuning in NLP

Fine-tuning refers to taking a pre-trained model and further training it on a new dataset. Fine-tuning involves training the entire model, including the initial layers. The learning rate used for the initial layers is set to a small value to prevent significant changes, while the later layers use a higher learning rate to adapt to the new dataset.
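
One way to give different layers different learning rates in PyTorch is to pass parameter groups to the optimizer. The sketch below assumes a hypothetical two-layer model, and the learning-rate values are illustrative rather than recommended settings.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pre-trained model with "early" and "late" layers.
model = nn.ModuleDict({
    "early": nn.Linear(16, 8),
    "late": nn.Linear(8, 2),
})

# Each parameter group carries its own learning rate.
optimizer = torch.optim.AdamW([
    {"params": model["early"].parameters(), "lr": 1e-5},  # small: preserve features
    {"params": model["late"].parameters(), "lr": 1e-3},   # larger: adapt to new task
])

# One dummy training step to show that both groups are updated together.
x = torch.randn(4, 16)
logits = model["late"](torch.relu(model["early"](x)))
loss = logits.sum()
loss.backward()
optimizer.step()

group_lrs = [group["lr"] for group in optimizer.param_groups]
print(group_lrs)
```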

Both transfer learning and fine-tuning are widely used in natural language processing. They offer practical solutions to overcome limitations posed by small datasets and allow for the efficient development of deep learning models with improved performance.

Using Transfer Learning for Sentiment Analysis

To understand transfer learning and fine-tuning on an NLP model, let us consider BERT. BERT (Bidirectional Encoder Representations from Transformers) is a language representation model developed by Google. It is designed to capture contextual relationships and meanings of words within sentences or documents. In this example, BERT is fine-tuned on a custom task using a small sample of input texts and corresponding labels: the input_texts list contains two examples, and the labels list contains their labels (1 for positive, 0 for negative). The training texts and the test texts are kept in separate lists. This example can help us classify positive and negative comments.

Install the transformers library.

!pip install transformers

Importing Libraries and Dataset

In the code, ‘tokenizer’ refers to the BERT tokenizer. The tokenizer is responsible for converting input text into numerical representations that can be understood by the BERT model.

Python3

import transformers
import torch
# AdamW is taken from PyTorch: the copy that used to ship with
# transformers is deprecated and missing from recent releases.
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

By importing the torch module, you’ll be able to use the necessary functionalities from the PyTorch library. By importing AdamW from torch.optim, you’ll be able to use it as the optimizer for fine-tuning the BERT model.

Load the pre-trained model

In this step, the name of the pre-trained BERT model to load is specified. The model is later fine-tuned on our own dataset.

Python3

pretrained_model_name = 'bert-base-uncased'

Transfer Learning

The code starts by loading the pre-trained BERT model (bert-base-uncased) using the BertForSequenceClassification.from_pretrained() method. This model has been pre-trained on a large corpus to learn general language representations and contextualized word embeddings. By loading this pre-trained model, we are leveraging the knowledge and insights gained from its pre-training.

Python3

tokenizer = BertTokenizer.from_pretrained(pretrained_model_name)
model = BertForSequenceClassification.from_pretrained(pretrained_model_name)

Tokenize and Encode the Data

Tokenization is a common step in NLP that helps in preparing the text data for further processing. The code utilizes the BERT tokenizer to tokenize the input texts. The tokenizer splits the text into tokens and performs additional tasks such as adding special tokens, truncating or padding the sequences to a fixed length and generating attention masks.

Python3

input_texts = ['This is a positive review.',
               'This is a negative review.']
labels = [1, 0]
 
input_ids = []
attention_masks = []
 
for text in input_texts:
    encoded_dict = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=128,
        padding='max_length',
        truncation=True,
        return_tensors='pt'
    )
 
    input_ids.append(encoded_dict['input_ids'])
    attention_masks.append(encoded_dict['attention_mask'])
 
input_ids = torch.cat(input_ids, dim=0)
attention_masks = torch.cat(attention_masks, dim=0)
labels = torch.tensor(labels)
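
To make padding, truncation and attention masks concrete, here is a toy sketch in plain Python. It is only an illustration: the whitespace splitting, the vocabulary and the encode helper are made up for this example, whereas the real BERT tokenizer uses WordPiece with a vocabulary of about 30,000 entries.

```python
# Toy illustration only: a whitespace "tokenizer" with a made-up vocabulary.
PAD, CLS, SEP, UNK = 0, 1, 2, 3
vocab = {"this": 4, "is": 5, "a": 6, "positive": 7, "negative": 8, "review.": 9}


def encode(text, max_length):
    """Return (input_ids, attention_mask), both of length max_length."""
    ids = [vocab.get(tok, UNK) for tok in text.lower().split()]
    ids = [CLS] + ids[: max_length - 2] + [SEP]    # truncate, add special tokens
    mask = [1] * len(ids)                          # 1 marks a real token
    n_pad = max_length - len(ids)
    return ids + [PAD] * n_pad, mask + [0] * n_pad  # 0 marks padding


toy_ids, toy_mask = encode("This is a positive review.", max_length=8)
print(toy_ids)   # [1, 4, 5, 6, 7, 9, 2, 0]
print(toy_mask)  # [1, 1, 1, 1, 1, 1, 1, 0]
```

The attention mask tells the model which positions hold real tokens and which are padding that should be ignored.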

Fine Tuning the BERT Model

Fine-tuning refers to the process of adapting the pre-trained BERT model to a specific downstream task. Fine-tuning involves training the BERT model on a task-specific dataset with labeled examples.

After loading the pre-trained BERT model (BertForSequenceClassification.from_pretrained()), the optimizer (AdamW) is defined. No separate loss function is created: when labels are passed to the model, it computes the cross-entropy loss internally and returns it as outputs.loss. The model is put into training mode using model.train(), which enables training-time behaviour such as dropout. The training loop runs for a specified number of epochs. For each epoch, the loop slices the input tensors into batches of batch_size examples. Within the loop, optimizer.zero_grad() clears the gradients left over from the previous step, loss.backward() computes new gradients, and optimizer.step() updates the parameters.

Python3

batch_size = 2
epochs = 3
optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
 
for epoch in range(epochs):
    for i in range(0, input_ids.size(0), batch_size):
        batch_input_ids = input_ids[i:i+batch_size]
        batch_attention_masks = attention_masks[i:i+batch_size]
        batch_labels = labels[i:i+batch_size]
 
        optimizer.zero_grad()
 
        outputs = model(
            input_ids=batch_input_ids,
            attention_mask=batch_attention_masks,
            labels=batch_labels
        )
 
        loss = outputs.loss
        loss.backward()
        optimizer.step()
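
For larger datasets, the manual slicing above is usually replaced by a PyTorch DataLoader. The sketch below shows only the batching part, using dummy tensors in place of the encoded texts; the tensor sizes are arbitrary assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Dummy tensors standing in for the encoded inputs: 5 examples, 8 tokens each.
demo_input_ids = torch.randint(0, 100, (5, 8))
demo_attention_masks = torch.ones(5, 8, dtype=torch.long)
demo_labels = torch.tensor([1, 0, 1, 0, 1])

dataset = TensorDataset(demo_input_ids, demo_attention_masks, demo_labels)
loader = DataLoader(dataset, batch_size=2, shuffle=False)

batch_sizes = []
for batch_input_ids, batch_attention_masks, batch_labels in loader:
    # In a real training loop, zero_grad(), the forward pass,
    # loss.backward() and optimizer.step() would go here.
    batch_sizes.append(batch_labels.size(0))

print(batch_sizes)  # [2, 2, 1]
```

Setting shuffle=True is the usual choice for training, so that each epoch sees the examples in a different order.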

Model Predictions

tokenizer.encode_plus() is a method provided by the Hugging Face Transformers library’s tokenizer class. It is used to tokenize and encode a given input text or pair of texts into numerical representations that can be understood by the BERT model. Use the fine-tuned BERT model for predictions:

Python3

model.eval()
 
test_texts = ['This is another review.',
              'I am not sure about this.']
test_input_ids = []
test_attention_masks = []
 
for text in test_texts:
    encoded_dict = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=128,
        padding='max_length',
        truncation=True,
        return_tensors='pt'
    )
 
    test_input_ids.append(encoded_dict['input_ids'])
    test_attention_masks.append(encoded_dict['attention_mask'])
 
test_input_ids = torch.cat(test_input_ids, dim=0)
test_attention_masks = torch.cat(test_attention_masks, dim=0)
 
with torch.no_grad():
    outputs = model(
        input_ids=test_input_ids,
        attention_mask=test_attention_masks
    )
 
predicted_labels = torch.argmax(outputs.logits, dim=1)

The loop below prints each test text together with its predicted label:

Python3

for text, label in zip(test_texts, predicted_labels):
    print(f'Text: {text}\nPredicted Label: {label.item()}\n')

Output:

Text: This is another review.
Predicted Label: 1
Text: I am not sure about this.
Predicted Label: 1
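
The predicted label is simply the index of the largest logit. To see how confident the model is, the logits can be turned into probabilities with a softmax. The sketch below uses made-up logits rather than real model output, and assumes class 0 is negative and class 1 is positive as in the training labels.

```python
import torch

# Made-up logits for two texts and two classes (0 = negative, 1 = positive).
demo_logits = torch.tensor([[0.2, 1.4],
                            [2.0, 0.0]])

probs = torch.softmax(demo_logits, dim=1)   # each row sums to 1
predicted = torch.argmax(probs, dim=1)      # same result as argmax over logits
confidence = probs.max(dim=1).values        # probability of the chosen class

print(predicted.tolist())  # [1, 0]
```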

The steps above demonstrate the overall working of transfer learning with fine-tuning using a pre-trained BERT model. Because the classification head is randomly initialised and the training set contains only two examples, the predicted labels can differ from run to run.

With a realistically sized labeled dataset, the output of a fine-tuned BERT model is expected to be better than that of a model trained from scratch for the following reasons:

Pre-training on Large-Scale Data: The pre-trained BERT model has been trained on a massive amount of text data, such as Wikipedia articles, to learn general language representations. This pre-training allows the model to capture a deep understanding of language patterns and semantics, which can be beneficial for a wide range of NLP tasks.
Transfer of Knowledge: By fine-tuning the pre-trained BERT model on a specific task with a smaller labeled dataset, the model can leverage the knowledge and representations learned during pre-training. The pre-trained model has already learned useful features and linguistic patterns, which can be transferable to the target task. This transfer of knowledge helps the fine-tuned model perform better compared to training a model from scratch on the same task.
Generalization Capability: The fine-tuned BERT model has the ability to generalize well to new, unseen data. This is because the model has been exposed to diverse language patterns during pre-training and fine-tuning. As a result, the model can capture the nuances and context of the input texts, leading to more accurate predictions on new examples.
Capturing Task-Specific Information: During the fine-tuning process, the BERT model is adapted to the specific task by updating its parameters on the task-specific dataset. This allows the model to learn task-specific patterns, features, and decision boundaries, further enhancing its predictive capabilities.

Overall, the output of the fine-tuned BERT model is expected to be better than a normal model because it benefits from pre-training on large-scale data, transfer of knowledge, generalization capability, and task-specific adaptation. The fine-tuning process allows the model to harness the power of pre-trained language representations and apply them to specific NLP tasks, resulting in improved performance and more accurate predictions.

Conclusion

In conclusion, transfer learning with fine-tuning in Natural Language Processing (NLP) is a powerful technique that leverages pre-trained models to enhance the performance of specific NLP tasks.

Frequently Asked Questions (FAQs)

Q. Is BERT a transfer learning model?

Yes, BERT is a transfer learning model. BERT is pre-trained on a large text corpus using a masked language modeling objective. It learns about words and sentences by reading a lot of text without a specific task in mind, which helps it understand the context and meaning of words. BERT is then fine-tuned for specific tasks, such as sentiment analysis, using a small dataset.
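
The masked language modeling idea can be sketched in a few lines of plain Python. This is a simplified illustration with a hypothetical mask_tokens helper: it only replaces randomly chosen tokens with [MASK], whereas BERT's actual pre-training selects about 15% of tokens and sometimes substitutes a random token or leaves the token unchanged.

```python
import random


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK]; return (masked, targets)."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            targets.append(tok)      # the model must predict this token
        else:
            masked.append(tok)
            targets.append(None)     # position is not scored
    return masked, targets


tokens = "the movie was surprisingly good".split()
masked, targets = mask_tokens(tokens, mask_prob=0.5)
print(masked)
print(targets)
```

During pre-training, the model sees the masked sequence and is trained to recover the original tokens at the masked positions, which requires no task-specific labels.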

Q. What is lexical transfer in NLP?

Lexical transfer is the process of transferring knowledge of semantic features from one language to another. This helps to understand the similarities and differences between languages. The transfer involves mapping words from the source language to the target language.

Q. What is the difference between transfer learning and traditional ML?

Traditional models are trained from scratch on a labeled dataset, whereas transfer learning reuses pre-trained knowledge and adapts it using a small dataset.

Traditional ML models are computationally expensive to train and less adaptable, whereas transfer learning models adapt more readily to new tasks.
