
Bidirectional LSTM in NLP

Last Updated : 08 Jun, 2023

In this article, we will first discuss bidirectional LSTMs and their architecture. We will then look at the implementation of a review system using a Bidirectional LSTM. Finally, we will conclude with a discussion of the applications of bidirectional LSTMs.

Bidirectional LSTM (BiLSTM)

Bidirectional LSTM, or BiLSTM, is a sequence model that contains two LSTM layers, one processing the input in the forward direction and the other processing it in the backward direction. It is commonly used in NLP-related tasks. The intuition behind this approach is that, by processing data in both directions, the model can better understand the context around each token, since it knows both the preceding and the following words in a sentence.

To understand this better, let us look at an example. The first statement is “Server, can you bring me this dish?” and the second statement is “He crashed the server.” In both statements, the word “server” has a different meaning, and that meaning depends on the words that precede and follow it. A bidirectional LSTM helps the machine capture this relationship better than a unidirectional LSTM, which makes BiLSTM a suitable architecture for tasks like sentiment analysis, text classification, and machine translation.

Architecture

The architecture of a bidirectional LSTM consists of two unidirectional LSTMs that process the sequence in the forward and backward directions. It can be interpreted as two separate LSTM networks: one receives the sequence of tokens as-is, while the other receives it in reverse order. Each of these LSTM networks returns a probability vector as output, and the final output is the combination of the two. It can be represented as:

p_t = p_t^f + p_t^b

where,

  • p_t : Final probability vector of the network.
  • p_t^f : Probability vector from the forward LSTM network.
  • p_t^b : Probability vector from the backward LSTM network.

Figure 1: Bidirectional LSTM layer architecture

Figure 1 describes the architecture of the BiLSTM layer, where X_i is the input token, Y_i is the output token, and A and A' are the forward and backward LSTM nodes. The final output Y_i is the combination of the A and A' LSTM nodes.
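To see how the two directions are combined in practice, here is a minimal, self-contained sketch (not part of the review system built below) using the Keras Bidirectional wrapper. Its merge_mode argument controls the combination: 'sum' matches the addition in the formula above, while the default 'concat' concatenates the forward and backward outputs.

Python3

import numpy as np
import tensorflow as tf

# A toy batch: 1 sequence, 5 timesteps, 8 features
x = np.random.rand(1, 5, 8).astype('float32')

# 'sum' adds the forward and backward outputs elementwise (output size 4),
# while the default 'concat' stacks them (output size 8)
bilstm_sum = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(4, return_sequences=True), merge_mode='sum')
bilstm_concat = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(4, return_sequences=True), merge_mode='concat')

print(bilstm_sum(x).shape)     # (1, 5, 4)
print(bilstm_concat(x).shape)  # (1, 5, 8)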

Now, let us look at an implementation of a review system using BiLSTM layers in Python with the TensorFlow library. We will perform sentiment analysis on the IMDB movie review dataset, implementing the network from scratch and training it to identify whether a review is positive or negative.

Importing Libraries and Dataset

Python libraries make it very easy for us to handle the data and perform typical and complex tasks with a single line of code.

  • NumPy – NumPy arrays are very fast and can perform large computations in a very short time.
  • Matplotlib – This library is used to draw visualizations.
  • TensorFlow – This is an open-source library used for Machine Learning and Artificial Intelligence, and it provides a range of functions to achieve complex functionality with single lines of code.
  • TensorFlow Datasets (tfds) – This companion library provides ready-to-use datasets, including the IMDB reviews dataset used below.

Python3

import tensorflow as tf
import tensorflow_datasets as tfds
  
import numpy as np
import matplotlib.pyplot as plt

                    

The IMDB movie review dataset is a binary sentiment classification dataset containing 25,000 highly polar movie reviews for training and 25,000 for testing. The dataset can be acquired from this website, or we can use the tensorflow_datasets library to download it, as we do here.

Python3

# Obtain the IMDB review dataset from TensorFlow Datasets
dataset = tfds.load('imdb_reviews', as_supervised=True)
 
# Separate the train and test splits
train_dataset, test_dataset = dataset['train'], dataset['test']
 
# Shuffle the training set and split both sets
# into batches of 32
batch_size = 32
train_dataset = train_dataset.shuffle(10000)
train_dataset = train_dataset.batch(batch_size)
test_dataset = test_dataset.batch(batch_size)
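 
# Optional addition (not in the original article): prefetching lets the
# input pipeline prepare the next batch while the model is busy training
train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.prefetch(tf.data.AUTOTUNE)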

                    

Printing a sample review and its label from the training set.

Python3

# Fetch one batch from the training set and print
# the first review and its label
example, label = next(iter(train_dataset))
print('Text:\n', example.numpy()[0])
print('\nLabel: ', label.numpy()[0])

                    

Output:

Text:
b'Stumbling upon this HBO special late one night, I was absolutely taken by this
attractive British "executive transvestite." I have never laughed so hard over
European History or any of the other completely worthwhile point Eddie Izzard made.
I laughed so much that I woke up my mother sleeping at the other end of the house...'
Label: 1

Model Architecture

In this section, we will define the model we will use for sentiment analysis. The initial layer of this architecture is the text vectorization layer, responsible for encoding the input text into a sequence of token indices. These tokens are then fed into the embedding layer, where each word is assigned a trainable vector. After enough training, these vectors tend to adjust themselves so that words with similar meanings have similar vectors. The embedded sequences are then passed through Bidirectional LSTM layers, and dense layers finally reduce the output to a single logit used as the classification output.

We will first perform text vectorization, letting the encoder map the words in the training dataset to integer tokens. The example below also shows how we can encode a sample review into a vector of integers and decode it back.

Python3

# Using the TextVectorization layer to normalize, split, and map strings
# to integers.
encoder = tf.keras.layers.TextVectorization(max_tokens=10000)
encoder.adapt(train_dataset.map(lambda text, _: text))
  
# Extracting the vocabulary from the TextVectorization layer.
vocabulary = np.array(encoder.get_vocabulary())
  
# Encoding a test example and decoding it back.
original_text = example.numpy()[0]
encoded_text = encoder(original_text).numpy()
decoded_text = ' '.join(vocabulary[encoded_text])
  
print('original: ', original_text)
print('encoded: ', encoded_text)
print('decoded: ', decoded_text)

                    

Output:

original: 
b'Stumbling upon this HBO special late one night, I was absolutely taken by this
attractive British "executive transvestite." I have never laughed so hard over
European History or any of the other completely worthwhile point Eddie Izzard made.
I laughed so much that I woke up my mother sleeping at the other end of the house...'
encoded:
[9085 720 11 4335 309 534 29 311 10 14 412 602 33 11
1523 683 3505 1 10 26 110 1434 38 264 126 1835 489 42
99 5 2 81 325 2601 215 1781 9352 91 10 1434 38 73
12 10 9259 58 56 462 2703 31 2 81 129 5 2 313]
decoded:
stumbling upon this hbo special late one night i was absolutely taken by this
attractive british executive [UNK] i have never laughed so hard over european history
or any of the other completely worthwhile point eddie izzard made i laughed so much
that i woke up my mother sleeping at the other end of the house
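Notice that the word "transvestite" was decoded as [UNK]: the vocabulary is capped at max_tokens=10000, so words outside it are mapped to the out-of-vocabulary token at index 1 (index 0 is reserved for padding). A quick check of this behaviour, using a made-up word purely for illustration:

Python3

# Words outside the 10,000-token vocabulary are encoded as 1,
# which decodes back to the [UNK] token
print(encoder('this movie was blorptastic').numpy())
print(vocabulary[1])   # '[UNK]'
print(vocabulary[0])   # '' (reserved for padding)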

Now, we will use this trained encoder along with Bidirectional LSTM layers to define a model as discussed earlier.

We will implement a Sequential model which will contain the following parts:

  • The adapted encoder followed by an embedding layer, used to create an embedding for the input text.
  • Then the bidirectional LSTM layers, which learn dependencies in both directions of the sequence.
  • Finally, two fully connected layers whose single output is the logit indicating whether the review is positive.

Python3

# Creating the model
model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        len(encoder.get_vocabulary()), 64, mask_zero=True),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64,  return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
  
# Summary of the model
model.summary()
  
# Compile the model
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=['accuracy']
)

                    

Output:

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
text_vectorization (TextVec (None, None) 0
torization)

embedding (Embedding) (None, None, 64) 640000

bidirectional (Bidirectiona (None, None, 128) 66048
l)

bidirectional_1 (Bidirectio (None, 64) 41216
nal)

dense (Dense) (None, 64) 4160

dense_1 (Dense) (None, 1) 65

=================================================================
Total params: 751,489
Trainable params: 751,489
Non-trainable params: 0
_________________________________________________________________
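
As a sanity check on these numbers (a worked calculation added here, not part of the original article), an LSTM layer with u units and input dimension d has 4 * (d + u + 1) * u parameters (four gates, each with input weights, recurrent weights, and a bias), and the Bidirectional wrapper doubles this:

Python3

# Reproducing the parameter counts shown by model.summary()
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * (input_dim + units + 1) * units
 
print(10000 * 64)                # embedding:       640000
print(2 * lstm_params(64, 64))   # bidirectional:    66048
print(2 * lstm_params(128, 32))  # bidirectional_1:  41216
print(64 * 64 + 64)              # dense:             4160
print(64 * 1 + 1)                # dense_1:              65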

Model Training

Now, we will train the model we defined in the previous step for five epochs.

Python3

# Training the model and validating it on test set
history = model.fit(
    train_dataset, 
    epochs=5,
    validation_data=test_dataset,
)

                    

Output:

Epoch 1/5
782/782 [==============================] - 1209s 2s/step - loss: 0.3657 - accuracy: 0.8266 - val_loss: 0.3110 - val_accuracy: 0.8441
Epoch 2/5
782/782 [==============================] - 1269s 2s/step - loss: 0.2147 - accuracy: 0.9126 - val_loss: 0.3566 - val_accuracy: 0.8590
Epoch 3/5
782/782 [==============================] - 1146s 1s/step - loss: 0.1616 - accuracy: 0.9380 - val_loss: 0.3764 - val_accuracy: 0.8670
Epoch 4/5
782/782 [==============================] - 1963s 3s/step - loss: 0.0962 - accuracy: 0.9647 - val_loss: 0.4271 - val_accuracy: 0.8564
Epoch 5/5
782/782 [==============================] - 1121s 1s/step - loss: 0.0573 - accuracy: 0.9796 - val_loss: 0.5516 - val_accuracy: 0.8575
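
Notice that the validation loss starts rising after the second epoch while the training loss keeps falling, which suggests the model is starting to overfit. One common remedy, sketched below as an optional addition rather than part of the original training run, is to stop training once the validation loss stops improving:

Python3

# Optional sketch: stop training when val_loss stops improving for
# two consecutive epochs and restore the best weights seen so far
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=2, restore_best_weights=True)
 
history = model.fit(
    train_dataset,
    epochs=10,
    validation_data=test_dataset,
    callbacks=[early_stop],
)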

Plotting the training and validation accuracy and loss curves.

Python3

# Plotting the accuracy and loss over time
  
# Training history
history_dict = history.history
  
# Separating validation and training accuracy
acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
  
# Separating validation and training loss
loss = history_dict['loss']
val_loss = history_dict['val_loss']
  
# Plotting
plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.plot(acc)
plt.plot(val_acc)
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(['Accuracy', 'Validation Accuracy'])
  
plt.subplot(1, 2, 2)
plt.plot(loss)
plt.plot(val_loss)
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(['Loss', 'Validation Loss'])
  
plt.show()

                    

Output:

The plot of training and validation accuracy and loss

Model Evaluation

Now, we will test the trained model with a random review and check its output.

Python3

# Making a prediction on a sample review
sample_text = (
    '''The movie by GeeksforGeeks was so good and the animation are so dope. 
    I would recommend my friends to watch it.'''
)
predictions = model.predict(np.array([sample_text]))
print(*predictions[0])
 
# The model outputs a raw logit; a value above 0 means
# the review is predicted to be positive
if predictions[0] > 0:
    print('The review is positive')
else:
    print('The review is negative')

                    

Output:

1/1 [==============================] - 0s 33ms/step
5.414222
The review is positive
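
Since the final Dense layer outputs a raw logit (the loss was compiled with from_logits=True), it can be converted into a probability with the sigmoid function if a score between 0 and 1 is preferred. A minimal sketch:

Python3

# Convert the raw logit into a probability between 0 and 1
probability = tf.sigmoid(predictions[0][0]).numpy()
print(f'Probability of a positive review: {probability:.3f}')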

Applications of Bidirectional LSTM

Some of the popular applications that use BiLSTM are sentiment analysis, text classification, text generation, and machine translation. You can also explore some of these applications in the following articles:

  1. LSTM-Based Poetry Generation Using NLP in Python
  2. Emotion Detection using Bidirectional LSTM

