
Bidirectional LSTM in NLP

Last Updated : 08 Jun, 2023

In this article, we will first discuss bidirectional LSTMs and their architecture. We will then look at the implementation of a review system using a Bidirectional LSTM. Finally, we will conclude with a discussion of the applications of bidirectional LSTMs.

Bidirectional LSTM (BiLSTM)

Bidirectional LSTM, or BiLSTM, is a sequence model that contains two LSTM layers, one processing the input in the forward direction and the other processing it in the backward direction. It is commonly used in NLP-related tasks. The intuition behind this approach is that, by processing data in both directions, the model can better understand the context around each token, since it knows both the preceding and the following words in a sentence.

To understand this better, let us look at an example. The first statement is “Server, can you bring me this dish?” and the second statement is “He crashed the server.” In both statements, the word “server” has a different meaning, and that meaning depends on the words that precede and follow it. A bidirectional LSTM helps the machine capture this relationship better than a unidirectional LSTM, which makes BiLSTM a suitable architecture for tasks like sentiment analysis, text classification, and machine translation.

Architecture

The architecture of a bidirectional LSTM consists of two unidirectional LSTMs that process the sequence in the forward and backward directions. It can be interpreted as two separate LSTM networks: one receives the sequence of tokens as-is, while the other receives it in reverse order. Each of these LSTM networks returns a probability vector as output, and the final output is the combination of the two. It can be represented as:

p_t = p_t^f + p_t^b

where,

  • p_t : Final probability vector of the network.
  • p_t^f : Probability vector from the forward LSTM network.
  • p_t^b : Probability vector from the backward LSTM network.

Figure 1: Bidirectional LSTM layer architecture

Figure 1 describes the architecture of the BiLSTM layer, where X_i is the input token, Y_i is the output token, and A and A' are the forward and backward LSTM nodes. The final output Y_i is the combination of the A and A' LSTM nodes.
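To see how the two directions are combined in practice, here is a minimal, self-contained sketch (not part of the review system built below) using the Keras Bidirectional wrapper. Its merge_mode argument controls the combination: 'sum' matches the addition in the formula above, while the default 'concat' concatenates the forward and backward outputs.

Python3

import numpy as np
import tensorflow as tf

# A toy batch: 1 sequence, 5 timesteps, 8 features
x = np.random.rand(1, 5, 8).astype('float32')

# 'sum' adds the forward and backward outputs elementwise (output size 4),
# while the default 'concat' stacks them (output size 8)
bilstm_sum = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(4, return_sequences=True), merge_mode='sum')
bilstm_concat = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(4, return_sequences=True), merge_mode='concat')

print(bilstm_sum(x).shape)     # (1, 5, 4)
print(bilstm_concat(x).shape)  # (1, 5, 8)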

Now, let us look at an implementation of a review system using BiLSTM layers in Python with the TensorFlow library. We will perform sentiment analysis on the IMDB movie review dataset, implementing the network from scratch and training it to identify whether a review is positive or negative.

Importing Libraries and Dataset

Python libraries make it very easy for us to handle the data and perform typical and complex tasks with a single line of code.

  • NumPy – NumPy arrays are very fast and can perform large computations in a very short time.
  • Matplotlib – This library is used to draw visualizations.
  • TensorFlow – This is an open-source library used for Machine Learning and Artificial Intelligence, and it provides a range of functions to achieve complex functionality with single lines of code.
  • TensorFlow Datasets (tfds) – This companion library provides ready-to-use datasets, including the IMDB reviews dataset used below.

Python3

import tensorflow as tf
import tensorflow_datasets as tfds
  
import numpy as np
import matplotlib.pyplot as plt

                    

The IMDB movie review dataset is a binary sentiment classification dataset containing 25,000 highly polar movie reviews for training and 25,000 for testing. The dataset can be acquired from this website, or we can use the tensorflow_datasets library to download it, as we do here.

Python3

# Obtain the IMDB review dataset from TensorFlow Datasets
dataset = tfds.load('imdb_reviews', as_supervised=True)
 
# Separate the train and test splits
train_dataset, test_dataset = dataset['train'], dataset['test']
 
# Shuffle the training set and split both sets
# into batches of 32
batch_size = 32
train_dataset = train_dataset.shuffle(10000)
train_dataset = train_dataset.batch(batch_size)
test_dataset = test_dataset.batch(batch_size)
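 
# Optional addition (not in the original article): prefetching lets the
# input pipeline prepare the next batch while the model is busy training
train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.prefetch(tf.data.AUTOTUNE)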

                    

Printing a sample review and its label from the training set.

Python3

# Fetch one batch from the training set and print
# the first review and its label
example, label = next(iter(train_dataset))
print('Text:\n', example.numpy()[0])
print('\nLabel: ', label.numpy()[0])

                    

Output:

Text:
b'Stumbling upon this HBO special late one night, I was absolutely taken by this
attractive British "executive transvestite." I have never laughed so hard over
European History or any of the other completely worthwhile point Eddie Izzard made.
I laughed so much that I woke up my mother sleeping at the other end of the house...'
Label: 1

Model Architecture

In this section, we will define the model we will use for sentiment analysis. The initial layer of this architecture is the text vectorization layer, responsible for encoding the input text into a sequence of token indices. These tokens are then fed into the embedding layer, where each word is assigned a trainable vector. After enough training, these vectors tend to adjust themselves so that words with similar meanings have similar vectors. The embedded sequences are then passed through Bidirectional LSTM layers, and dense layers finally reduce the output to a single logit used as the classification output.

We will first perform text vectorization, letting the encoder map the words in the training dataset to integer tokens. The example below also shows how we can encode a sample review into a vector of integers and decode it back.

Python3

# Using the TextVectorization layer to normalize, split, and map strings
# to integers.
encoder = tf.keras.layers.TextVectorization(max_tokens=10000)
encoder.adapt(train_dataset.map(lambda text, _: text))
  
# Extracting the vocabulary from the TextVectorization layer.
vocabulary = np.array(encoder.get_vocabulary())
  
# Encoding a test example and decoding it back.
original_text = example.numpy()[0]
encoded_text = encoder(original_text).numpy()
decoded_text = ' '.join(vocabulary[encoded_text])
  
print('original: ', original_text)
print('encoded: ', encoded_text)
print('decoded: ', decoded_text)

                    

Output:

original: 
b'Stumbling upon this HBO special late one night, I was absolutely taken by this
attractive British "executive transvestite." I have never laughed so hard over
European History or any of the other completely worthwhile point Eddie Izzard made.
I laughed so much that I woke up my mother sleeping at the other end of the house...'
encoded:
[9085 720 11 4335 309 534 29 311 10 14 412 602 33 11
1523 683 3505 1 10 26 110 1434 38 264 126 1835 489 42
99 5 2 81 325 2601 215 1781 9352 91 10 1434 38 73
12 10 9259 58 56 462 2703 31 2 81 129 5 2 313]
decoded:
stumbling upon this hbo special late one night i was absolutely taken by this
attractive british executive [UNK] i have never laughed so hard over european history
or any of the other completely worthwhile point eddie izzard made i laughed so much
that i woke up my mother sleeping at the other end of the house
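Notice that the word "transvestite" was decoded as [UNK]: the vocabulary is capped at max_tokens=10000, so words outside it are mapped to the out-of-vocabulary token at index 1 (index 0 is reserved for padding). A quick check of this behaviour, using a made-up word purely for illustration:

Python3

# Words outside the 10,000-token vocabulary are encoded as 1,
# which decodes back to the [UNK] token
print(encoder('this movie was blorptastic').numpy())
print(vocabulary[1])   # '[UNK]'
print(vocabulary[0])   # '' (reserved for padding)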

Now, we will use this trained encoder along with Bidirectional LSTM layers to define a model as discussed earlier.

We will implement a Sequential model which will contain the following parts:

  • The adapted encoder followed by an embedding layer, used to create an embedding for the input text.
  • Then the bidirectional LSTM layers, which learn dependencies in both directions of the sequence.
  • Finally, two fully connected layers whose single output is the logit indicating whether the review is positive.

Python3

# Creating the model
model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        len(encoder.get_vocabulary()), 64, mask_zero=True),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64,  return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
  
# Summary of the model
model.summary()
  
# Compile the model
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=['accuracy']
)

                    

Output:

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
text_vectorization (TextVec (None, None) 0
torization)

embedding (Embedding) (None, None, 64) 640000

bidirectional (Bidirectiona (None, None, 128) 66048
l)

bidirectional_1 (Bidirectio (None, 64) 41216
nal)

dense (Dense) (None, 64) 4160

dense_1 (Dense) (None, 1) 65

=================================================================
Total params: 751,489
Trainable params: 751,489
Non-trainable params: 0
_________________________________________________________________
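
As a sanity check on these numbers (a worked calculation added here, not part of the original article), an LSTM layer with u units and input dimension d has 4 * (d + u + 1) * u parameters (four gates, each with input weights, recurrent weights, and a bias), and the Bidirectional wrapper doubles this:

Python3

# Reproducing the parameter counts shown by model.summary()
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * (input_dim + units + 1) * units
 
print(10000 * 64)                # embedding:       640000
print(2 * lstm_params(64, 64))   # bidirectional:    66048
print(2 * lstm_params(128, 32))  # bidirectional_1:  41216
print(64 * 64 + 64)              # dense:             4160
print(64 * 1 + 1)                # dense_1:              65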

Model Training

Now, we will train the model we defined in the previous step for five epochs.

Python3

# Training the model and validating it on test set
history = model.fit(
    train_dataset, 
    epochs=5,
    validation_data=test_dataset,
)

                    

Output:

Epoch 1/5
782/782 [==============================] - 1209s 2s/step - loss: 0.3657 - accuracy: 0.8266 - val_loss: 0.3110 - val_accuracy: 0.8441
Epoch 2/5
782/782 [==============================] - 1269s 2s/step - loss: 0.2147 - accuracy: 0.9126 - val_loss: 0.3566 - val_accuracy: 0.8590
Epoch 3/5
782/782 [==============================] - 1146s 1s/step - loss: 0.1616 - accuracy: 0.9380 - val_loss: 0.3764 - val_accuracy: 0.8670
Epoch 4/5
782/782 [==============================] - 1963s 3s/step - loss: 0.0962 - accuracy: 0.9647 - val_loss: 0.4271 - val_accuracy: 0.8564
Epoch 5/5
782/782 [==============================] - 1121s 1s/step - loss: 0.0573 - accuracy: 0.9796 - val_loss: 0.5516 - val_accuracy: 0.8575
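
Notice that the validation loss starts rising after the second epoch while the training loss keeps falling, which suggests the model is starting to overfit. One common remedy, sketched below as an optional addition rather than part of the original training run, is to stop training once the validation loss stops improving:

Python3

# Optional sketch: stop training when val_loss stops improving for
# two consecutive epochs and restore the best weights seen so far
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=2, restore_best_weights=True)
 
history = model.fit(
    train_dataset,
    epochs=10,
    validation_data=test_dataset,
    callbacks=[early_stop],
)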

Plotting the training and validation accuracy and loss curves.

Python3

# Plotting the accuracy and loss over time
  
# Training history
history_dict = history.history
  
# Separating validation and training accuracy
acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
  
# Separating validation and training loss
loss = history_dict['loss']
val_loss = history_dict['val_loss']
  
# Plotting
plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.plot(acc)
plt.plot(val_acc)
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(['Accuracy', 'Validation Accuracy'])
  
plt.subplot(1, 2, 2)
plt.plot(loss)
plt.plot(val_loss)
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(['Loss', 'Validation Loss'])
  
plt.show()

                    

Output:

The plot of training and validation accuracy and loss

Model Evaluation

Now, we will test the trained model with a random review and check its output.

Python3

# Making a prediction on a sample review
sample_text = (
    '''The movie by GeeksforGeeks was so good and the animation are so dope. 
    I would recommend my friends to watch it.'''
)
predictions = model.predict(np.array([sample_text]))
print(*predictions[0])
 
# The model outputs a raw logit; a value above 0 means
# the review is predicted to be positive
if predictions[0] > 0:
    print('The review is positive')
else:
    print('The review is negative')

                    

Output:

1/1 [==============================] - 0s 33ms/step
5.414222
The review is positive
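
Since the final Dense layer outputs a raw logit (the loss was compiled with from_logits=True), it can be converted into a probability with the sigmoid function if a score between 0 and 1 is preferred. A minimal sketch:

Python3

# Convert the raw logit into a probability between 0 and 1
probability = tf.sigmoid(predictions[0][0]).numpy()
print(f'Probability of a positive review: {probability:.3f}')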

Applications of Bidirectional LSTM

Some of the popular applications that use BiLSTM are sentiment analysis, text classification, text generation, and machine translation. You can also explore some of these applications in the following articles:

  1. LSTM-Based Poetry Generation Using NLP in Python
  2. Emotion Detection using Bidirectional LSTM

