
Text Generation using Recurrent Long Short Term Memory Network

Last Updated : 22 May, 2024

This article demonstrates how to build a text generator with a recurrent Long Short Term Memory (LSTM) network. Conceptually, training proceeds by first mapping each character present in the training text to a unique number; each character is then one-hot encoded into a vector, which is the input format the network requires.
The data for this procedure was downloaded from Kaggle. The dataset contains articles published in the New York Times from April 2017 to April 2018, separated according to the month of publication. It consists of .csv files that hold the URL of each published article along with other details. One randomly chosen URL was used for training: the text at that URL was copied into a text file, and this text file served as the training corpus.
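
As a quick illustration of this procedure, the toy sketch below (using a made-up five-character string rather than the actual dataset) shows the character-to-index mapping and the resulting one-hot vectors:

Python3

# Toy illustration of the mapping and one-hot encoding described above
import numpy as np

sample = 'hello'
chars = sorted(set(sample))                        # ['e', 'h', 'l', 'o']
char_to_idx = {c: i for i, c in enumerate(chars)}

# One row per character of the string, one column per unique symbol
one_hot = np.zeros((len(sample), len(chars)), dtype = bool)
for t, ch in enumerate(sample):
    one_hot[t, char_to_idx[ch]] = 1

print(one_hot.astype(int))
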
Step 1: Importing the required libraries

Python3

from __future__ import (absolute_import, division,
                        print_function, unicode_literals)
  
import numpy as np
import tensorflow as tf
  
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
  
from keras.optimizers import RMSprop
  
from keras.callbacks import LambdaCallback
from keras.callbacks import ModelCheckpoint
from keras.callbacks import ReduceLROnPlateau
import random
import sys


Step 2: Loading the data into a string

Python3

# Changing the working directory to the location of the text file
import os
os.chdir(r'C:\Users\Dev\Desktop\Kaggle\New York Times')
  
# Reading the text file into a string
with open('article1.txt', 'r') as file:
    text = file.read()
  
# A preview of the text file    
print(text)


Step 3: Creating a mapping from each unique character in the text to a unique number

Python3

# Storing all the unique characters present in the text
vocabulary = sorted(list(set(text)))
  
# Creating dictionaries to map each character to an index
char_to_indices = dict((c, i) for i, c in enumerate(vocabulary))
indices_to_char = dict((i, c) for i, c in enumerate(vocabulary))
  
print(vocabulary)


Step 4: Pre-processing the data

Python3

# Dividing the text into overlapping subsequences of length max_length
# so that at each training step a window of max_length characters is
# fed into the network and the following character is the target
max_length = 100
steps = 5
sentences = []
next_chars = []
for i in range(0, len(text) - max_length, steps):
    sentences.append(text[i: i + max_length])
    next_chars.append(text[i + max_length])
      
# One-hot encoding each character into a boolean vector
X = np.zeros((len(sentences), max_length, len(vocabulary)), dtype = bool)
y = np.zeros((len(sentences), len(vocabulary)), dtype = bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_to_indices[char]] = 1
    y[i, char_to_indices[next_chars[i]]] = 1
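
With these definitions, X has shape (number of sequences, max_length, vocabulary size) and y has shape (number of sequences, vocabulary size). Printing the shapes (assuming the variables defined above) is a quick sanity check before building the model:

Python3

# Quick sanity check of the encoded arrays
print(X.shape)  # (num_sequences, max_length, vocab_size)
print(y.shape)  # (num_sequences, vocab_size)
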


Step 5: Building the LSTM network

Python3

# Building the LSTM network for the task
model = Sequential()
model.add(LSTM(128, input_shape =(max_length, len(vocabulary))))
model.add(Dense(len(vocabulary)))
model.add(Activation('softmax'))
optimizer = RMSprop(learning_rate = 0.01)
model.compile(loss ='categorical_crossentropy', optimizer = optimizer)
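
To confirm the architecture (a single LSTM layer followed by a dense softmax layer over the vocabulary), the standard Keras summary can be printed:

Python3

# Printing the layers and parameter counts of the compiled model
model.summary()
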


Step 6: Defining some helper functions which will be used during the training of the network
Note that the first two functions below are adapted from the official text generation example from the Keras team.
a) Helper function to sample the next character:

Python3

# Helper function to sample an index from a probability array
def sample_index(preds, temperature = 1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
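
The temperature (diversity) parameter rescales the predicted distribution before sampling: values below 1 sharpen it towards the most probable character, 1.0 leaves it unchanged, and values above 1 flatten it so that less likely characters are chosen more often. A tiny demonstration with a made-up probability vector (the numbers are illustrative only, not model outputs):

Python3

# Sampling from a made-up probability vector at different temperatures
demo_probs = [0.1, 0.2, 0.7]
print([sample_index(demo_probs, t) for t in (0.2, 1.0, 1.2)])
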


b) Helper function to generate text after each epoch

Python3

# Helper function to generate text after the end of each epoch
def on_epoch_end(epoch, logs):
    print()
    print('----- Generating text after Epoch: % d' % epoch)
  
    start_index = random.randint(0, len(text) - max_length - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)
  
        generated = ''
        sentence = text[start_index: start_index + max_length]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)
  
        for i in range(400):
            x_pred = np.zeros((1, max_length, len(vocabulary)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_to_indices[char]] = 1.
  
            preds = model.predict(x_pred, verbose = 0)[0]
            next_index = sample_index(preds, diversity)
            next_char = indices_to_char[next_index]
  
            generated += next_char
            sentence = sentence[1:] + next_char
  
            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
print_callback = LambdaCallback(on_epoch_end = on_epoch_end)
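
LambdaCallback simply wraps on_epoch_end so that Keras calls it automatically at the end of every epoch during training.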


c) Helper function to save the model after each epoch in which the loss decreases

Python3

# Defining a helper function to save the model after each epoch
# in which the loss decreases
filepath = "weights.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor ='loss',
                             verbose = 1, save_best_only = True,
                             mode ='min')
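
With save_best_only = True and mode = 'min', the file weights.hdf5 is overwritten only when the monitored training loss reaches a new minimum.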


d) Helper function to reduce the learning rate each time the learning plateaus

Python3

# Defining a helper function to reduce the learning rate each time
# the learning plateaus
reduce_alpha = ReduceLROnPlateau(monitor ='loss', factor = 0.2,
                                 patience = 1, min_lr = 0.001)
callbacks = [print_callback, checkpoint, reduce_alpha]
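
With factor = 0.2 and patience = 1, the learning rate is multiplied by 0.2 whenever the loss fails to improve for one epoch, but is never reduced below min_lr = 0.001.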


Step 7: Training the LSTM model

Python3

# Training the LSTM model
model.fit(X, y, batch_size = 128, epochs = 500, callbacks = callbacks)
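
Because the ModelCheckpoint callback from Step 6 writes the best weights to weights.hdf5, they can be reloaded later (into a model built with the same architecture as in Step 5) instead of retraining from scratch:

Python3

# Restoring the best weights saved during training
# (assumes the model from Step 5 has already been constructed)
model.load_weights('weights.hdf5')
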


Step 8: Generating new and random text

Python3

# Defining a utility function to generate new and random text based on the
# network's learnings
def generate_text(length, diversity):
    # Get random starting text
    start_index = random.randint(0, len(text) - max_length - 1)
    generated = ''
    sentence = text[start_index: start_index + max_length]
    generated += sentence
    for i in range(length):
        x_pred = np.zeros((1, max_length, len(vocabulary)))
        for t, char in enumerate(sentence):
            x_pred[0, t, char_to_indices[char]] = 1.

        preds = model.predict(x_pred, verbose = 0)[0]
        next_index = sample_index(preds, diversity)
        next_char = indices_to_char[next_index]

        generated += next_char
        sentence = sentence[1:] + next_char
    return generated
  
print(generate_text(500, 0.2))
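
Lower diversity values such as 0.2 keep the output close to the most probable characters, while higher values such as 1.0 or 1.2 produce more varied but noisier text. The same utility can be called at several settings to compare:

Python3

# Comparing generated text at different diversity (temperature) settings
for diversity in (0.2, 0.5, 1.0):
    print(generate_text(300, diversity))
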



