
How can Tensorflow be used to download and explore the Iliad dataset using Python?

Last Updated : 27 Jun, 2022

TensorFlow is a free, open-source machine learning and artificial intelligence library that is widely used for training and deploying neural networks. It was developed by the Google Brain team and supports a wide range of platforms. In this tutorial, we will learn to download, load, and explore the famous Iliad dataset.

The Iliad dataset consists of several English translations of the same text, Homer's Iliad. TensorFlow has modified the documents so that each file contains only a single translator's work. The dataset is available at the following URL.

https://storage.googleapis.com/download.tensorflow.org/data/illiad/

Example: In the following example, we will take the works of three translators: William Cowper; Edward, Earl of Derby; and Samuel Butler. With the help of TensorFlow, we will load the texts and label each line with its translator.

Install the TensorFlow text package:

pip install "tensorflow-text==2.8.*"

Download and load the Iliad dataset

We need to label each dataset individually, so we use the Dataset.map function together with a labeler function. This returns example-label pairs.

Python3
import pathlib
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import losses
from tensorflow.keras import utils
from tensorflow.keras.layers import TextVectorization
import tensorflow_datasets as tfds
import tensorflow_text as tf_text

print("Welcome to GeeksforGeeks")
print("Loading the Iliad dataset")
DIRECTORY_URL = 'https://storage.googleapis.com/\
download.tensorflow.org/data/illiad/'
FILE_NAMES = ['cowper.txt', 'derby.txt', 'butler.txt']

# Download each translator's file into the Keras cache directory
for name in FILE_NAMES:
    text_dir = utils.get_file(name,
                              origin=DIRECTORY_URL + name)

parent_dir = pathlib.Path(text_dir).parent

# Pair each line of text with its translator's index as an int64 label
def labeler(example, index):
    return example, tf.cast(index, tf.int64)

labeled_data_sets = []

for i, file_name in enumerate(FILE_NAMES):
    lines_dataset = tf.data.TextLineDataset(str(parent_dir / file_name))
    labeled_dataset = lines_dataset.map(lambda ex: labeler(ex, i))
    labeled_data_sets.append(labeled_dataset)

print(labeled_data_sets)


Output:

[<MapDataset element_spec=(TensorSpec(shape=(), dtype=tf.string, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>,
<MapDataset element_spec=(TensorSpec(shape=(), dtype=tf.string, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>,
<MapDataset element_spec=(TensorSpec(shape=(), dtype=tf.string, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>]
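To see what the labeler produces without downloading anything, here is a minimal, self-contained sketch that applies the same Dataset.map labeling step to a couple of in-memory lines (the sample strings and the label value 2 are made up for illustration):

```python
import tensorflow as tf

# Hypothetical stand-in for one translator's file: two lines of text.
lines = tf.data.Dataset.from_tensor_slices(
    ["Sing, O goddess, the anger of Achilles",
     "son of Peleus, that brought countless ills"])

# Same idea as labeler() above: pair every line with an int64 label.
labeled = lines.map(lambda ex: (ex, tf.cast(2, tf.int64)))

for text, label in labeled:
    print(text.numpy(), label.numpy())
```

Each element comes out as a (string tensor, int64 tensor) pair, matching the element_spec shown in the output above.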

Next, we concatenate and shuffle the datasets. They are combined using the Dataset.concatenate function, and the shuffle function shuffles the data. We then print out a few examples.

Python3
BUFFER_SIZE = 50000
BATCH_SIZE = 64
VALIDATION_SIZE = 5000

# Concatenate the three labeled datasets into one
all_labeled_data = labeled_data_sets[0]
for labeled_dataset in labeled_data_sets[1:]:
    all_labeled_data = all_labeled_data.concatenate(labeled_dataset)

# Shuffle once so that lines from the three translators are interleaved
all_labeled_data = all_labeled_data.shuffle(
    BUFFER_SIZE, reshuffle_each_iteration=False)

for text, label in all_labeled_data.take(5):
    print("Sentence: ", text.numpy())
    print("Label:", label.numpy())


Output:

Sentence:  b"Of brass, and color'd with a ring of gold."
Label: 0
Sentence:  b'drove the horses in among the others.'
Label: 2
Sentence:  b'Into the boundless ether. Reaching soon'
Label: 0
Sentence:  b"Drive to the ships, for pain weigh'd down his soul."
Label: 1
Sentence:  b"Not one is station'd to protect the camp."
Label: 1


