
How can TensorFlow be used to split the flower dataset into training and validation?

Last Updated : 12 Dec, 2022

The TensorFlow flowers dataset is a large dataset consisting of flower images. In this article, we are going to see how we can split the flower dataset into training and validation sets. We will use tensorflow_datasets to load the dataset; it is a library of public datasets ready to use with TensorFlow in Python.

To import the flower dataset, we are going to use the tfds.load() method. It loads the dataset specified by the name argument into a tf.data.Dataset; the name for the flower dataset is tf_flowers. In the same call, we also split the dataset using the split argument, with training_set taking 70% of the dataset and the rest going to test_set.

Loading the Flower Dataset with Initial Splitting

By using tensorflow_datasets we can load many of the standard datasets for training and testing a model's architecture and performance. Its load() function accepts several arguments that come in handy.

Syntax:

tensorflow_datasets.load(name, split, batch_size, shuffle_files, with_info)

where,

  • name – The registered name of the dataset you would like to load.
  • split – Optional parameter with which you can request an initial splitting of the dataset.
  • batch_size – Groups the examples into batches of the given size (see the short example after this list).
  • shuffle_files – Defaults to False; pass True if you want the input files shuffled.
  • with_info – Defaults to False; if set to True, the dataset's metadata (a DatasetInfo object) is returned as well.
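
As a quick illustration of the batch_size and shuffle_files arguments, here is a minimal sketch (the dataset is downloaded only once; subsequent calls reuse the cached copy):

Python3

import tensorflow_datasets as tfds

# Load the full training split in batches of 32,
# shuffling the input files before reading them.
dataset, info = tfds.load(
    'tf_flowers',
    split='train',
    batch_size=32,
    shuffle_files=True,
    with_info=True,
)

Now, let's load the dataset with the 70:30 split described above: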

Python3

import tensorflow_datasets as tfds

# Load tf_flowers and split it 70:30 at load time.
# as_supervised=True yields (image, label) tuples;
# with_info=True also returns the dataset's metadata.
(training_set, test_set), info = tfds.load(
    'tf_flowers',
    split=['train[:70%]', 'train[70%:]'],
    with_info=True,
    as_supervised=True,
)


Output:

Downloading and preparing dataset 218.21 MiB 
(download: 218.21 MiB, generated: 221.83 MiB, total: 440.05 MiB)
to ~/tensorflow_datasets/tf_flowers/3.0.1...
Dl Completed...: 100%
5/5 [00:02<00:00, 2.43 file/s]
Dataset tf_flowers downloaded and prepared to 
~/tensorflow_datasets/tf_flowers/3.0.1. Subsequent calls will reuse this data.
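
Because we passed as_supervised=True, each element of the dataset is an (image, label) tuple. Here is a minimal sketch to inspect a single example (the printed shape will vary, since the images come in different sizes):

Python3

# Take one (image, label) pair from the training set and inspect it.
for image, label in training_set.take(1):
    print(image.shape, label.numpy())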

If we print the dataset information returned by TensorFlow using the print() function, we get the following output.

Python3

print(info)


Output:

(The printed DatasetInfo object describes tf_flowers: its name, description, homepage, features, total number of examples, splits, and citation.)

Now, let's print the sizes of the training and test sets. The following piece of code does so:

Python3

# cardinality() returns the number of examples in each split.
print("Training Set Size: %d" % training_set.cardinality().numpy())
print("Test Set Size: %d" % test_set.cardinality().numpy())


Output:

Training Set Size: 2569
Test Set Size: 1101

Now, let's split out a validation set as well. We will partition the dataset in a 70:15:15 fashion, with 70% going to the training set and the rest divided equally between the validation set and the test set. We already split the dataset at load time, so the training set holds 70% of the data and the remaining 30% sits in the test set. We therefore just need to split the current test set 50:50 into a validation set and a final test set.

Using the take() and skip() Methods for Further Splitting

We will be using the take() and skip() methods to split the dataset. The tf.data.Dataset.take(n) method keeps the first n entries of a dataset, and the tf.data.Dataset.skip(n) method keeps everything after the first n entries.
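
For intuition, here is a minimal sketch of how take() and skip() partition a small dataset:

Python3

import tensorflow as tf

ds = tf.data.Dataset.range(10)
print(list(ds.take(3).as_numpy_iterator()))  # [0, 1, 2]
print(list(ds.skip(3).as_numpy_iterator()))  # [3, 4, 5, 6, 7, 8, 9]

We now apply the same idea to split test_set in half: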

Python3

# Reserve the first half of test_set for validation
# and keep the second half as the final test set.
validation_size = int(0.5 * test_set.cardinality().numpy())
validation_set = test_set.take(validation_size)
test_set = test_set.skip(validation_size)


Now, let’s print the new sizes.

Python3

print("Training Set Size: %d" % training_set.cardinality().numpy())
print("Validation Set Size: %d" % validation_set.cardinality().numpy())
print("Test Set Size: %d" % test_set.cardinality().numpy())


Output:

Training Set Size: 2569
Validation Set Size: 550
Test Set Size: 551
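
With the three splits in hand, a typical next step is to turn them into input pipelines for training. Here is a minimal sketch; the image size of 224 and the batch size of 32 are arbitrary choices for illustration, not requirements of the dataset:

Python3

import tensorflow as tf

IMG_SIZE = 224
BATCH_SIZE = 32

def preprocess(image, label):
    # Resize every image to a common size and scale pixels to [0, 1].
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) / 255.0
    return image, label

# Shuffle only the training data; batch and prefetch all three splits.
train_ds = (training_set.map(preprocess)
            .shuffle(1000)
            .batch(BATCH_SIZE)
            .prefetch(tf.data.AUTOTUNE))
val_ds = validation_set.map(preprocess).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_ds = test_set.map(preprocess).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)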

