
How can TensorFlow be used to split the flower dataset into training and validation?

Last Updated : 12 Dec, 2022

The TensorFlow flowers dataset is a large dataset consisting of flower images. In this article, we are going to see how we can split the flower dataset into training and validation sets. We will use tensorflow_datasets to load the dataset; it is a library of public datasets ready to use with TensorFlow in Python.

To import the flower dataset, we are going to use the tfds.load() method. It loads the dataset specified by the name argument into a tf.data.Dataset; the name for the flower dataset is tf_flowers. In the same call, we also split the dataset using the split argument, with training_set taking 70% of the dataset and the rest going to test_set.

Loading the Flower Dataset with Initial Splitting

By using tensorflow_datasets we can load many of the standard datasets for training and testing a model's architecture and performance. Its load() function accepts several arguments that come in handy.

Syntax:

tensorflow_datasets.load(name, split, batch_size, shuffle_files, with_info)

where,

  • name – The registered name of the dataset you would like to load.
  • split – Optional parameter with which you can request an initial splitting of the dataset.
  • batch_size – Groups the examples into batches of the given size (see the short example after this list).
  • shuffle_files – Defaults to False; pass True if you want the input files shuffled.
  • with_info – Defaults to False; if set to True, the dataset's metadata (a DatasetInfo object) is returned as well.
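
As a quick illustration of the batch_size and shuffle_files arguments, here is a minimal sketch (the dataset is downloaded only once; subsequent calls reuse the cached copy):

Python3

import tensorflow_datasets as tfds

# Load the full training split in batches of 32,
# shuffling the input files before reading them.
dataset, info = tfds.load(
    'tf_flowers',
    split='train',
    batch_size=32,
    shuffle_files=True,
    with_info=True,
)

Now, let's load the dataset with the 70:30 split described above: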

Python3

import tensorflow_datasets as tfds

# Load tf_flowers and split it 70:30 at load time.
# as_supervised=True yields (image, label) tuples;
# with_info=True also returns the dataset's metadata.
(training_set, test_set), info = tfds.load(
    'tf_flowers',
    split=['train[:70%]', 'train[70%:]'],
    with_info=True,
    as_supervised=True,
)


Output:

Downloading and preparing dataset 218.21 MiB 
(download: 218.21 MiB, generated: 221.83 MiB, total: 440.05 MiB)
to ~/tensorflow_datasets/tf_flowers/3.0.1...
Dl Completed...: 100%
5/5 [00:02<00:00, 2.43 file/s]
Dataset tf_flowers downloaded and prepared to 
~/tensorflow_datasets/tf_flowers/3.0.1. Subsequent calls will reuse this data.
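
Because we passed as_supervised=True, each element of the dataset is an (image, label) tuple. Here is a minimal sketch to inspect a single example (the printed shape will vary, since the images come in different sizes):

Python3

# Take one (image, label) pair from the training set and inspect it.
for image, label in training_set.take(1):
    print(image.shape, label.numpy())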

If we print the dataset information returned by TensorFlow using the print() function, we get the following output.

Python3

print(info)


Output:

(The printed DatasetInfo object describes tf_flowers: its name, description, homepage, features, total number of examples, splits, and citation.)

Now, let's print the sizes of the training and test sets. The following piece of code does so:

Python3

# cardinality() returns the number of examples in each split.
print("Training Set Size: %d" % training_set.cardinality().numpy())
print("Test Set Size: %d" % test_set.cardinality().numpy())


Output:

Training Set Size: 2569
Test Set Size: 1101

Now, let's split out a validation set as well. We will partition the dataset in a 70:15:15 fashion, with 70% going to the training set and the rest divided equally between the validation set and the test set. We already split the dataset at load time, so the training set holds 70% of the data and the remaining 30% sits in the test set. We therefore just need to split the current test set 50:50 into a validation set and a final test set.

Using the take() and skip() Methods for Further Splitting

We will be using the take() and skip() methods to split the dataset. The tf.data.Dataset.take(n) method keeps the first n entries of a dataset, and the tf.data.Dataset.skip(n) method keeps everything after the first n entries.
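
For intuition, here is a minimal sketch of how take() and skip() partition a small dataset:

Python3

import tensorflow as tf

ds = tf.data.Dataset.range(10)
print(list(ds.take(3).as_numpy_iterator()))  # [0, 1, 2]
print(list(ds.skip(3).as_numpy_iterator()))  # [3, 4, 5, 6, 7, 8, 9]

We now apply the same idea to split test_set in half: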

Python3

# Reserve the first half of test_set for validation
# and keep the second half as the final test set.
validation_size = int(0.5 * test_set.cardinality().numpy())
validation_set = test_set.take(validation_size)
test_set = test_set.skip(validation_size)


Now, let’s print the new sizes.

Python3

print("Training Set Size: %d" % training_set.cardinality().numpy())
print("Validation Set Size: %d" % validation_set.cardinality().numpy())
print("Test Set Size: %d" % test_set.cardinality().numpy())


Output:

Training Set Size: 2569
Validation Set Size: 550
Test Set Size: 551
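
With the three splits in hand, a typical next step is to turn them into input pipelines for training. Here is a minimal sketch; the image size of 224 and the batch size of 32 are arbitrary choices for illustration, not requirements of the dataset:

Python3

import tensorflow as tf

IMG_SIZE = 224
BATCH_SIZE = 32

def preprocess(image, label):
    # Resize every image to a common size and scale pixels to [0, 1].
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) / 255.0
    return image, label

# Shuffle only the training data; batch and prefetch all three splits.
train_ds = (training_set.map(preprocess)
            .shuffle(1000)
            .batch(BATCH_SIZE)
            .prefetch(tf.data.AUTOTUNE))
val_ds = validation_set.map(preprocess).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
test_ds = test_set.map(preprocess).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)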

