
How can Tensorflow be used with the flower dataset to compile and fit the model?

In this article, we will learn how to compile a model and fit the flower dataset to it. To fit a dataset to a model, we first create a data pipeline and build the model's architecture using TensorFlow's high-level API. Then, before fitting the model on the data through the pipelines, we compile it with an appropriate loss function, an optimizer, and a metric that tells us whether the model is making progress epoch after epoch.

Importing Libraries

Python libraries make it very easy for us to handle the data and perform typical and complex tasks with a single line of code.

import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
  
from glob import glob
from PIL import Image
from sklearn.model_selection import train_test_split
from skimage.feature import local_binary_pattern
  
import tensorflow as tf
from tensorflow import keras
from keras import layers
  
AUTO = tf.data.experimental.AUTOTUNE
import warnings
warnings.filterwarnings('ignore')

Now, let’s check the total number of images we have across all the classes of flowers. The dataset can be downloaded from https://www.kaggle.com/datasets/alxmamaev/flowers-recognition.

images = glob('flowers/*/*.jpg')
len(images)

Output:

4317

df = pd.DataFrame({'filepath': images})

# The class name is the folder the image sits in, i.e. the second
# component of the path 'flowers/<label>/<file>.jpg'.
df['label'] = df['filepath'].str.split('/', expand=True)[1]
df.head()

Output:

First five rows of the dataframe with the filepath and label columns

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['encoded'] = le.fit_transform(df['label'])
df.head()

Output:

Dataframe with the new integer-encoded label column

Let’s check the different classes present in the training data and which class has been assigned to which integer.

classes = le.classes_
classes

Output:

array(['daisy', 'dandelion', 'rose', 'sunflower', 'tulip'], dtype=object)
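
Since LabelEncoder assigns integers in alphabetical order of the class names, we can print the mapping explicitly:

# The encoder sorts class names, so the integer ids follow
# alphabetical order: 0 -> daisy, 1 -> dandelion, and so on.
print(dict(enumerate(le.classes_)))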

Data Visualization

In this section, we will visualize a few of the images we have been given to build the classifier, covering each class, and check whether the dataset suffers from a class-imbalance problem.

x = df['label'].value_counts()
plt.pie(x.values,
        labels=x.index,
        autopct='%1.1f%%')
plt.show()

Output:

A pie chart to visualize the data distribution

From the above chart, we can see a slight class-imbalance problem in the given dataset, but handling it is not an objective of this article.
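
For reference, a common remedy is to compute per-class weights and pass them to model.fit through its class_weight argument. A minimal sketch using scikit-learn's helper (not used anywhere else in this article):

# Sketch only: 'balanced' weights are inversely proportional to
# class frequency and could be passed as
# model.fit(..., class_weight=class_weight).
from sklearn.utils.class_weight import compute_class_weight

weights = compute_class_weight(class_weight='balanced',
                               classes=np.unique(df['encoded']),
                               y=df['encoded'])
class_weight = dict(enumerate(weights))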

for cat in df['label'].unique():
    temp = df[df['label'] == cat]

    index_list = temp.index
    fig, ax = plt.subplots(1, 4, figsize=(15, 5))
    fig.suptitle(f'Images for {cat} category . . . .', fontsize=20)
    for i in range(4):
        # Pick a random image belonging to the current category.
        index = np.random.randint(0, len(index_list))
        index = index_list[index]
        data = df.iloc[index]

        image_path = data['filepath']

        img = Image.open(image_path).resize((256, 256))
        img = np.array(img)
        ax[i].imshow(img)
        ax[i].axis('off')
    plt.tight_layout()
plt.show()

Output:

Some images from the training dataset

features = df['filepath']
target = df['encoded']

# Hold out 15% of the images for validation.
X_train, X_val, Y_train, Y_val = train_test_split(features, target,
                                                  test_size=0.15,
                                                  random_state=10)

X_train.shape, X_val.shape

Output:

((3669,), (648,))
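
Both pipelines below depend on a decode_image helper that reads an image file from disk, resizes it to the 224×224 input size the model expects, scales pixel values to [0, 1], and one-hot encodes the integer label so it matches the categorical cross-entropy loss used later. A minimal sketch of such a helper:

def decode_image(filepath, label=None):
    # Read the raw bytes and decode the JPEG into a 3-channel tensor.
    img = tf.io.read_file(filepath)
    img = tf.image.decode_jpeg(img, channels=3)

    # Resize to the model's input size and scale pixels to [0, 1].
    img = tf.image.resize(img, [224, 224])
    img = tf.cast(img, tf.float32) / 255.0

    if label is None:
        return img

    # One-hot encode so labels match the CategoricalCrossentropy loss.
    return img, tf.one_hot(label, depth=5, dtype=tf.float32)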

Now, using the above function, we will implement our training and validation data input pipelines.

train_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_train, Y_train))
    .map(decode_image, num_parallel_calls=AUTO)
    .batch(32)
    .prefetch(AUTO)
)
  
val_ds = (
    tf.data.Dataset
    .from_tensor_slices((X_val, Y_val))
    .map(decode_image, num_parallel_calls=AUTO)
    .batch(32)
    .prefetch(AUTO)
)
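
A quick sanity check is to pull a single batch from the pipeline and confirm the shapes (assuming the decode_image sketch above):

# One batch: images of shape (32, 224, 224, 3), labels of shape (32, 5).
for img, label in train_ds.take(1):
    print(img.shape, label.shape)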

Model Development

We will use pre-trained weights from a ResNet50 network trained on the ImageNet dataset, which contains millions of images spread across roughly 1,000 classes. The parameters of the imported model have already been trained on those images over weeks, so we do not need to train them again.

from tensorflow.keras.applications.resnet50 import ResNet50

# Load the ImageNet-pretrained backbone without its classification head.
pre_trained_model = ResNet50(
    input_shape=(224, 224, 3),
    weights='imagenet',
    include_top=False
)

# Freeze the backbone so only our new layers will be trained.
for layer in pre_trained_model.layers:
    layer.trainable = False

Output:

94765736/94765736 [==============================] - 5s 0us/step
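
Because every layer of the backbone has been frozen, it should contribute no trainable weights; this is easy to verify:

# The frozen backbone should report zero trainable weights.
print(len(pre_trained_model.trainable_weights))  # 0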

Model Architecture

We will implement a model using the Functional API of Keras, which will contain the following parts: the frozen ResNet50 base to extract features, a Flatten layer, two fully connected layers with BatchNormalization and Dropout for stable and regularized training, and a final Dense layer with softmax activation that outputs probabilities for the 5 flower classes.

from tensorflow.keras import Model

inputs = layers.Input(shape=(224, 224, 3))

# Run the images through the frozen ResNet50 backbone
# and flatten the resulting feature maps.
x = pre_trained_model(inputs)
x = layers.Flatten()(x)

x = layers.Dense(256, activation='relu')(x)
x = layers.BatchNormalization()(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.3)(x)
x = layers.BatchNormalization()(x)
outputs = layers.Dense(5, activation='softmax')(x)

model = Model(inputs, outputs)
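
Calling model.summary() is a convenient way to confirm the layer stack and check that only the newly added dense layers contribute trainable parameters:

# Print the layer stack along with trainable/non-trainable counts.
model.summary()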

While compiling a model we provide these three essential parameters: a loss function that measures how far the predictions are from the true labels, an optimizer that updates the weights to reduce that loss, and a list of metrics used to judge whether the model is making progress epoch after epoch.

model.compile(
    # The final Dense layer applies softmax, so the loss must
    # treat the outputs as probabilities, not logits.
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
    optimizer='adam',
    metrics=['AUC']
)
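
As an aside, if the pipeline supplied raw integer labels instead of one-hot vectors, the sparse variant of the same loss would be the drop-in replacement. A sketch (not used in this article):

# Sketch only: with integer labels, swap in the sparse variant.
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer='adam',
    metrics=['accuracy']
)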

Now we are ready to train our model.

history = model.fit(train_ds,
                    validation_data=val_ds,
                    epochs=5,
                    verbose=1)

Output:

Epoch 1/5
115/115 [==============================] - 8s 60ms/step - loss: 1.5825 - auc: 0.7000 - val_loss: 1.6672 - val_auc: 0.7152
Epoch 2/5
115/115 [==============================] - 7s 59ms/step - loss: 1.3806 - auc: 0.7650 - val_loss: 1.4497 - val_auc: 0.7531
Epoch 3/5
115/115 [==============================] - 8s 68ms/step - loss: 1.2619 - auc: 0.7980 - val_loss: 1.3494 - val_auc: 0.7751
Epoch 4/5
115/115 [==============================] - 7s 58ms/step - loss: 1.1828 - auc: 0.8242 - val_loss: 1.3371 - val_auc: 0.7751
Epoch 5/5
115/115 [==============================] - 7s 60ms/step - loss: 1.0954 - auc: 0.8485 - val_loss: 1.8526 - val_auc: 0.7215
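
model.fit also accepts callbacks; for instance, early stopping on the validation loss is a common addition. A sketch (not applied to the run above):

# Sketch only: stop training when validation loss stops improving
# and restore the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=2,
    restore_best_weights=True
)
# It would be passed as: model.fit(..., callbacks=[early_stop])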

In the code below, we create a dataframe from the log obtained while training the model.

hist_df = pd.DataFrame(history.history)
hist_df.head()

Output:

First five rows of the training-history dataframe

Let’s visualize the training loss and the validation loss of the data.

hist_df['loss'].plot()
hist_df['val_loss'].plot()
plt.title('Loss v/s Validation Loss')
plt.legend()
plt.show()

Output:

Training loss v/s Validation loss

Let’s visualize the training AUC and the validation AUC of the data.

hist_df['auc'].plot()
hist_df['val_auc'].plot()
plt.title('AUC v/s Validation AUC')
plt.legend()
plt.show()

Output:

Training AUC v/s Validation AUC