
How can Tensorflow be used to configure the dataset for performance?

Last Updated : 21 Mar, 2023

TensorFlow is a popular open-source platform for building and training machine learning models. It provides several techniques for loading and preparing a dataset so that the model can get the best performance out of it. Configuring the dataset correctly is crucial to the overall performance of the model, so it is important to choose the right techniques when preparing data for TensorFlow.

Steps:

  1. Split the dataset into training, validation, and test sets: It is important to split the dataset into training, validation, and test sets. The training set is used to train the model, the validation set is used to validate the model, and the test set is used to evaluate the model’s performance.
  2. Data pre-processing: Pre-processing is crucial in machine learning as it helps to eliminate any noise, missing values, or outliers in the data. In Tensorflow, the tf.data module provides several methods to pre-process the data. For example, you can use the map method to apply a pre-processing function to each element in the dataset.
  3. Batching and shuffling the data: Batching the data into smaller portions makes training more efficient and reduces memory usage. Shuffling randomizes the order of the samples so that the model does not learn anything from the ordering of the data, making it less prone to overfitting. In TensorFlow, the batch and shuffle methods of tf.data.Dataset are used to batch and shuffle the data.
  4. Augmenting the data: Augmenting the data refers to creating new samples from the existing data by applying transformations such as rotations, translations, flips, etc. In TensorFlow, the map method can be used to apply augmentations to the dataset (a small sketch of steps 3 and 4 is shown right after this list).
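As an illustration of steps 3 and 4, here is a minimal sketch that shuffles, augments (with a random horizontal flip), and batches a small toy image dataset. The toy data, the choice of augmentation, and the buffer/batch sizes are illustrative assumptions, not part of the MNIST code used later in this article.

Python3

import tensorflow as tf

# Toy data: 100 random 28x28x1 "images" with integer labels in [0, 10)
images = tf.random.uniform((100, 28, 28, 1))
labels = tf.random.uniform((100,), maxval=10, dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

# Step 4: augmentation applied per sample via map
def augment(x, y):
    x = tf.image.random_flip_left_right(x)
    return x, y

# Step 3: shuffle with a buffer, apply the augmentation, then batch
dataset = dataset.shuffle(buffer_size=100).map(augment).batch(32)

# Each element is now a batch of 32 augmented images and labels
for batch_images, batch_labels in dataset.take(1):
    print(batch_images.shape, batch_labels.shape)  # (32, 28, 28, 1) (32,)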

Steps to use TensorFlow to configure a dataset for performance:

1. Import TensorFlow

 The first step is to import TensorFlow and other required libraries in your Python script.

Python3




import tensorflow as tf
import numpy as np


import tensorflow as tf is a statement in Python that imports the TensorFlow library into your program and assigns the alias tf to it. This means that, in your code, you can refer to TensorFlow as tf.

NumPy is a library in Python that provides support for arrays and matrices. The statement import numpy as np imports the NumPy library and assigns the alias np to it. In the code below, the MNIST data returned by Keras already comes as NumPy arrays, so NumPy itself is not called directly.

2. Load the dataset

The next step is to load the dataset into TensorFlow. You can use the TensorFlow Dataset API (tf.data) to build a dataset. If your data is stored in .npy files, you can also load it with np.load and then convert the resulting NumPy arrays into a TensorFlow Dataset.

Python3




# Load the dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
  
# Create TensorFlow Dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))


The code is used to load the MNIST dataset and create TensorFlow Datasets from the loaded data.

  • The mnist.load_data() function is used to load the MNIST dataset which is pre-loaded in TensorFlow. It returns the training and testing data, i.e., (x_train, y_train) and (x_test, y_test) respectively. The x data represents the images while the y data represents the corresponding labels of the images.
  • The tf.data.Dataset.from_tensor_slices() function is used to create a TensorFlow Dataset from the loaded data. Here, it is called twice to create two datasets: one for the training data (train_dataset) and one for the testing data (test_dataset). The function takes the training or testing data as an argument, in this case (x_train, y_train) or (x_test, y_test) respectively, and creates a TensorFlow Dataset object from it, which can then be used to feed data into a machine learning model for training or evaluation. (A sketch of the same call applied to arrays loaded from .npy files, and of inspecting the resulting dataset, follows this list.)
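If the data is not bundled with Keras, the same from_tensor_slices() call works on arrays loaded from disk. The sketch below assumes two hypothetical .npy files, features.npy and labels.npy; the file names and shapes are purely illustrative. The element_spec property is a convenient way to confirm the shape and dtype of each (features, label) pair in the dataset.

Python3

import numpy as np
import tensorflow as tf

# Hypothetical .npy files holding the features and labels
x = np.load('features.npy')   # e.g. shape (num_samples, 28, 28)
y = np.load('labels.npy')     # e.g. shape (num_samples,)

# Build a dataset of (features, label) pairs, exactly as for MNIST above
dataset = tf.data.Dataset.from_tensor_slices((x, y))

# Inspect the structure (shapes and dtypes) of a single element
print(dataset.element_spec)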

3. Preprocess the dataset

The next step is to preprocess the dataset. You can use the map method to apply a function to each element of the dataset. You can also use the batch method to batch the elements of the dataset.

Python3




# Preprocess the dataset
def preprocess(x, y):
    x = tf.cast(x, tf.float32) / 255.0
    y = tf.cast(y, tf.int64)
    return x, y
  
  
train_dataset = train_dataset.map(preprocess).batch(32)
test_dataset = test_dataset.map(preprocess).batch(32)


  • The code snippet above performs preprocessing on the training and testing datasets. The preprocessing step is essential for most machine learning models to work effectively as it helps to standardize the data, improve the performance of the model and prevent overfitting.
  • The preprocess function takes two arguments x, and y which represent the input features and the labels of the dataset respectively. The function first casts the input features x to float32 data type and normalizes them by dividing by 255.0 to scale the values between 0 and 1. The labels y are cast to the int64 data type. This is done to ensure that the data types are consistent and to make it easier for the model to learn the relationships between the input and the output.
  • The map function is applied to both the training and testing datasets. It applies the preprocess function to each element, i.e., to each (image, label) pair. The batch function is then applied to each dataset to group the elements into batches of size 32. Batching makes training more efficient, as it reduces memory usage and speeds up the computation. Note that this code batches but does not shuffle the training data; a variant that also shuffles, as described in step 3 earlier, is sketched below.
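The sketch below adds that shuffle step to the training pipeline, assuming x_train, y_train, x_test, y_test and preprocess are defined as above. The buffer size of 10,000 and the final prefetch call (which overlaps data preparation with model execution) are illustrative additions, not taken from this article's code.

Python3

# Shuffle (training data only), then batch; buffer size chosen for illustration
train_dataset = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
                 .map(preprocess)
                 .shuffle(buffer_size=10000)
                 .batch(32)
                 .prefetch(tf.data.AUTOTUNE))  # prepare the next batches while training

# The test data is only evaluated, so shuffling it is unnecessary
test_dataset = (tf.data.Dataset.from_tensor_slices((x_test, y_test))
                .map(preprocess)
                .batch(32))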

4. Train the model

Finally, you can use the preprocessed dataset to train your model.

Python3




# Create the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Reshape(target_shape=(28 * 28,), input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10)
])
  
# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(
                  from_logits=True),
              metrics=['accuracy'])
  
# Train the model
history = model.fit(train_dataset.repeat(), epochs=10, steps_per_epoch=500,
                    validation_data=test_dataset.repeat(), validation_steps=2)


The code uses functions from TensorFlow’s tf.keras module, which is a high-level API for building and training deep learning models in TensorFlow. Here’s what each function does in detail:

  • tf.keras.datasets.mnist.load_data() – This function loads the MNIST dataset, which consists of images of handwritten digits and their corresponding labels. It returns two tuples of numpy arrays, one for training data and one for testing data.
  • tf.data.Dataset.from_tensor_slices() – This function creates a TensorFlow Dataset object from numpy arrays. It is used to create train_dataset and test_dataset from the training and testing data respectively.
  • map(preprocess) – This function applies the preprocess function to each element in the dataset. The preprocess function is defined to cast the input images to a float type with values between 0 and 1, and the labels to an integer type. This is a common preprocessing step for image classification problems.
  • batch(32) – This function groups elements of the dataset into batches of 32. This is done to increase the efficiency of training and reduce memory usage.
  • tf.keras.models.Sequential – This class creates a sequential model, which is a linear stack of layers. It is used to define the model in this code.
  • tf.keras.layers.Reshape – This layer reshapes the input data into a tensor with the specified shape. In this code, it is used to flatten the 28×28 input images into a 1D tensor with 28 * 28 = 784 elements.
  • tf.keras.layers.Dense – This layer is a fully-connected layer that applies a matrix multiplication to the input tensor, followed by the addition of a bias vector. In this code, it is used to define three dense layers with 128, 64, and 10 neurons respectively. The activation function for the first two layers is ‘relu’ (rectified linear unit). The last layer uses the default linear activation, so it outputs raw, unnormalized scores (logits) for the 10 digit classes.
  • compile() – This method configures the model for training. It takes the following arguments:
  1. optimizer – the optimizer used to update the model weights during training. In this code, tf.keras.optimizers.Adam is used with a learning rate of 0.001.
  2. loss – the loss function used to measure how well the model is doing. In this code, tf.keras.losses.SparseCategoricalCrossentropy is used, which is a loss function for multiclass classification with integer labels. Setting from_logits=True tells the loss that the model outputs raw scores (logits) rather than probabilities, which matches the final Dense layer above (see the short sketch after this list).
  3. metrics – a list of metrics to be evaluated during training and testing. In this code, only the accuracy is used.
     
  • fit() – This method trains the model on the training data. It is called with the following arguments:
  1. train_dataset.repeat() – the training data. The repeat() function repeats the dataset so that it can be iterated over for multiple epochs.
  2. epochs – the number of training epochs. In this code, it is set to 10.
  3. steps_per_epoch – the number of training batches processed in each epoch. In this code, it is set to 500.
  4. validation_data – the data used to evaluate the model at the end of each epoch. Here it is the repeated test dataset.
  5. validation_steps – the number of validation batches evaluated at the end of each epoch. Here it is set to 2, so only a small sample of the test data is used during training.

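As a side note on the from_logits=True argument used in compile(), the short sketch below shows that SparseCategoricalCrossentropy with from_logits=True applies the softmax internally to raw scores. The logits and label values are made-up numbers chosen only to illustrate the computation.

Python3

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Raw, unnormalized scores ("logits") for a single 3-class example
logits = tf.constant([[2.0, 1.0, 0.1]])
label = tf.constant([0])  # index of the true class

# The same computation done by hand: softmax, then negative log-probability
probs = tf.nn.softmax(logits)
manual = -tf.math.log(probs[0, 0])

print(loss_fn(label, logits).numpy())  # approximately 0.417
print(manual.numpy())                  # same value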
The complete code is given below:

Python3




# import modules
import tensorflow as tf
import numpy as np
  
# Load the dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
  
# Create TensorFlow Dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
  
# Preprocess the dataset
def preprocess(x, y):
    x = tf.cast(x, tf.float32) / 255.0
    y = tf.cast(y, tf.int64)
    return x, y
  
train_dataset = train_dataset.map(preprocess).batch(32)
test_dataset = test_dataset.map(preprocess).batch(32)
  
# Create the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Reshape(target_shape=(28 * 28,), input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10)
])
  
# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(
                  from_logits=True),
              metrics=['accuracy'])
  
# Train the model
history = model.fit(train_dataset.repeat(), epochs=10, steps_per_epoch=500,
                    validation_data=test_dataset.repeat(), validation_steps=2)


Output:

Running the code prints the standard Keras training log from model.fit: for each of the 10 epochs, the loss and accuracy on the training batches and the corresponding val_loss and val_accuracy on the validation data.
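After training, the held-out test set (step 1 of the steps listed at the start of this article) can be used to evaluate the model. A minimal sketch of that final step, assuming the complete code above has already been run:

Python3

# Evaluate the trained model on the (non-repeated) test dataset
test_loss, test_accuracy = model.evaluate(test_dataset)
print(f"Test accuracy: {test_accuracy:.4f}")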