
How can TensorFlow be used with Estimators to split the iris dataset?

Last Updated : 17 Apr, 2023

TensorFlow is an open-source machine-learning framework that has become incredibly popular in the past few years. It is widely used for building and training deep neural networks, as well as for implementing other machine learning algorithms.

An Estimator is a high-level TensorFlow API that simplifies the process of building, training, evaluating, and deploying machine learning models. It provides a simple interface for working with pre-built models or for building custom models, while abstracting away many of TensorFlow's low-level details.

The Iris dataset is a popular machine-learning dataset containing measurements of iris flowers: the length and width of the petals and sepals. The task is to classify each iris into one of three species based on these four measurements. The Iris dataset is often used as a reference dataset in machine-learning research and is an excellent dataset for exploring TensorFlow and Estimators.

In this tutorial, we load the dataset directly from the UCI Machine Learning Repository, so an active Internet connection is required to run the code.

Before we begin, make sure you have TensorFlow, scikit-learn, and pandas installed on your system. You can install them using pip:

pip install tensorflow
pip install scikit-learn
pip install pandas

Import the necessary libraries

Python3




import tensorflow as tf
import pandas as pd


Load the iris dataset

Next, let's load the iris dataset into a pandas DataFrame. The raw file from the UCI repository has no header row, so we assign the column names ourselves:

Python3

iris_data = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None)
iris_data.columns = ['sepal_length', 'sepal_width',
                     'petal_length', 'petal_width', 'species']


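To confirm the download worked, you can inspect the shape and the first few rows of the DataFrame (a quick sanity check):

Python3

print(iris_data.shape)   # expected: (150, 5)
print(iris_data.head())
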
Split the dataset into training and testing sets

The iris dataset contains 150 samples, 50 for each of the three species. Because DNNClassifier expects integer class labels, we first map the species names to the integers 0, 1, and 2, and then split the dataset into training and testing sets using the train_test_split function from the scikit-learn library:

Python3




from sklearn.model_selection import train_test_split

# Map string labels to integers, since DNNClassifier expects integer classes
label_map = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
iris_data['label'] = iris_data['species'].map(label_map)

train_data, test_data, train_labels, test_labels = train_test_split(
    iris_data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']],
    iris_data['label'], test_size=0.2)


We now have a training set with 80% of the samples and a testing set with the remaining 20%.
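
Because the split is random, the class balance in the two sets can vary from run to run. If you want a reproducible, class-balanced split, train_test_split also accepts stratify and random_state arguments; a minimal sketch:

Python3

# Optional: stratify so each split keeps the 50/50/50 class ratio
train_data, test_data, train_labels, test_labels = train_test_split(
    iris_data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']],
    iris_data['label'],
    test_size=0.2,
    stratify=iris_data['label'],
    random_state=42)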

Define the feature columns

Let's define the feature columns using the tf.feature_column API. Feature columns map raw input data into a format that can be fed to a TensorFlow model. In this case, we define four numeric feature columns, one for each input feature in the iris dataset:

Python3




feature_columns = [
    tf.feature_column.numeric_column('sepal_length'),
    tf.feature_column.numeric_column('sepal_width'),
    tf.feature_column.numeric_column('petal_length'),
    tf.feature_column.numeric_column('petal_width')
]


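All four iris features are numeric, so numeric_column is all we need here. For reference, the same API also handles non-numeric inputs; a hypothetical string-valued 'color' feature (not part of the iris data, shown purely to illustrate the API) could be encoded like this:

Python3

# Hypothetical categorical feature, for illustration only
color_column = tf.feature_column.categorical_column_with_vocabulary_list(
    'color', vocabulary_list=['red', 'blue', 'purple'])
# DNNClassifier needs dense input, so wrap it in an indicator column
color_indicator = tf.feature_column.indicator_column(color_column)
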
TensorFlow Estimator

Next, let's create an Estimator object using the DNNClassifier class. This gives us a deep neural network that can classify the iris flowers based on the input features:

Python3




estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='model'
)


In this case, we are creating a neural network with two hidden layers, each with 10 nodes. The n_classes parameter is set to 3, since there are three possible classes in the iris dataset. The model_dir parameter specifies the directory where the TensorFlow model will be saved.
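
DNNClassifier also exposes optional arguments for tuning the network. The defaults are fine for this tutorial, but as a sketch, you could pick the optimizer explicitly and add dropout regularization (the values below are illustrative, not tuned):

Python3

# Optional variant with an explicit optimizer and dropout
estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    optimizer='Adam',  # the default is 'Adagrad'
    dropout=0.1,       # drop 10% of activations during training
    model_dir='model'
)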

Define the input functions

Now, let’s define the input functions that will feed data into the Estimator. We will define two input functions, one for the training data and one for the testing data:

Python3




train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=train_data,
    y=train_labels,
    batch_size=32,
    shuffle=True
)

test_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=test_data,
    y=test_labels,
    batch_size=32,
    shuffle=False
)


The pandas_input_fn function creates input functions from pandas DataFrames. The batch_size parameter specifies how many samples are fed to the model at once. The shuffle parameter is set to True for the training input function, so the training data is shuffled before being fed to the model, and to False for the testing input function, so evaluation is deterministic.
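
pandas_input_fn lives in TensorFlow 1.x's compat layer. If you prefer to avoid it, an input function is simply a callable that returns a tf.data.Dataset of (features, labels) pairs, so an equivalent training input function can be written directly; a minimal sketch using the same variables:

Python3

def train_input_fn_v2():
    # Build a dataset of ({feature_name: values}, label) pairs
    ds = tf.data.Dataset.from_tensor_slices((dict(train_data), train_labels))
    # Shuffle and batch; repeat() lets the `steps` argument control duration
    return ds.shuffle(len(train_data)).repeat().batch(32)

# Used the same way: estimator.train(input_fn=train_input_fn_v2, steps=1000)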

Train the Estimator

Now that we have defined the Estimator and the input functions, we can train the model using the train method:

Python3




estimator.train(input_fn=train_input_fn, steps=1000)


The train method trains the model using the specified input function for the specified number of steps.
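
One caveat: pandas_input_fn defaults to num_epochs=1, so the input function is exhausted after a single pass over the 120 training samples (about 4 batches of 32), and training stops early no matter how large steps is. That is why the evaluation below reports a small global_step and low accuracy. To let steps actually control the training duration, pass num_epochs=None so the data repeats indefinitely:

Python3

train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=train_data,
    y=train_labels,
    batch_size=32,
    num_epochs=None,  # repeat the data so `steps` decides when to stop
    shuffle=True
)
estimator.train(input_fn=train_input_fn, steps=1000)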

Evaluation

Finally, we can evaluate the performance of the model on the testing data using the evaluate method:

Python3




eval_result = estimator.evaluate(input_fn=test_input_fn)
print(eval_result)


Output:

{'accuracy': 0.33333334, 'average_loss': 1.6798068, 'loss': 1.6798068, 'global_step': 8}

The evaluate method returns a dictionary containing various performance metrics, such as accuracy and average loss. We can print the evaluation results to see how well the model performs on the testing data.
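
Once the metrics look reasonable, the same Estimator can generate predictions with its predict method, which returns a generator of per-example dictionaries. A minimal sketch that reuses the test features:

Python3

# Predict on the test features (no labels needed)
pred_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=test_data, shuffle=False)

for pred in estimator.predict(input_fn=pred_input_fn):
    class_id = pred['class_ids'][0]                 # predicted class: 0, 1, or 2
    probability = pred['probabilities'][class_id]   # model confidence
    print(class_id, probability)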

Complete code:

Python3




import tensorflow as tf
import pandas as pd
from sklearn.model_selection import train_test_split
  
# Load the iris dataset from the UCI repository (no header row in the file)
iris_data = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None)
iris_data.columns = ['sepal_length', 'sepal_width',
                     'petal_length', 'petal_width', 'species']
  
# Map string labels to integers
label_map = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
iris_data['label'] = iris_data['species'].map(label_map)
  
# Split the dataset into training and testing sets
train_data, test_data, train_labels, test_labels = train_test_split(
    iris_data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']],
    iris_data['label'], test_size=0.2)
  
# Define the feature columns
feature_columns = [
    tf.feature_column.numeric_column('sepal_length'),
    tf.feature_column.numeric_column('sepal_width'),
    tf.feature_column.numeric_column('petal_length'),
    tf.feature_column.numeric_column('petal_width')
]
  
# Define the Estimator
estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='model'
)
  
# Define the input functions
train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=train_data,
    y=train_labels,
    batch_size=32,
    shuffle=True
)
  
test_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=test_data,
    y=test_labels,
    batch_size=32,
    shuffle=False
)
  
# Train the model
estimator.train(input_fn=train_input_fn, steps=1000)
  
# Evaluate the model
eval_result = estimator.evaluate(input_fn=test_input_fn)
print(eval_result)


Output:

{'accuracy': 0.33333334, 'average_loss': 1.6798068, 'loss': 1.6798068, 'global_step': 8}

