Inspecting the Titanic Dataset with TensorFlow Estimators using Python

Last Updated : 09 Jan, 2023

The TensorFlow Estimator API is a high-level interface that simplifies the process of training and evaluating machine learning models in TensorFlow. It provides pre-built model architectures and optimization algorithms, as well as tools for input preprocessing, evaluation, and serving.

To use the Estimator API, you first define the model and an input function. With a pre-made Estimator such as LinearClassifier, the model's inputs are described using feature columns, which specify the type and shape of each input feature, and the loss function and default evaluation metrics are built into the Estimator. (For a custom Estimator, these are defined in a model function, model_fn.) The input function is responsible for reading and preprocessing the training or evaluation data.

Once the model and input function are defined, you can use the Estimator API to train and evaluate the model. Training involves passing the input function to the train() method and specifying the number of training steps. Evaluation involves passing an input function to the evaluate() method, optionally with the number of evaluation steps.

The Estimator API also provides tools for making predictions on new data, such as the predict() method, which allows you to pass a test dataset to the model and get predictions for each example.

Overall, the TensorFlow Estimator API is a powerful and convenient tool for training and evaluating machine learning models in TensorFlow. It provides a simple, high-level interface that handles many of the low-level details of model training and evaluation, making it easier to focus on the core machine-learning tasks.

The example relies on the following Python libraries, which handle data loading and model training in a few lines of code.

  • Pandas – This library loads the data into a two-dimensional DataFrame and provides many functions for analysis.
  • Matplotlib – This library is used to draw visualizations.
  • TensorFlow – This is an open-source library for machine learning and artificial intelligence. It provides the Estimator API used in this article.

Python3




import tensorflow as tf
import pandas as pd
from tensorflow import estimator


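In this code block, the TensorFlow library is imported under the alias tf and the pandas library under the alias pd. The estimator module is also imported from TensorFlow. It contains the LinearClassifier class, which the code below accesses as tf.estimator.LinearClassifier. Note that tf.estimator was removed in TensorFlow 2.16, so this code requires TensorFlow 2.15 or earlier.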

Python3




# Read the Titanic dataset into a Pandas dataframe
data = pd.read_csv("titanic.csv")
  
# preprocess the dataset
data = data[data.Age.notnull()]
data = data[data.Sex.notnull()]
data = data[data.Embarked.notnull()]
data = data.drop('Cabin', axis=1)


In this code block, the Titanic dataset is read into a Pandas data frame using the read_csv function. The data is then preprocessed: rows with missing values in the Age, Sex, or Embarked columns are removed using the notnull method, and the Cabin column is dropped using the drop method.
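The same filtering can also be written in one step with dropna. The sketch below compares the two forms on a small made-up frame. The rows are illustrative values, not taken from titanic.csv, and the names toy, masked and dropped are hypothetical.

```python
import pandas as pd

# Tiny made-up frame standing in for titanic.csv (values are illustrative only).
toy = pd.DataFrame({
    "Age": [22.0, None, 35.0, 54.0, 41.0],
    "Sex": ["male", "female", "female", None, "female"],
    "Embarked": ["S", "C", None, "Q", "S"],
    "Cabin": [None, "C85", None, "E46", None],
    "Survived": [0, 1, 1, 0, 1],
})

# Chained boolean masks, as in the code above.
masked = toy[toy.Age.notnull()]
masked = masked[masked.Sex.notnull()]
masked = masked[masked.Embarked.notnull()]
masked = masked.drop("Cabin", axis=1)

# Equivalent one-step form: drop rows with a missing value in any listed column.
dropped = toy.dropna(subset=["Age", "Sex", "Embarked"]).drop(columns="Cabin")

print(masked.equals(dropped))
print(len(toy), "->", len(dropped), "rows")
```

Either form works. dropna with a subset keeps the list of required columns in one place, which may be easier to maintain as more features are added.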

Python3




# Define a LinearClassifier Estimator
feature_columns = [tf.feature_column.numeric_column(key="Age"),
                   tf.feature_column.numeric_column(key="Fare")]
  
model = tf.estimator.LinearClassifier(feature_columns=feature_columns)


In this code block, a LinearClassifier estimator is created. The feature_columns parameter specifies the input features that the model uses for training. In this case, the input features are the numeric Age and Fare columns of the data.
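To make the role of the two feature columns concrete, here is a minimal sketch of the kind of computation a linear classifier performs on them: a weighted sum of Age and Fare plus a bias, passed through a sigmoid. The weights and bias are made-up values for illustration only, and survival_probability is a hypothetical helper. The trained Estimator learns its own parameters.

```python
import math

# Hypothetical parameters, chosen only for illustration.
# A trained LinearClassifier would learn its own values.
W_AGE = -0.02
W_FARE = 0.01
BIAS = -0.3

def survival_probability(age, fare):
    """Return sigmoid(w_age * age + w_fare * fare + bias)."""
    logit = W_AGE * age + W_FARE * fare + BIAS
    return 1.0 / (1.0 + math.exp(-logit))

p = survival_probability(age=30.0, fare=50.0)
print(f"{p:.3f}")
```

Because the model is linear in its inputs, each feature pushes the prediction in one direction only. That is one reason to compare it against the non-linear Estimators mentioned at the end of the article.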

Python3




def input_fn(data):
    return tf.compat.v1.estimator.inputs.pandas_input_fn(
        x=data[["Age", "Fare"]],
        y=data["Survived"],
        batch_size=100,
        num_epochs=None,
        shuffle=True)


The input_fn function returns an input function that feeds the data to the model during training. It wraps the pandas_input_fn function from TensorFlow, which takes the Age and Fare columns as features and the Survived column as labels, shuffles the rows (shuffle=True), and groups them into batches of batch_size examples. The num_epochs parameter sets how many times the data is iterated over. None means the data repeats indefinitely, so training ends only when the step limit passed to train() is reached.
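The roles of batch_size, num_epochs and shuffle can be illustrated with a small pure-Python generator. This is a simplified sketch, not TensorFlow's implementation (which is queue-based and may differ in detail). The batches function and its seed parameter are hypothetical names.

```python
import random

def batches(rows, batch_size, num_epochs, shuffle, seed=0):
    """Yield lists of rows, mimicking the roles of the three parameters."""
    rng = random.Random(seed)
    epoch = 0
    # num_epochs=None means "repeat forever"; the caller must stop iterating.
    while num_epochs is None or epoch < num_epochs:
        order = list(rows)
        if shuffle:
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]
        epoch += 1

rows = list(range(5))

# One pass, in order: the last batch is smaller because 5 is not a multiple of 2.
one_pass = list(batches(rows, batch_size=2, num_epochs=1, shuffle=False))
print(one_pass)  # [[0, 1], [2, 3], [4]]

# With num_epochs=None the generator never ends, so take a fixed number of
# batches, which is the job the steps argument does for train().
endless = batches(rows, batch_size=2, num_epochs=None, shuffle=True)
first_four = [next(endless) for _ in range(4)]
print(len(first_four))  # 4
```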

Python3




model.train(input_fn=input_fn(data),
            steps=1000)


The train method of the model is called with the input function and the number of training steps as arguments. The input function is created by calling the input_fn function with the data as an argument. The steps parameter specifies how many batches the model is trained on. In this case, the model is trained for 1000 steps of 100 examples each.
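A quick calculation shows what 1000 steps means in terms of passes over the data. The row count below is a hypothetical placeholder, since it depends on the CSV file used. Replace it with len(data) after preprocessing.

```python
steps = 1000        # from the train() call
batch_size = 100    # from input_fn
num_rows = 700      # hypothetical row count after preprocessing; use len(data)

examples_seen = steps * batch_size
approx_epochs = examples_seen / num_rows

print(examples_seen)             # 100000
print(round(approx_epochs, 1))   # 142.9
```

Under this assumption, each row is seen roughly 143 times, which is why num_epochs=None is needed in the training input function.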

Python3




def input_fn_predict(data):
    return tf.compat.v1.estimator.inputs.pandas_input_fn(
        x=data[["Age", "Fare"]],
        y=None,
        batch_size=100,
        num_epochs=1,
        shuffle=False)


The input_fn_predict function is defined to return an input function that can be used to feed the data to the model for making predictions. It is similar to the input_fn function, but it doesn’t include the labels (Survived column) and it doesn’t shuffle the data using the shuffle parameter. The num_epochs parameter specifies the number of times the data should be iterated over during prediction, and a value of 1 indicates that the data will be iterated over once.

Python3




predictions = model.predict(input_fn=input_fn_predict(data))
  
# Print the predictions
for prediction in predictions:
    print(prediction)


The predict method of the model is called with the prediction input function as its argument, and the result is stored in a variable called predictions. The predict method returns a generator that yields one dictionary per example, and the loop prints each of them.

Prediction:

{'logits': array([-0.6641928], dtype=float32),
 'logistic': array([0.3397984], dtype=float32),
 'probabilities': array([0.6602016 , 0.33979842], dtype=float32),
 'class_ids': array([0], dtype=int64),
 'classes': array([b'0'], dtype=object),
 'all_class_ids': array([0, 1]),
 'all_classes': array([b'0', b'1'], dtype=object)}

A prediction is a dictionary with the following keys:

  1. logits: An array with a single float value representing the raw, unnormalized score (the log-odds of survival) produced by the model.
  2. logistic: An array with a single float value, obtained by applying the sigmoid function to the logit. It can be interpreted as the probability that the model assigns to the positive class (i.e. the class corresponding to survival).
  3. probabilities: An array of float values representing the predicted class probabilities, indexed by class ID. The first element is the probability of class 0 (non-survival), and the second element is the probability of class 1 (survival), which equals the logistic value.
  4. class_ids: An array of integers representing the predicted class label. It has a single element, which is 1 if the model predicts that the passenger survived, or 0 if the model predicts that the passenger did not survive.
  5. classes: An array of byte strings representing the predicted class label. It has a single element, which is b'1' if the model predicts that the passenger survived, or b'0' if the model predicts that the passenger did not survive.
  6. all_class_ids: An array of integers representing all possible class labels.
  7. all_classes: An array of byte strings representing all possible class labels.
  • Based on this prediction, the model predicts that the passenger did not survive (class 0) with a probability of about 0.66. The predicted probability of survival is 0.3397984.
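The relationship between these fields can be checked by hand. The sketch below starts from the logits value in the sample output and recomputes logistic, probabilities and class_ids with plain Python. The variable names are illustrative, and the code only reproduces the arithmetic, not the Estimator's internals.

```python
import math

logit = -0.6641928  # 'logits' value from the sample output

p_survived = 1.0 / (1.0 + math.exp(-logit))   # sigmoid, matches 'logistic'
p_not_survived = 1.0 - p_survived

# Position in the list is the class id: index 0 = did not survive, 1 = survived.
probabilities = [p_not_survived, p_survived]

# The predicted class is the index with the larger probability ('class_ids').
class_id = max(range(2), key=lambda i: probabilities[i])

print(round(p_survived, 4), round(p_not_survived, 4), class_id)  # 0.3398 0.6602 0
```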

Here are a few examples of how you might use TensorFlow’s tf.estimator API to train different kinds of models on the Titanic dataset using Python:

  • You could use the tf.estimator.DNNClassifier class to train a deep neural network with multiple hidden layers on the Titanic dataset. Such a model can learn non-linear patterns in the data, which may lead to more accurate predictions.
  • You could use the tf.estimator.LinearClassifier class to train a logistic regression model on the Titanic dataset, as shown in this article. This gives a simple and efficient model for binary classification tasks.
  • You could use the tf.estimator.BoostedTreesClassifier class to train a gradient boosted trees model on the Titanic dataset. This model builds an ensemble of decision trees iteratively and often performs well on tabular data.

You can experiment with different model architectures that are available in the tf.estimator API to see which one works best for your specific use case.


