
Python – Model Deployment Using TensorFlow Serving

Model deployment is one of the most important parts of the machine learning pipeline. Deployment is the process of integrating a machine learning model into an existing production environment so that it can be used for practical purposes in real time.

There are many ways to deploy a model. One way is to integrate the model into a Django/Flask application, with a script that takes the input, loads the model, and generates the results. With this approach we can pass image data to the model and display the results once the model produces an output. A minimal sketch of this approach follows the list below. The main limitations of this method are:

- The model is tightly coupled to the web application, so every model update means redeploying the whole application.
- Versioning and rollback of models have to be handled manually.
- Inference runs inside the web server process, which limits throughput and makes scaling harder.
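For context, here is a minimal sketch of that in-application approach, assuming a trained Keras model saved on disk (the model path and endpoint name here are hypothetical, not from this article):

import numpy as np
from flask import Flask, request, jsonify
from tensorflow import keras

app = Flask(__name__)
# load the model once at startup (hypothetical path)
model = keras.models.load_model("my_model")

@app.route("/predict", methods=["POST"])
def predict():
    # expect a JSON body like {"instances": [[...], ...]}
    data = np.asarray(request.get_json()["instances"], dtype="float32")
    preds = model.predict(data)
    return jsonify({"predictions": preds.tolist()})

if __name__ == "__main__":
    app.run(port=5000)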
The other way is to deploy the model using TensorFlow Serving. Since it provides an API (in the form of REST and gRPC), it is portable and can be consumed from different devices through that API. It is easy to set up and works well even for large models.

Advantages of TensorFlow Serving:

- It supports model versioning, so new versions can be deployed (and old ones rolled back) without changing client code.
- It exposes both REST and gRPC endpoints out of the box.
- It can serve multiple models at once and batches incoming requests for better throughput.
- The model is decoupled from the application, so it can be updated without redeploying the client application.
RESTful API:

TensorFlow Serving supports two types of client request format through its RESTful API: the Classify/Regress API and the Predict API.

Here, we will use the Predict API; the URL format for it will be:

POST http://{host}:{port}/v1/models/${MODEL_NAME}[/versions/${VERSION}|/labels/${LABEL}]:predict

and the request body contains a JSON object in the form of :

{
  // (Optional) Serving signature to use.
  // Default: 'serving_default'
  "signature_name": <string>,

  // Input tensors in row ("instances") or columnar ("inputs") format.
  // A request can have either of them, but not both.
  "instances": <value>|<(nested)list>|<list-of-objects>
  "inputs": <value>|<(nested)list>|<object>
}
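For example, a minimal Predict request body for a hypothetical model that takes four numeric features per example would be:

{
  "signature_name": "serving_default",
  "instances": [[1.0, 2.0, 5.0, 3.0]]
}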

gRPC API:

To use the gRPC API, we install a package called tensorflow-serving-api using pip. More details about the gRPC API endpoint are provided in the code below.
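The package can be installed with:

pip install tensorflow-serving-api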

Implementation:

Code: 




# General import
!pip install -Uq grpcio==1.26.0
import numpy as np
import matplotlib.pyplot as plt
import os
import subprocess
import requests
import json
 
# TensorFlow Imports
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.models import Sequential, save_model
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10
 
class_names = ["airplane", "automobile", "bird", "cat", "deer", "dog",
               "frog", "horse", "ship", "truck"]

# load and preprocess the CIFAR-10 dataset
def load_and_preprocess():
  (x_train, y_train), (x_test, y_test) = cifar10.load_data()
  y_train  = to_categorical(y_train)
  y_test  = to_categorical(y_test)
  x_train = x_train.astype('float32')
  x_test = x_test.astype('float32')
  x_train = x_train/255
  x_test = x_test/255
  return (x_train, y_train), (x_test,y_test)
 
# define model architecture
def get_model():
    model  = Sequential([
      Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
      Conv2D(32, (3, 3), activation='relu', padding='same'),
      MaxPooling2D((2, 2)),
      Dropout(0.2),
      Conv2D(64, (3, 3), activation='relu', padding='same'),
      Conv2D(64, (3, 3), activation='relu', padding='same'),
      MaxPooling2D((2, 2)),
      Dropout(0.2),
      Flatten(),
      Dense(64, activation='relu'),
      Dense(10, activation='softmax')
    ])
 
    model.compile(
      optimizer=SGD(learning_rate=0.01, momentum=0.1),
      loss='categorical_crossentropy',
      metrics=['accuracy']
    )
    model.summary()
    return model
# train the model
(x_train, y_train), (x_test, y_test) = load_and_preprocess()
model = get_model()
model.fit(
    x_train,
    y_train,
    epochs=100,
    validation_data=(x_test, y_test),
)

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_4 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 32, 32, 32)        9248      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 16, 16, 64)        18496     
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 16, 16, 64)        36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 8, 8, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 4096)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 64)                262208    
_________________________________________________________________
dense_3 (Dense)              (None, 10)                650       
=================================================================
Total params: 328,426
Trainable params: 328,426
Non-trainable params: 0
_________________________________________________________________
Epoch 1/100
1563/1563 [==============================] - 7s 5ms/step - loss: 2.0344 - accuracy: 0.2537 - val_loss: 1.7737 - val_accuracy: 0.3691
Epoch 2/100
1563/1563 [==============================] - 7s 4ms/step - loss: 1.6704 - accuracy: 0.4036 - val_loss: 1.5645 - val_accuracy: 0.4289
Epoch 3/100
1563/1563 [==============================] - 7s 4ms/step - loss: 1.4688 - accuracy: 0.4723 - val_loss: 1.3854 - val_accuracy: 0.4999
Epoch 4/100
1563/1563 [==============================] - 7s 4ms/step - loss: 1.3209 - accuracy: 0.5288 - val_loss: 1.2357 - val_accuracy: 0.5540
Epoch 5/100
1563/1563 [==============================] - 7s 4ms/step - loss: 1.2046 - accuracy: 0.5699 - val_loss: 1.1413 - val_accuracy: 0.5935
Epoch 6/100
1563/1563 [==============================] - 7s 4ms/step - loss: 1.1088 - accuracy: 0.6082 - val_loss: 1.2331 - val_accuracy: 0.5572
Epoch 7/100
1563/1563 [==============================] - 7s 4ms/step - loss: 1.0248 - accuracy: 0.6373 - val_loss: 1.0139 - val_accuracy: 0.6389
Epoch 8/100
1563/1563 [==============================] - 7s 4ms/step - loss: 0.9613 - accuracy: 0.6605 - val_loss: 0.9723 - val_accuracy: 0.6577
.
.
.
.
.

Epoch 90/100
1563/1563 [==============================] - 7s 4ms/step - loss: 0.0775 - accuracy: 0.9734 - val_loss: 1.3356 - val_accuracy: 0.7473
Epoch 91/100
1563/1563 [==============================] - 7s 4ms/step - loss: 0.0739 - accuracy: 0.9740 - val_loss: 1.2990 - val_accuracy: 0.7681
Epoch 92/100
1563/1563 [==============================] - 7s 4ms/step - loss: 0.0743 - accuracy: 0.9739 - val_loss: 1.2629 - val_accuracy: 0.7655
Epoch 93/100
1563/1563 [==============================] - 7s 4ms/step - loss: 0.0740 - accuracy: 0.9743 - val_loss: 1.3276 - val_accuracy: 0.7635
Epoch 94/100
1563/1563 [==============================] - 7s 4ms/step - loss: 0.0724 - accuracy: 0.9746 - val_loss: 1.3179 - val_accuracy: 0.7656
Epoch 95/100
1563/1563 [==============================] - 7s 4ms/step - loss: 0.0737 - accuracy: 0.9740 - val_loss: 1.3039 - val_accuracy: 0.7677
Epoch 96/100
1563/1563 [==============================] - 7s 4ms/step - loss: 0.0736 - accuracy: 0.9734 - val_loss: 1.3243 - val_accuracy: 0.7653
Epoch 97/100
1563/1563 [==============================] - 7s 4ms/step - loss: 0.0704 - accuracy: 0.9756 - val_loss: 1.3264 - val_accuracy: 0.7660
Epoch 98/100
1563/1563 [==============================] - 7s 4ms/step - loss: 0.0693 - accuracy: 0.9757 - val_loss: 1.3284 - val_accuracy: 0.7658
Epoch 99/100
1563/1563 [==============================] - 7s 4ms/step - loss: 0.0668 - accuracy: 0.9764 - val_loss: 1.3649 - val_accuracy: 0.7636
Epoch 100/100
1563/1563 [==============================] - 7s 5ms/step - loss: 0.0710 - accuracy: 0.9749 - val_loss: 1.3206 - val_accuracy: 0.7682
<tensorflow.python.keras.callbacks.History at 0x7f36a042e7f0>

Code: 




import tempfile
 
MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
print('export_path = {}\n'.format(export_path))
 
save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True
)
 
print('\nSaved model:')
!ls -l {export_path}
 
# The command displays the input and output layers with their
# signatures and data types; these details are required when
# we make a gRPC API call.
!saved_model_cli show --dir {export_path} --all
 
# Create a compressed archive from the SavedModel directory.
!tar -cz -f model.tar.gz --owner=0 --group=0 -C /tmp/1/ .
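If saved_model_cli is unavailable, the exported signature names can also be checked from Python; a minimal sketch using the export_path defined above:

import tensorflow as tf

loaded = tf.saved_model.load(export_path)
# list the serving signatures exported with the model
print(list(loaded.signatures.keys()))  # typically ['serving_default']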

Code: 




# Install TensorFlow Serving using apt [For Debian]
# Add the TensorFlow Serving distribution URI as a package source
!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | \
    tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
!apt update
# Install TensorFlow Model Server
!apt-get install tensorflow-model-server
 
# Run TensorFlow Serving in the background.
# Note: the %%bash cell magic below must be the first line of its own notebook cell.
 
os.environ["MODEL_DIR"] = MODEL_DIR
 
%%bash --bg
nohup tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=cifar_10 \
  --model_base_path="${MODEL_DIR}" >server.log 2>&1

Starting job # 0 in a separate thread.
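Once the server is up, the standard model-status endpoint can be used to verify that the model loaded correctly:

!curl http://localhost:8501/v1/models/cifar_10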

Code: 




# Pull the TensorFlow Serving image from Docker Hub
!docker pull tensorflow/serving
# Serve our model (assumes the SavedModel was copied to ./cifar_10/1/)
!docker run -p 8500:8500 -p 8501:8501 \
  --mount type=bind,source=`pwd`/cifar_10/,target=/models/cifar_10 \
  -e MODEL_NAME=cifar_10 -t tensorflow/serving
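Note that TensorFlow Serving expects the model base path to contain one numbered subdirectory per version. Under the assumptions above, the host directory bound into the container would follow the standard SavedModel layout:

cifar_10/
└── 1/
    ├── saved_model.pb
    ├── assets/
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index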

Code: 




import json
import requests
import numpy as np
from PIL import Image
 
 
def get_rest_url(model_name, host='127.0.0.1',
                 port='8501', task='predict', version=None):
    """Build the REST API URL from the host, port, task
    ('predict' or 'classify') and optional model version."""
    # The resulting URL looks like http://127.0.0.1:8501/v1/models/cifar_10:predict
    url = "http://{host}:{port}/v1/models/{model_name}".format(
        host=host, port=port, model_name=model_name)
    if version:
        url += '/versions/{version}'.format(version=version)
    url += ':{task}'.format(task=task)
    return url
 
 
def get_model_prediction(model_input, model_name='cifar_10',
                         signature_name='serving_default'):
    """Send a Predict request to the REST endpoint and
    return the predictions from the response."""
    url = get_rest_url(model_name)
    # the image is expected to be 32x32 RGB, matching the CIFAR-10 input
    image = Image.open(model_input)
    # convert image to array
    im = np.asarray(image)
    # add the batch (4th) dimension
    im = np.expand_dims(im, axis=0)
    im = im / 255
    print("Image shape: ", im.shape)
    data = json.dumps({"signature_name": signature_name,
                       "instances": im.tolist()})
    headers = {"content-type": "application/json"}
    # Send the POST request and read the response
    rv = requests.post(url, data=data, headers=headers)
    return rv.json()['predictions']
 
if __name__ == '__main__':
    class_names = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]
    print("\nGenerate REST url ...")
    url = get_rest_url(model_name='cifar_10')
    print(url)

    while True:
        print("\nEnter the image path [:q for Quit]")
        path = input()
        if path == ':q':
            break
        model_prediction = get_model_prediction(path)
        print("The model predicted ...")
        print(class_names[np.argmax(model_prediction)])


Code: 




import grpc
import tensorflow as tf
from PIL import Image
import numpy as np
# prediction service modules from the TF Serving API
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import get_model_metadata_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
 
 
def get_stub(host='127.0.0.1', port='8500'):
    channel = grpc.insecure_channel('{}:{}'.format(host, port))
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    return stub
 
 
def get_model_prediction(model_input, stub, model_name='cifar_10',
                         signature_name='serving_default'):
    """input  => (image path, stub, model name, signature name)
       output => the scores from the model's final layer"""

    image = Image.open(model_input)
    im = np.asarray(image, dtype=np.float64)
    im = im / 255
    im = np.expand_dims(im, axis=0)

    print("Image shape: ", im.shape)
    # For the Predict task we build a PredictRequest from predict_pb2
    request = predict_pb2.PredictRequest()
    request.model_spec.name = model_name
    request.model_spec.signature_name = signature_name
    # pass the image to the input layer (here named 'conv2d_4_input')
    request.inputs['conv2d_4_input'].CopyFrom(
        tf.make_tensor_proto(im, dtype=tf.float32))

    response = stub.Predict.future(request, 5.0)  # 5-second timeout
    # read the scores of the final layer (dense_3)
    return response.result().outputs["dense_3"].float_val
 
 
 
def get_model_version(model_name, stub):
    request = get_model_metadata_pb2.GetModelMetadataRequest()
    request.model_spec.name = model_name
    request.metadata_field.append("signature_def")
    response = stub.GetModelMetadata(request, 10)
    # the signature of the loaded model is available in
    # response.metadata['signature_def']
    return response.model_spec.version.value
 
if __name__ == '__main__':
    class_names = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]
    print("\nCreate RPC connection ...")
    stub = get_stub()
    while True:
        print("\nEnter the image path [:q for Quit]")
        path = input()
        if path == ':q':
            break
        model_prediction = get_model_prediction(path, stub)
        print("Prediction from Model ...")
        print(class_names[np.argmax(model_prediction)])

