Open In App

Serving a TensorFlow Model

Last Updated : 25 Sep, 2023
Improve
Improve
Like Article
Like
Save
Share
Report

TensorFlow Serving stands as a versatile and high-performance system tailored for serving machine learning models in production settings. Its primary objective is to simplify the deployment of novel algorithms and experiments while maintaining consistent server architecture and APIs. While it seamlessly integrates with TensorFlow models, TensorFlow Serving’s adaptability also enables the service to be expanded for serving diverse model types and data beyond TensorFlow.

TensorFlow Serving

Tensorflow servable architecture consists of four components: servables, loaders, managers, and core. The explanation for the same are:

1. Servables

Servables can be considered as the core abstraction in TensorFlow Serving. Servables are the objects that are used by clients to perform the computations like inference or lookup. Servables are available in flexible size and granularity, and a single Servable may be anything from a single model to a tuple of inference models. Servables allow flexibility and future improvements such as streaming results, experimental APIs, and asynchronous modes of operation. They do not manage their own life cycles. A Servable may include:

  • TensorFlow SavedModelBundle (tensorflow::Session)
  • Lookup table for embedding or vocabulary lookups

Servable Versions: A TensorFlow Serving can handle one or more versions of servable over the lifetime of a single instance of the server. Versions allows multiple versions of the serverable to be loaded concurrently. They help in experimentation and gradual roll-outs.

Models: TF Serving represents the model as one or more servables and may contain one or more algorithms, lookups or embeddings. A composite model may be represented as multiple independent servables or single composite servables.

Servable Streams: These are the sequence of versions of servables sorted in ascending order.

2. Loaders

Loaders play a pivotal role in overseeing the lifecycle of a servable. They leverage the Loader API to establish a uniform infrastructure that operates independently of the underlying learning algorithms, data, or product use cases. In essence, Loaders provide a standardized set of interfaces for the loading and unloading of servables, ensuring consistent management practices.

3. Sources

Sources represent plugin modules responsible for locating and furnishing servables. A Source is often connected with zero or more SourceAdapters, and the final component in this chain is responsible for emitting the Loaders. TF Serving’s interface for Sources is designed to discover servables from diverse storage systems. Additionally, TensorFlow Serving incorporates standard reference implementations for Sources. These Sources can leverage various mechanisms such as remote procedure calls (RPC) or file system polling to access servables. One notable feature of Sources is their ability to maintain shared state across multiple servables or versions.

Aspired Versions: Aspired versions denote the collection of serviceable versions that need to be loaded and prepared. Sources convey this set of serviceable versions for a single serviceable stream at any given moment. When a Source provides a new roster of aspired versions to the Manager, it supersedes the previous roster for that particular serviceable stream. The Manager proceeds to unload any previously loaded versions that are no longer included in the new list.

4. Managers

Managers handle full life cycle of Servables including: loading, serving and unloading the Servables. Managers monitor Sources and maintain a record of all available versions. While Managers make an effort to satisfy the requests from Sources, there may be situations where they decline to load a desired version due to, for instance, insufficient resources. Additionally, Managers have the flexibility to delay the unloading of a version. For instance, they might opt to postpone unloading until a newer version has completed loading, adhering to a policy that ensures there is always at least one loaded version at any given time. TF Serving Managers provide a simple, narrow interface — GetServableHandle() — for clients to access loaded servable instances.

5. Core

TF core manages services like lifecycle and metrics using the TensorFlow Serving. It treats servables and loaders as opaque objects.

Lifecycle of a Servable

Lifecycle of a Servable is defined as:

1. Loader Creation: Initially, a Source plugin is responsible for crafting a Loader tailored to a specific version of a Servable. This Loader is equipped with all the essential metadata necessary for the subsequent loading of the Servable.

2. Aspired Version Notification: The Source employs a callback mechanism to relay information about the Aspired Version to the Manager. This signifies the version that the system should ideally have loaded and ready for use.

3. Version Policy Application: Upon receiving the Aspired Version information, the Manager enters the decision-making phase. It employs the Version Policy that has been configured to determine the most appropriate action to take next. This action can vary and might involve unloading a previously loaded version to make way for the Aspired Version or initiating the loading process for the new version.

4. Resource Allocation and Loading: If the Manager, following the guidelines of the Version Policy, decides that it is safe and feasible to proceed, it proceeds to allocate the necessary resources to the Loader. Once the required resources are allocated, the Manager instructs the Loader to commence the loading procedure for the new version.

5. Servable Access for Clients: Clients who require access to the Servable interact with the Manager. Clients have the option to either explicitly specify the desired version they wish to use or simply request the latest available version. In response, the Manager furnishes the clients with a handle or reference that enables them to access and utilize the Servable as needed.

TensorFlow Serving -Geeksforgeeks

TensorFlow Serving

Requirements

To run tensorflow servings, you need to have Ubuntu or docker. DO NOT TRY TO INSTALL APT-GET AS IT WILL NOT WORK, TRY DOCKER INSTEAD FOR YOUR OS.

On Ubuntu

On your machine, run the following command to add packages to apt repository:

!echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -

Install TensorFlow Serving

$ wget 'http://storage.googleapis.com/tensorflow-serving-apt/pool/tensorflow-model-server-2.8.0/t/tensorflow-model-server/tensorflow-model-server_2.8.0_all.deb'
$ dpkg -i tensorflow-model-server_2.8.0_all.deb
$ pip3 install tensorflow-serving-api==2.8.0

Build the model

Step 1: Import the necessary libraries

Python3




import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import os
import subprocess


Step 2: Load the dataset

Python3




#Load dataset, dataset is cifar 10. It is open source
(data_train, label_train), (data_test, label_test) = keras.datasets.cifar10.load_data()
 
print('Train data :',data_train.shape,label_train.shape)
print('Test data :',data_test.shape,label_test.shape)


Output:

Train data : (50000, 32, 32, 3) (50000, 1)
Test data : (10000, 32, 32, 3) (10000, 1)

Number of classes

Python3




classes = ['aeroplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
num_classes


Output:

10

Step 3: Build the model

Python3




model = keras.models.Sequential()
model.add(keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(keras.layers.MaxPooling2D((2, 2)))
model.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
model.add(keras.layers.MaxPooling2D((2, 2)))
model.add(keras.layers.Conv2D(128, (3, 3), activation='relu'))
model.add(keras.layers.Conv2D(128, (3, 3), activation='relu'))
model.add(keras.layers.MaxPooling2D((2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(256, activation='relu'))
model.add(keras.layers.Dense(num_classes, activation='softmax'))
# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[keras.metrics.SparseCategoricalAccuracy()])
# Model summary
model.summary()


Output:

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 30, 30, 32) 896

max_pooling2d (MaxPooling2D (None, 15, 15, 32) 0
)

conv2d_1 (Conv2D) (None, 13, 13, 64) 18496

max_pooling2d_1 (MaxPooling (None, 6, 6, 64) 0
2D)

conv2d_2 (Conv2D) (None, 4, 4, 128) 73856

conv2d_3 (Conv2D) (None, 2, 2, 128) 147584

max_pooling2d_2 (MaxPooling (None, 1, 1, 128) 0
2D)

flatten (Flatten) (None, 128) 0

dense (Dense) (None, 256) 33024

dense_1 (Dense) (None, 10) 2570

=================================================================
Total params: 276,426
Trainable params: 276,426
Non-trainable params: 0
_________________________________________________________________

Step 4: Train the model

Python3




epoch = 4
model.fit(data_train, label_train, epochs=epoch)


Output:

Epoch 1/4
1563/1563 [==============================] - 14s 8ms/step - loss: 1.6731 - sparse_categorical_accuracy: 0.4234
Epoch 2/4
1563/1563 [==============================] - 14s 9ms/step - loss: 1.2742 - sparse_categorical_accuracy: 0.5499
Epoch 3/4
1563/1563 [==============================] - 17s 11ms/step - loss: 1.1412 - sparse_categorical_accuracy: 0.5994
Epoch 4/4
1563/1563 [==============================] - 17s 11ms/step - loss: 1.0541 - sparse_categorical_accuracy: 0.6328
<keras.callbacks.History at 0x7f6f5c532920>

Step 5: Evaluate the model

Python3




loss, accuracy_ = model.evaluate(data_test, label_test)
print()
print(f"Test accuracy: {accuracy_*100}")
print(f"Test loss: {loss}")


Output:

313/313 [==============================] - 1s 4ms/step - loss: 1.0704 - sparse_categorical_accuracy: 0.6358

Test accuracy: 63.58000040054321
Test loss: 1.0704132318496704

Step 6: Save the model

Python3




import tempfile
import os
 
# Specify the current version
curr_version = 1
 
# Create a temporary directory
my_dir = tempfile.gettempdir()
 
# Create a directory path for the current version
path = os.path.join(my_dir, str(curr_version))
print(f'Export Path = {path}\n')
 
# Saving the current model
model.save(path, overwrite=True, include_optimizer=True, save_format=None, signatures=None, options=None)
 
print('\nSaved model:')
 
# Listing the contents of the export path
import glob
file_list = glob.glob(f'{path}/*')
for file in file_list:
    print(file)


Output:

Export Path = /tmp/1
INFO:tensorflow:Assets written to: /tmp/1/assets

Saved model:
/tmp/1/fingerprint.pb
/tmp/1/saved_model.pb
/tmp/1/assets
/tmp/1/variables
/tmp/1/keras_metadata.pb

Deployment

Step 1: Check the saved model

Python3




!saved_model_cli show --dir {path} --all


Output:

2023-09-15 14:34:42.403572: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
The given SavedModel SignatureDef contains the following input(s):
The given SavedModel SignatureDef contains the following output(s):
outputs['__saved_model_init_op'] tensor_info:
dtype: DT_INVALID
shape: unknown_rank
name: NoOp
Method name is:

signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['conv2d_input'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 32, 32, 3)
name: serving_default_conv2d_input:0
The given SavedModel SignatureDef contains the following output(s):
outputs['dense_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 10)
name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
The MetaGraph with tag set ['serve'] contains the following ops: {'AssignVariableOp', 'StringJoin', 'BiasAdd', 'StatefulPartitionedCall', 'Pack', 'MaxPool', 'SaveV2', 'VarHandleOp', 'Identity', 'Softmax', 'NoOp', 'StaticRegexFullMatch', 'Relu', 'Const', 'DisableCopyOnRead', 'MatMul', 'MergeV2Checkpoints', 'Reshape', 'Conv2D', 'Select', 'Placeholder', 'RestoreV2', 'ShardedFilename', 'ReadVariableOp'}

Concrete Functions:
Function Name: '__call__'
Option #1
Callable with:
Argument #1
inputs: TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32, name='inputs')
Argument #2
DType: bool
Value: True
Argument #3
DType: NoneType
Value: None
Option #2
Callable with:
Argument #1
conv2d_input: TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32, name='conv2d_input')
Argument #2
DType: bool
Value: True
Argument #3
DType: NoneType
Value: None
Option #3
Callable with:
Argument #1
conv2d_input: TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32, name='conv2d_input')
Argument #2
DType: bool
Value: False
Argument #3
DType: NoneType
Value: None
Option #4
Callable with:
Argument #1
inputs: TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32, name='inputs')
Argument #2
DType: bool
Value: False
Argument #3
DType: NoneType
Value: None

Function Name: '_default_save_signature'
Option #1
Callable with:
Argument #1
conv2d_input: TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32, name='conv2d_input')

Function Name: 'call_and_return_all_conditional_losses'
Option #1
Callable with:
Argument #1
inputs: TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32, name='inputs')
Argument #2
DType: bool
Value: True
Argument #3
DType: NoneType
Value: None
Option #2
Callable with:
Argument #1
inputs: TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32, name='inputs')
Argument #2
DType: bool
Value: False
Argument #3
DType: NoneType
Value: None
Option #3
Callable with:
Argument #1
conv2d_input: TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32, name='conv2d_input')
Argument #2
DType: bool
Value: False
Argument #3
DType: NoneType
Value: None
Option #4
Callable with:
Argument #1
conv2d_input: TensorSpec(shape=(None, 32, 32, 3), dtype=tf.float32, name='conv2d_input')
Argument #2
DType: bool
Value: True
Argument #3
DType: NoneType
Value: None

Step 2: Define the model directory

Python3




import os
# Define the model directory
os.environ["MODEL_DIR"] = my_dir


Step 3: Start TensorFlow Model Server

Python3




import subprocess
 
# Command to start TensorFlow Model Server
command = f"nohup tensorflow_model_server --rest_api_port=8501 --model_name=CIFARModel --model_base_path='{my_dir}' > server.log 2>&1"
 
# Execute the command using subprocess
subprocess.Popen(command, shell=True)


Output:

<Popen: returncode: None args: "nohup tensorflow_model_server --rest_api_por...>

Check the server log

Python3




!tail server.log


Output:

2023-09-15 14:34:44.080115: E external/org_tensorflow/tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] 
tfg_optimizer{} failed: NOT_FOUND: Op type not registered 'DisableCopyOnRead' in binary running on GFG19509-LAPTOP.
Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph
which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph,
as contrib ops are lazily registered when the module is first accessed.
when importing GraphDef to MLIR module in GrapplerHook

Step 4: Request to the model in TensorFlow Serving to Predict the label of test images

Plot the first 10 test images with their lables

Python3




import matplotlib.pyplot as plt
 
def plot_images(images, titles, rows=2, cols=5):
    fig, axes = plt.subplots(rows, cols, figsize=(13, 5))
    for i, ax in enumerate(axes.ravel()):
        ax.imshow(images[i].reshape(32, 32, 3))
        ax.axis('off')
        ax.set_title(titles[i])
 
# Select the first 10 images from data_test and their corresponding labels
sample_indices = np.linspace(0,9,10,dtype=int)
sample_images = [data_test[i] for i in sample_indices]
sample_labels = [classes[label_test[i].item()] for i in sample_indices]
 
# Plot the selected images
plot_images(sample_images, sample_labels)
plt.show()


Output:

Test images with Actual label -Geeksforgeeks

Test images with Actual label

Create JSON Object

Create JSON Object with JSON library as shown the code below. I have used official code, you may modify code for your needs accordingly.

Python3




import json
 
# Define the data
signature_name = "serving_default"
# Consider the first 10 data_test images
instances = data_test[0:10].tolist()
 
# Create a dictionary
data_dict = {
    "signature_name": signature_name,
    "instances": instances
}
 
# Convert the dictionary to a JSON string
data = json.dumps(data_dict)
 
# Print the JSON data
print(f'Data: {data[:50]} ... {data[-52:]}')


Output:

Data: {"signature_name": "serving_default", "instances": ... 164, 163, 204], [182, 182, 225], [186, 185, 223]]]]}

RUN Experiments

To run experiments, we need to define JSON Objects, install requests, and do some visualisation.

Install Requests

We will install the requests library. Run the following bash command:

!pip install -q requests

Run the following Python 3 code:

Python3




import requests
import json
 
# Define the API endpoint
 
# Set the request headers
headers = {"content-type": "application/json"}
 
# Send a POST request to the API with the JSON data
response = requests.post(api_url, data=data, headers=headers)
 
# Check if the request was successful
if response.status_code == 200:
    # Parse the JSON response and extract predictions
    response_data = json.loads(response.text)
    predictions = response_data['predictions']
else:
    print(f"Failed to make a request. Status code: {response.status_code}")


Now run the following codes to check the classes of both predicted and actual label:

Python3




for prediction in predictions:
    target = max(prediction)
    object_ = prediction.index(target)
    print(classes[object_])


Output:

cat
ship
ship
aeroplane
frog
frog
automobile
frog
cat
truck

You can see the output taken from the model which is around 70% accurate and matches the model accuracy.



Like Article
Suggest improvement
Share your thoughts in the comments

Similar Reads