Deploying Trained Models to Production with TensorFlow Serving
TensorFlow provides a way to move a trained model to a production environment for deployment with minimal effort. In this article, we’ll use a pre-trained model, save it, and serve it using TensorFlow Serving.
Once you’ve trained a TensorFlow model and it’s ready to be deployed, you’d probably like to move it to a production environment. Luckily, TensorFlow provides a way to do this with minimal effort. In this article, we’ll use a pre-trained model, save it, and serve it using TensorFlow Serving. Let’s get moving!
TensorFlow ModelServer
TensorFlow Serving is a system built with the sole purpose of bringing machine learning models to production. TensorFlow’s ModelServer provides support for RESTful APIs. However, we’ll need to install it before we can use it. First, let’s add it as a package source.
echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
Installing TensorFlow ModelServer can now be done by updating the system and using apt-get to install it.
$ sudo apt-get update
$ sudo apt-get install tensorflow-model-server
Developing the Model
Next, let’s use a pre-trained model to create the model we’d like to serve. In this case, we’ll use a version of VGG16 with weights pre-trained on ImageNet. To make it work, we have to get a couple of imports out of the way:
- VGG16: the architecture
- image: for working with image files
- preprocess_input: for pre-processing image inputs
- decode_predictions: for showing us the probability and class names
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np
Next, we define the model with the ImageNet weights.
model = VGG16(weights='imagenet')
With the model in place, we can try out a sample prediction. We start by defining the path to an image file (a lion) and using image to load it.
img_path = 'lion.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
After pre-processing the image, we can run a prediction on it. We can see that the model predicts the image is a lion with over 99% confidence.
preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Predicted: [('n02129165', 'lion', 0.9999999), ('n02130308',
# 'cheetah', 7.703386e-08), ('n02128385', 'leopard', 6.330456e-09)]
Now that we have our model, we can save it to prepare it for serving with TensorFlow.
Saving the Model
Let’s now save that model. Notice that we’re saving it to a /1 folder to indicate the model version. This is critical, especially when you want to serve new model versions automatically. More on that in a few.
model.save('vgg16/1')
Running the Server with TensorFlow ModelServer
Let’s start by defining the configuration we’ll use for serving:
- name is the name of our model; in this case, we’ll call it vgg16.
- base_path is the absolute path to the location of our saved model. Be sure to change this to your own path.
- model_platform is, obviously, TensorFlow.
- model_version_policy enables us to specify model versioning information.
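Putting those fields together, a minimal models.config could look like the sketch below. The base_path shown is a placeholder; replace it with the absolute path to the vgg16 directory you saved above.

```protobuf
model_config_list {
  config {
    name: 'vgg16'
    base_path: '/absolute/path/to/vgg16'
    model_platform: 'tensorflow'
    model_version_policy {
      # serve version 1; changing this to 2 would switch versions
      specific {
        versions: 1
      }
    }
  }
}
```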
Now we can run the command that will serve the model from the command line:
- rest_api_port=8000 means that our REST API will be served at port 8000.
- model_config_file defines the config file that we defined above.
- model_config_file_poll_wait_seconds indicates how long to wait before checking for changes in the config file. For example, changing the version to 2 in the config file would lead to version 2 of the model being served automatically, because in this case changes in the config file are checked every 300 seconds.
tensorflow_model_server --rest_api_port=8000 --model_config_file=models.config --model_config_file_poll_wait_seconds=300
Making Predictions using the REST API
At this point, the REST API for our model can be found here: http://localhost:8000/v1/models/vgg16/versions/1:predict.
We can use this endpoint to make predictions. To do that, we’ll need to pass JSON-formatted data to the endpoint. To that end (no pun intended), we’ll use the json module in Python. To make requests to the endpoint, we’ll use the requests Python package.
Let’s start by importing those two.
import json
import requests
Remember that the x variable contains the pre-processed image. We’ll create JSON data containing it. Like any other RESTful request, we set the content type to application/json. Afterward, we make a request to our endpoint, passing in the headers and the data. After getting the predictions, we decode them just like we did at the beginning of this article.
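The steps above can be sketched as follows. This is a minimal example: predict_rest and make_payload are hypothetical helper names, and the dummy x below stands in for the pre-processed image batch from earlier (the URL matches the server we started above).

```python
import json

import numpy as np
import requests


def make_payload(x):
    # TensorFlow Serving's REST API expects a JSON body of the
    # form {"instances": [...]}; tolist() converts the NumPy batch
    return json.dumps({"instances": x.tolist()})


def predict_rest(x, url="http://localhost:8000/v1/models/vgg16/versions/1:predict"):
    headers = {"content-type": "application/json"}
    response = requests.post(url, data=make_payload(x), headers=headers)
    # the server responds with {"predictions": [...]}
    return np.array(json.loads(response.text)["predictions"])


# stand-in for the pre-processed image; in the article this is
# the output of preprocess_input on the lion image
x = np.zeros((1, 224, 224, 3), dtype=np.float32)
```

With the server from the previous section running, preds = predict_rest(x) returns an array you can pass straight to decode_predictions, just as before.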
Serving with Docker
There is an even quicker and shorter way for you to serve TensorFlow models—using Docker. This is actually the recommended way, but knowing the previous method is important, just in case you need it for a specific use case. Serving your model with Docker is as easy as pulling the TensorFlow Serving image and mounting your model.
With Docker installed, run this code to pull the TensorFlow Serving image.
docker pull tensorflow/serving
Let’s now use that image to serve the model. This is done using docker run and passing a couple of arguments:
- -p 8501:8501 means that the container’s port 8501 will be accessible on our localhost at port 8501.
- --name is for naming our container; choose the name you prefer. I’ve chosen tf_vgg_server in this case.
- --mount type=bind,source=/media/derrick/5EAD61BA2C09C31B/Notebooks/Python/serving/saved_tf_model,target=/models/vgg16 means that the model will be mounted to /models/vgg16 on the Docker container.
- -e MODEL_NAME=vgg16 indicates that TensorFlow Serving should load the model called vgg16.
- -t tensorflow/serving indicates that we’re using the tensorflow/serving image that we pulled earlier.
- & runs the command in the background.
Run the code below on your terminal.
docker run -p 8501:8501 --name tf_vgg_server --mount type=bind,source=/media/derrick/5EAD61BA2C09C31B/Notebooks/Python/serving/saved_tf_model,target=/models/vgg16 -e MODEL_NAME=vgg16 -t tensorflow/serving &
Now we can use the REST API endpoint to make predictions, just like we did previously.
Clearly, we obtained the same results. With that, we’ve seen how we can serve a TensorFlow model with and without Docker.
Final Thoughts
This article from TensorFlow will give you more information on the TensorFlow Serving architecture. If you’d like to dive deeper into that, this resource will get you there.
You can also explore alternative ways of building and using the standard TensorFlow ModelServer. In this article, we focused on serving using a CPU, but you can explore how to serve on GPUs as well.
This repo contains links to more tutorials on TensorFlow Serving. Hopefully, this piece was of service to you!
mwitiderrick/TensorFlow-Serving
Bio: Derrick Mwiti is a data scientist who has a great passion for sharing knowledge. He is an avid contributor to the data science community via blogs such as Heartbeat, Towards Data Science, Datacamp, Neptune AI, and KDnuggets, just to mention a few. His content has been viewed over a million times on the internet. Derrick is also an author and online instructor. He also trains and works with various institutions to implement data science solutions, as well as to upskill their staff. Derrick studied Mathematics and Computer Science at Multimedia University; he is also an alumnus of the Meltwater Entrepreneurial School of Technology. If the world of Data Science, Machine Learning, and Deep Learning interests you, you might want to check out his Complete Data Science & Machine Learning Bootcamp in Python course.
Original. Reposted with permission.
Related:
- Dealing with Imbalanced Data in Machine Learning
- How to deploy PyTorch Lightning models to production
- AI Is More Than a Model: Four Steps to Complete Workflow Success