Deploying Trained Models to Production with TensorFlow Serving
Once you’ve trained a TensorFlow model and it’s ready to be deployed, you’d probably like to move it to a production environment. Luckily, TensorFlow provides a way to do this with minimal effort. In this article, we’ll use a pre-trained model, save it, and serve it using TensorFlow Serving. Let’s get moving!
TensorFlow Serving is a system built with the sole purpose of bringing machine learning models to production. TensorFlow’s ModelServer provides support for RESTful APIs. However, we’ll need to install it before we can use it. First, let’s add it as a package source.
echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
Installing TensorFlow ModelServer can now be done by updating the system and using apt-get to install it.

$ sudo apt-get update
$ sudo apt-get install tensorflow-model-server
Developing the Model
Next, let’s use a pre-trained model to create the model we’d like to serve. In this case, we’ll use a version of VGG16 with weights pre-trained on ImageNet. To make it work, we have to get a couple of imports out of the way:

- image for working with image files
- preprocess_input for pre-processing image inputs
- decode_predictions for showing us the probability and class names

from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np
Next, we define the model with the ImageNet weights.
model = VGG16(weights='imagenet')
With the model in place, we can try out a sample prediction. We start by defining the path to an image file (a lion) and using image to load it.

img_path = 'lion.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
After pre-processing the image, we can run a prediction on it. The model identifies the image as a lion with near-certain confidence.
preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3))
# Predicted: [('n02129165', 'lion', 0.9999999), ('n02130308',
# 'cheetah', 7.703386e-08), ('n02128385', 'leopard', 6.330456e-09)]
Now that we have our model, we can save it to prepare it for serving with TensorFlow Serving.
Saving the Model
Let’s now save that model. Notice that we’re saving it to a /1 folder to indicate the model version. This is critical, especially when you want to serve new model versions automatically. More on that in a few.
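As a sketch of what that looks like (using a tiny stand-in Keras model so it runs quickly; in our case you’d pass the VGG16 model defined above, and saved_tf_model is just an example directory name):

```python
import tensorflow as tf

# Stand-in for the VGG16 model defined earlier; any Keras model
# is saved the same way.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

# The trailing /1 is the version number. TensorFlow Serving watches the
# base directory and serves the versions it finds there, so exporting
# to a /2 folder later is how a new version gets rolled out.
tf.saved_model.save(model, 'saved_tf_model/1')
```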
Running the Server with TensorFlow ModelServer
Let’s start by defining the configuration we’ll use for serving:

- name is the name of our model; in this case, we’ll call it vgg16.
- base_path is the absolute path to the location of our saved model. Be sure to change this to your own path.
- model_platform is, obviously, tensorflow.
- model_version_policy enables us to specify model versioning information.
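Putting those options together, a models.config file along these lines should work (the base_path below is this article’s example path; point it at your own saved-model directory):

```
model_config_list {
  config {
    name: 'vgg16'
    base_path: '/media/derrick/5EAD61BA2C09C31B/Notebooks/Python/serving/saved_tf_model/'
    model_platform: 'tensorflow'
    model_version_policy {
      specific {
        versions: 1
      }
    }
  }
}
```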
Now we can run the command that will serve the model from the command line:

- rest_api_port=8000 means that our REST API will be served at port 8000.
- model_config_file defines the config file that we defined above.
- model_config_file_poll_wait_seconds indicates how long to wait before checking for changes in the config file. For example, changing the version to 2 in the config file would lead to version 2 of the model being served automatically, because changes to the config file are checked every 300 seconds in this case.

tensorflow_model_server --rest_api_port=8000 --model_config_file=models.config --model_config_file_poll_wait_seconds=300
Making Predictions using the REST API
At this point, the REST API for our model can be found here: http://localhost:8000/v1/models/vgg16/versions/1:predict.
We can use this endpoint to make predictions. To do that, we’ll need to pass JSON-formatted data to the endpoint, for which we’ll use Python’s json module, and make requests to it with the requests package. Let’s start by importing those two.

import json
import requests
Remember that the x variable contains the pre-processed image. We’ll create JSON data containing it. Like any other RESTful request, we set the content type to application/json. Afterward, we make a request to our endpoint, passing in the headers and the data. After getting the predictions, we decode them just like we did at the beginning of this article.
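Sketching that out (with a zero-filled stand-in for the pre-processed batch x, so it runs without the image file; the request itself is commented out since it needs the server from the previous section to be up):

```python
import json
import numpy as np

# Stand-in for the pre-processed image batch `x` from earlier;
# a real run would reuse that (1, 224, 224, 3) array.
x = np.zeros((1, 224, 224, 3), dtype=np.float32)

# TensorFlow Serving's REST API expects the batch under an "instances" key.
payload = json.dumps({"instances": x.tolist()})
headers = {"content-type": "application/json"}

# With the server running:
# import requests
# response = requests.post(
#     'http://localhost:8000/v1/models/vgg16/versions/1:predict',
#     data=payload, headers=headers)
# preds = np.array(response.json()['predictions'])
# print('Predicted:', decode_predictions(preds, top=3))
```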
Serving with Docker
There is an even quicker and shorter way for you to serve TensorFlow models—using Docker. This is actually the recommended way, but knowing the previous method is important, just in case you need it for a specific use case. Serving your model with Docker is as easy as pulling the TensorFlow Serving image and mounting your model.
With Docker installed, run this code to pull the TensorFlow Serving image.
docker pull tensorflow/serving
Let’s now use that image to serve the model. This is done using docker run and passing a couple of arguments:
- -p 8501:8501 means that the container’s port 8501 will be accessible on our localhost at port 8501.
- --name names our container; choose any name you prefer. I’ve chosen tf_vgg_server in this case.
- --mount type=bind,source=/media/derrick/5EAD61BA2C09C31B/Notebooks/Python/serving/saved_tf_model,target=/models/vgg16 means that the model will be mounted to /models/vgg16 on the Docker container.
- -e MODEL_NAME=vgg16 indicates that TensorFlow Serving should load the model called vgg16.
- -t tensorflow/serving indicates that we’re using the tensorflow/serving image that we pulled earlier.
- & runs the command in the background.
Run the code below on your terminal.
docker run -p 8501:8501 --name tf_vgg_server --mount type=bind,source=/media/derrick/5EAD61BA2C09C31B/Notebooks/Python/serving/saved_tf_model,target=/models/vgg16 -e MODEL_NAME=vgg16 -t tensorflow/serving &
Now we can use the REST API endpoint to make predictions, just like we did previously; this time it’s served at http://localhost:8501/v1/models/vgg16:predict.
Clearly, we obtained the same results. With that, we’ve seen how we can serve a TensorFlow model with and without Docker.
The mwitiderrick/TensorFlow-Serving repo on GitHub contains links to more tutorials on TensorFlow Serving. Hopefully, this piece was of service to you!
Bio: Derrick Mwiti is a data scientist with a great passion for sharing knowledge. He is an avid contributor to the data science community via blogs such as Heartbeat, Towards Data Science, Datacamp, Neptune AI, and KDnuggets, just to mention a few. His content has been viewed over a million times on the internet. Derrick is also an author and online instructor, and he trains and works with various institutions to implement data science solutions and to upskill their staff. Derrick studied Mathematics and Computer Science at Multimedia University and is an alumnus of the Meltwater Entrepreneurial School of Technology. If the world of Data Science, Machine Learning, and Deep Learning interests you, you might want to check out his Complete Data Science & Machine Learning Bootcamp in Python course.
Original. Reposted with permission.
- Dealing with Imbalanced Data in Machine Learning
- How to deploy PyTorch Lightning models to production
- AI Is More Than a Model: Four Steps to Complete Workflow Success