Deploy your PyTorch model to Production
This tutorial aims to teach you how to deploy your recently trained model in PyTorch as an API using Python.
By Nicolás Metallo, Audatex
Following the last article about Training a Choripan Classifier with PyTorch and Google Colab, we will now talk about what are some steps that you can do if you want to deploy your recently trained model as an API. The discussion on how to do this with Fast.ai is currently ongoing (more) and will most likely continue until PyTorch releases their official 1.0 version. You can find more information in the Fast.ai Forums, PyTorch Documentation/Forums, and their respective GitHub repositories.
Saving and Loading Models
It’s recommended that you take a look at the PyTorch Documentation as it’s a great place to start, but in short, there are two ways to serialize and restore a model. One is loading only the weights and the other loading the entire model (and weights). You will need to first create a model to define its architecture otherwise you will end up with an
OrderedDict with just the weight values. Both options would work for inference and/or for resuming a model's training from a previous checkpoint.
This save/load process uses the most intuitive syntax and involves the least amount of code. Saving a model in this way will save the entire module using Python’s pickle module. The disadvantage of this approach is that the serialized data is bound to the specific classes and the exact directory structure used when the model is saved. The reason for this is because pickle does not save the model class itself. Rather, it saves a path to the file containing the class, which is used during load time. Because of this, your code can break in various ways when used in other projects or after refactors.
pickle is not able to serialize some model creations functions (e.g.
resnext_50_32x4d which is found in previous versions of Fastai) so you need to use
dill instead. Here's the fix.
You can read more about the limitations of
pickle in this article. A common PyTorch convention is to save models using either a
.pth file extension.
In PyTorch, the learnable parameters (e.g. weights and biases) of an
torch.nn.Module model are contained in the model’s parameters (accessed with
model.parameters()). A state_dict is simply a Python dictionary object that maps each layer to its parameter tensor. Note that only layers with learnable parameters (convolutional layers, linear layers, etc.) have entries in the model’s state_dict.
We will need to re-initialize our model in the same way it was originally defined and created when we saved the weights, making sure the variables, classes, functions that go into creating the model are available, whether through module imports or directly within the same script/file. One potential advantage of using this method is that you can use updated scripts to load the old model if the parameters are the same, and it’s also the recommended approach by the official documentation. One thing you should also remember is that
state_dict takes a dictionary object, not a path to a saved object, so you can't load using
model.eval() after loading because you usually have
Dropout layers that by default are in train mode on construction. You don't need to call
model.eval() if you want to resume your model training.
As we have done our training with Fastai, we can call
Learner.load to save and load models (more info in the documentation) . This is running
state_dict() in the back so only the model parameters will be saved and not the architecture. This means that you will need to run the create_cnn method to get a pre-trained model from a given architecture (the same that you used before to train your model, e.g. models.resnet34) with a custom head that is suitable for your data. Models are saved and loaded from the
model_dir directory, and the
.pth extension is automatically added for both operations.
Simple Deployment With Flask
After we have trained our classifier using the free GPU available in Google Colab, we are ready to do inference on our end. We can do it locally or in the cloud and there are many different options (AWS, Paperspace, Google Cloud, etc) that we can choose from. As I have some free Amazon AWS credits remaining I’ll be using an Amazon AMI that comes with several ML libraries already installed hosted on a t2.medium instance. Here are some simple instructions to run a Docker image on your end (should be around the same when we are not doing GPU training).
Ready-to-run Docker images
Jupyter Docker Stacks are a great way to get a notebook up and going in no time with the latest libraries. These are ready-to-run Docker images that contain Jupyter applications and interactive computing tools. Learn more about them in the Official Documentation. We will use the Jupyter Notebook Data Science Stack from this repository.
Once you have installed Docker, open the terminal,
cd into your working directory, and run
This will create a server that you can log into where you will connect to a Jupyter notebook. You can run some commands directly from there or you can also run bash or any command in a Docker container by getting the
container id, typing
docker ps in the terminal and then running
Install PyTorch & Fastai
Depending on your machine configuration you will want to run inference on either GPU or CPU. In our example, we are going to run everything on the CPU, so you need to run the following to install the latest PyTorch.
Now you can install Fastai with
pip install fastai
Create a Flask Application
Install the Flask library by running
We will create a folder called
flask_app and two new python files
server.py with our code to load the model weights and run the inference server and
settings.py that sets some basic params to give us more flexibility in the future. The following is an example of what
flask_app/settings.py might look like. We will then import this with
from settings import * into
We can now go through the
flask_app/server.py. This first part will import the libraries and settings.
In order to run our single image inference prediction, we first need to create a new model that follows the same folder structure that we used when we trained it. That’s why we are going to create a new empty dir based on the labels that we have set before in
path is defined, we are going to create a new
learn model and download the pre-trained weights for the Choripan Classifier.
There are already many great tutorials online that are amazingly detailed so I won’t explain much of how Flask works. I created a
predict function that takes the URL that you receive as INPUT, runs it through
learn.predict(img) to get the predicted class, and then returns a
Once we have that done, we can head over to the terminal, cd into the
flask_app dir, and run
python server.py. We should see something like this.
That’s it! Now we can run commands like these from the terminal (I’m running an AWS instance). Let’s take this image for example.
And this is how it looks from the server side
Not bad! You now have your very own “Choripan/Not Choripan” API. If you want to move to the next level, please check out this tutorial from the Flask Documentation to deploy to Production and/or this other tutorial if you want to Dockerize your Flask application (you can also use docker-compose).
Other Ways to Deploy to Production
1. Image Classification Example Using Clipper
There’s a great
ipynb that you can follow in the ClipperTutorials GitHub with the basics of how everything works. They provide a Docker image or you can just run their Amazon AMI. Sadly, this is only working with PyTorch 0.4.0 which makes it a real pain to convert to when your models have been trained with the latest preview versions of PyTorch and Fastai. Works great with the example pre-trained model though.
Creating a ClipperConnection
To start Clipper, you must first create a
ClipperConnection object with the type of
ContainerManager you want to use. In this case, you will be using the
Now that you have a
ClipperConnection object, you can start a Clipper cluster.
The following command will start 3 Docker containers:
- The Query Frontend: The Query Frontend container listens for incoming prediction requests and schedules and routes them to the deployed models.
- The Management Frontend: The Management Frontend container manages and updates the cluster’s internal configuration state, such as tracking which models are deployed and which application endpoints have been registered.
- A Redis instance: Redis is used to persistently store Clipper’s internal configuration state. By default, Redis is started on port 6380 instead of the standard Redis default port 6379 to avoid collisions with any Redis instances that are already running.
Take a look at the containers Clipper has started.
Create an Application
When you list the applications registered with Clipper, you should see the newly registered
squeezenet-classifier application show up.
Load an example pre-trained PyTorch model
PyTorch models cannot just be pickled and loaded. Instead, they must be saved using PyTorch’s native serialization API. Because of this, you cannot use the generic Python model deployer to deploy the model to Clipper. Instead, you will use the Clipper PyTorch deployer to deploy it. The Docker container will load and reconstruct the model from the serialized model checkpoint when the container is started.
Define a predict function and add metrics
Clipper must download this Docker image from the internet, so this may take a minute
Now link the generated
pytorch-model to the application
squeezenet-classsifier we created before.
How to query the API with Requests
If you run into issues and want to completely stop Clipper, you can do this by calling
When you list all the Docker containers a final time, you should see that all of the Clipper containers have been stopped.
2. Using Now from Zeit
Another option under discussion in the Forums is to use the Now service from Zeit. You can follow this guide in the Fast.ai Documentation. I have tried this but have not gotten accurate results (probably because of normalization). Seems very promising.
You will need to run these commands only one time. The first installs Now’s CLI (Command Line Interface).
And the next downloads a starter pack for model deployment based on Fast.ai Lesson 2.
Upload your trained model file
Upload your trained model file (for example
stage-2.pth) to a cloud service like Google Drive or Dropbox. Copy the download link for the file. Note: the download link is the one which starts the file download directly—and is normally different than the share link which presents you with a view to download the file (use https://rawdownload.now.sh/ if needed)
Customize the app for your model
- Open up the file
appdirectory and update the
model_file_urlvariable with the url copied above
- In the same file, update the line
classes = ['black', 'grizzly', 'teddys']with the classes you are expecting from your model
On the terminal, make sure you are in the
zeit directory, then type:
The first time you run this, it will prompt for your email address and create your Now account for you. After your account is created, run it again to deploy your project.
Every time you deploy with
now it’ll create a unique Deployment URL for the app. It has a format of
xxx.now.sh, and is shown while you are deploying the app.
Torch Script and
PyTorch C++ API
These are some of the most recent changes will come with the official 1.0 version of PyTorch. You can follow the instructions in this article from the Documentation or check out the last chapter of the Intro to Deep Learning with PyTorch class in Udacity where they go through this step by step. This is just an overview of the main steps.
Build a minimal C++ application
- Follow these steps and build
- Install Anaconda and get CMAKE to run on your machine. You can install it through their binaries or, if you are running MacOS, type
brew install cmake(install
homebrewthrough these instructions). If you have some problems with CMAKE, remember to download the X-Code command line tools as a
.dmgdirectly from here (MacOS 10.14 in my case).
- Install Caffe2 from here and run
conda install pytorch-nightly-cpu -c pytorch
- Build the application
PyTorch, Libtorch, C++, and NodeJS
I tried my best to summarize some of the options available to deploy your recently trained PyTorch models. Hope this is useful and looking forward to reading your comments.
Bio: Nicolás Metallo is an award-winning entrepreneur with nearly 10 years of professional experience. He graduated from New York University with an MSc in Technology Management & Innovation and he works as a management consultant and freelance deep learning engineer. Nicolas is also the co-founder of INVIP Labs Inc., a social enterprise that helps blind and low vision individuals understand their environment better through computer vision. His particular interest in data science is focused on providing cities with data to make them more connected, efficient, resilient, vibrant, and prosperous.
Original. Reposted with permission.
- How to build an API for a machine learning model in 5 minutes using Flask
- Introduction to PyTorch for Deep Learning
- Deep Learning Cheat Sheets