Deploy your PyTorch model to Production
This tutorial aims to teach you how to deploy your recently trained model in PyTorch as an API using Python.
By Nicolás Metallo, Audatex
Following the last article about Training a Choripan Classifier with PyTorch and Google Colab, we will now talk about what are some steps that you can do if you want to deploy your recently trained model as an API. The discussion on how to do this with Fast.ai is currently ongoing (more) and will most likely continue until PyTorch releases their official 1.0 version. You can find more information in the Fast.ai Forums, PyTorch Documentation/Forums, and their respective GitHub repositories.
Saving and Loading Models
It’s recommended that you take a look at the PyTorch Documentation as it’s a great place to start, but in short, there are two ways to serialize and restore a model. One is loading only the weights and the other loading the entire model (and weights). You will need to first create a model to define its architecture otherwise you will end up with an OrderedDict
with just the weight values. Both options would work for inference and/or for resuming a model's training from a previous checkpoint.
1. Using torch.save()
and torch.load()
This save/load process uses the most intuitive syntax and involves the least amount of code. Saving a model in this way will save the entire module using Python’s pickle module. The disadvantage of this approach is that the serialized data is bound to the specific classes and the exact directory structure used when the model is saved. The reason for this is because pickle does not save the model class itself. Rather, it saves a path to the file containing the class, which is used during load time. Because of this, your code can break in various ways when used in other projects or after refactors.
Save model
torch.save(learner.model, PATH)
Sometimes pickle
is not able to serialize some model creations functions (e.g. resnext_50_32x4d
which is found in previous versions of Fastai) so you need to use dill
instead. Here's the fix.
import dill as dill torch.save(learner.model, PATH, pickle_module=dill)
You can read more about the limitations of pickle
in this article. A common PyTorch convention is to save models using either a .pt
or .pth
file extension.
Load model
# Model class must be defined somewhere model = torch.load(PATH) model.eval()
2. Using state_dict
In PyTorch, the learnable parameters (e.g. weights and biases) of an torch.nn.Module
model are contained in the model’s parameters (accessed with model.parameters()
). A state_dict is simply a Python dictionary object that maps each layer to its parameter tensor. Note that only layers with learnable parameters (convolutional layers, linear layers, etc.) have entries in the model’s state_dict.
We will need to re-initialize our model in the same way it was originally defined and created when we saved the weights, making sure the variables, classes, functions that go into creating the model are available, whether through module imports or directly within the same script/file. One potential advantage of using this method is that you can use updated scripts to load the old model if the parameters are the same, and it’s also the recommended approach by the official documentation. One thing you should also remember is that state_dict
takes a dictionary object, not a path to a saved object, so you can't load using model.load_state_dict(PATH)
.
Save model
torch.save(model.state_dict(), PATH)
Load model
model = TheModelClass(*args, **kwargs) # Model class must be defined somewhere model.load_state_dict(torch.load(PATH)) model.eval() # run if you only want to use it for inference
You run model.eval()
after loading because you usually have BatchNorm
and Dropout
layers that by default are in train mode on construction. You don't need to call model.eval()
if you want to resume your model training.
As we have done our training with Fastai, we can call Learner.save
and Learner.load
to save and load models (more info in the documentation) . This is running state_dict()
in the back so only the model parameters will be saved and not the architecture. This means that you will need to run the create_cnn method to get a pre-trained model from a given architecture (the same that you used before to train your model, e.g. models.resnet34) with a custom head that is suitable for your data. Models are saved and loaded from the path
/model_dir
directory, and the .pth
extension is automatically added for both operations.
Simple Deployment With Flask
After we have trained our classifier using the free GPU available in Google Colab, we are ready to do inference on our end. We can do it locally or in the cloud and there are many different options (AWS, Paperspace, Google Cloud, etc) that we can choose from. As I have some free Amazon AWS credits remaining I’ll be using an Amazon AMI that comes with several ML libraries already installed hosted on a t2.medium instance. Here are some simple instructions to run a Docker image on your end (should be around the same when we are not doing GPU training).
Ready-to-run Docker images
Jupyter Docker Stacks are a great way to get a notebook up and going in no time with the latest libraries. These are ready-to-run Docker images that contain Jupyter applications and interactive computing tools. Learn more about them in the Official Documentation. We will use the Jupyter Notebook Data Science Stack from this repository.
Once you have installed Docker, open the terminal, cd
into your working directory, and run
$ docker run --rm -p 8888:8888 -e JUPYTER_ENABLE_LAB=yes -v "$PWD":/home/jovyan/work jupyter/datascience-notebook:e5c5a7d3e52d
This will create a server that you can log into where you will connect to a Jupyter notebook. You can run some commands directly from there or you can also run bash or any command in a Docker container by getting the container id
, typing docker ps
in the terminal and then running
$ docker exec -it {container-id} /bin/bash
Install PyTorch & Fastai
Depending on your machine configuration you will want to run inference on either GPU or CPU. In our example, we are going to run everything on the CPU, so you need to run the following to install the latest PyTorch.
pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
Now you can install Fastai with pip install fastai
Create a Flask Application
Install the Flask library by running
pip install -U flask
We will create a folder called flask_app
and two new python files server.py
with our code to load the model weights and run the inference server and settings.py
that sets some basic params to give us more flexibility in the future. The following is an example of what flask_app/settings.py
might look like. We will then import this with from settings import *
into server.py
.
# add your custom labels labels = ['Not Choripan', 'Choripan'] # set your data directory data_dir = 'data' # set the URL where you can download your model weights MODEL_URL = 'https://s3.amazonaws.com/nicolas-dataset/stage1.pth' # example weights # set some deployment settings PORT = 8080
We can now go through the flask_app/server.py
. This first part will import the libraries and settings.
# flask_app/server.py # import libraries print('importing libraries...') from flask import Flask, request, jsonify import logging import random import time from PIL import Image import requests, os from io import BytesIO # import fastai stuff from fastai import * from fastai.vision import * import fastai # import settings from settings import * # import print('done!\nsetting up the directories and the model structure...')
In order to run our single image inference prediction, we first need to create a new model that follows the same folder structure that we used when we trained it. That’s why we are going to create a new empty dir based on the labels that we have set before in settings.py
.
# set dir structure def make_dirs(labels, data_dir): root_dir = os.getcwd() make_dirs = ['train', 'valid', 'test'] for n in make_dirs: name = os.path.join(root_dir, data_dir, n) for each in labels: os.makedirs(os.path.join(name, each), exist_ok=True) make_dirs(labels=labels, data_dir=data_dir) # comes from settings.py path = Path(data_dir)
Once that path
is defined, we are going to create a new learn
model and download the pre-trained weights for the Choripan Classifier.
# download model weights if not already saved path_to_model = os.path.join(data_dir, 'models', 'model.pth') if not os.path.exists(path_to_model): print('done!\nmodel weights were not found, downloading them...') os.makedirs(os.path.join(data_dir, 'models'), exist_ok=True) filename = Path(path_to_model) r = requests.get(MODEL_URL) filename.write_bytes(r.content) print('done!\nloading up the saved model weights...') fastai.defaults.device = torch.device('cpu') # run inference on cpu empty_data = ImageDataBunch.single_from_classes( path, labels, tfms=get_transforms(), size=224).normalize(imagenet_stats) learn = create_cnn(empty_data, models.resnet34) learn = learn.load('model')
There are already many great tutorials online that are amazingly detailed so I won’t explain much of how Flask works. I created a predict
function that takes the URL that you receive as INPUT, runs it through learn.predict(img)
to get the predicted class, and then returns a json
.
print('done!\nlaunching the server...') # set flask params app = Flask(__name__) @app.route("/") def hello(): return "Image classification example\n" @app.route('/predict', methods=['GET']) def predict(): url = request.args['url'] app.logger.info("Classifying image %s" % (url),) response = requests.get(url) img = open_image(BytesIO(response.content)) t = time.time() # get execution time pred_class, pred_idx, outputs = learn.predict(img) dt = time.time() - t app.logger.info("Execution time: %0.02f seconds" % (dt)) app.logger.info("Image %s classified as %s" % (url, pred_class)) return jsonify(pred_class) if __name__ == '__main__': app.run(host="0.0.0.0", debug=True, port=PORT)
Once we have that done, we can head over to the terminal, cd into the flask_app
dir, and run python server.py
. We should see something like this.
* Serving Flask app "server" (lazy loading) * Environment: production WARNING: Do not use the development server in a production environment. Use a production WSGI server instead. * Debug mode: on * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit) * Restarting with stat importing libraries... done! setting up the directories and the model structure... done! loading up the saved model weights... done! launching the server... * Debugger is active! * Debugger PIN: 261-786-850
That’s it! Now we can run commands like these from the terminal (I’m running an AWS instance). Let’s take this image for example.
$ curl http://ec2-100-24-34-242.compute-1.amazonaws.com:8080/predict?url=https://media.minutouno.com/adjuntos/150/imagenes/028/853/0028853430.jpg "Choripan"
And this is how it looks from the server side
[2018-11-13 16:49:32,245] INFO in server: Classifying image https://media.minutouno.com/adjuntos/150/imagenes/028/853/0028853430.jpg [2018-11-13 16:49:33,836] INFO in server: Execution time: 1.35 seconds [2018-11-13 16:49:33,858] INFO in server: Image https://media.minutouno.com/adjuntos/150/imagenes/028/853/0028853430.jpg classified as Choripan
Not bad! You now have your very own “Choripan/Not Choripan” API. If you want to move to the next level, please check out this tutorial from the Flask Documentation to deploy to Production and/or this other tutorial if you want to Dockerize your Flask application (you can also use docker-compose).
Other Ways to Deploy to Production
1. Image Classification Example Using Clipper
There’s a great ipynb
that you can follow in the ClipperTutorials GitHub with the basics of how everything works. They provide a Docker image or you can just run their Amazon AMI. Sadly, this is only working with PyTorch 0.4.0 which makes it a real pain to convert to when your models have been trained with the latest preview versions of PyTorch and Fastai. Works great with the example pre-trained model though.
Creating a ClipperConnection
To start Clipper, you must first create a ClipperConnection
object with the type of ContainerManager
you want to use. In this case, you will be using the DockerContainerManager
.
from clipper_admin import ClipperConnection, DockerContainerManager clipper_conn = ClipperConnection(DockerContainerManager())
Starting Clipper
Now that you have a ClipperConnection
object, you can start a Clipper cluster.
The following command will start 3 Docker containers:
- The Query Frontend: The Query Frontend container listens for incoming prediction requests and schedules and routes them to the deployed models.
- The Management Frontend: The Management Frontend container manages and updates the cluster’s internal configuration state, such as tracking which models are deployed and which application endpoints have been registered.
- A Redis instance: Redis is used to persistently store Clipper’s internal configuration state. By default, Redis is started on port 6380 instead of the standard Redis default port 6379 to avoid collisions with any Redis instances that are already running.
clipper_conn.start_clipper() clipper_addr = clipper_conn.get_query_addr()
Take a look at the containers Clipper has started.
!docker ps --filter label=ai.clipper.container.label
Create an Application
app_name = "squeezenet-classsifier" default_output = "default" clipper_conn.register_application( name=app_name, input_type="bytes", default_output=default_output, slo_micros=10000000)
When you list the applications registered with Clipper, you should see the newly registered squeezenet-classifier
application show up.
clipper_conn.get_all_apps()
Load an example pre-trained PyTorch model
from torchvision import models, transforms model = models.squeezenet1_1(pretrained=True)
PyTorch models cannot just be pickled and loaded. Instead, they must be saved using PyTorch’s native serialization API. Because of this, you cannot use the generic Python model deployer to deploy the model to Clipper. Instead, you will use the Clipper PyTorch deployer to deploy it. The Docker container will load and reconstruct the model from the serialized model checkpoint when the container is started.
Preprocessing
# First we define the preproccessing on the images: normalize = transforms.Normalize( mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] ) preprocess = transforms.Compose([ transforms.Scale(256), transforms.CenterCrop(224), transforms.ToTensor(), normalize ]) # Then we download the labels: labels = {int(key):value for (key, value) in requests.get('https://s3.amazonaws.com/outcome-blog/imagenet/labels.json').json().items()}
Define a predict function and add metrics
import clipper_admin.metrics as metrics def predict_torch_model(model, imgs): import io import PIL.Image import torch import clipper_admin.metrics as metrics metrics.add_metric("batch_size", 'Gauge', 'Batch size passed to PyTorch predict function.') metrics.report_metric('batch_size', len(imgs)) # TODO: Fill in the batch size # We first prepare a batch from `imgs` img_tensors = [] for img in imgs: img_tensor = preprocess(PIL.Image.open(io.BytesIO(img))) img_tensor.unsqueeze_(0) img_tensors.append(img_tensor) img_batch = torch.cat(img_tensors) # We perform a forward pass with torch.no_grad(): model_output = model(img_batch) # Parse Result img_labels = [labels[out.data.numpy().argmax()] for out in model_output] return img_labels
Clipper must download this Docker image from the internet, so this may take a minute
from clipper_admin.deployers import pytorch as pytorch_deployer pytorch_deployer.deploy_pytorch_model( clipper_conn, name="pytorch-model", version=1, input_type="bytes", func=predict_torch_model, # predict function wrapper pytorch_model=model, # pass model to function )
Now link the generated pytorch-model
to the application squeezenet-classsifier
we created before.
clipper_conn.link_model_to_app(app_name="squeezenet-classsifier", model_name="pytorch-model")
That’s it!
How to query the API with Requests
import requests import json import base64 clipper_addr = 'localhost:1337' for img in ['img1.jpg', 'img2.jpg', 'img3.jpg']: # example with local images req_json = json.dumps({ "input": base64.b64encode(open(img, "rb").read()).decode() # bytes to unicode }) response = requests.post( "http://%s/%s/predict" % (clipper_addr, 'squeezenet-classsifier'), headers={"Content-type": "application/json"}, data=req_jsn) print(response.json())
Stopping Clipper
If you run into issues and want to completely stop Clipper, you can do this by calling ClipperConnection.stop_all()
.
clipper_conn.stop_all()
When you list all the Docker containers a final time, you should see that all of the Clipper containers have been stopped.
!docker ps --filter label=ai.clipper.container.label
2. Using Now from Zeit
Another option under discussion in the Forums is to use the Now service from Zeit. You can follow this guide in the Fast.ai Documentation. I have tried this but have not gotten accurate results (probably because of normalization). Seems very promising.
You will need to run these commands only one time. The first installs Now’s CLI (Command Line Interface).
sudo apt install npm # if not already installed sudo npm install -g now
And the next downloads a starter pack for model deployment based on Fast.ai Lesson 2.
wget https://github.com/fastai/course-v3/raw/master/docs/production/zeit.tgz tar xf zeit.tgz cd zeit
Upload your trained model file
Upload your trained model file (for example stage-2.pth
) to a cloud service like Google Drive or Dropbox. Copy the download link for the file. Note: the download link is the one which starts the file download directly—and is normally different than the share link which presents you with a view to download the file (use https://rawdownload.now.sh/ if needed)
Customize the app for your model
- Open up the file
server.py
inside theapp
directory and update themodel_file_url
variable with the url copied above - In the same file, update the line
classes = ['black', 'grizzly', 'teddys']
with the classes you are expecting from your model
Deploy
On the terminal, make sure you are in the zeit
directory, then type:
now
The first time you run this, it will prompt for your email address and create your Now account for you. After your account is created, run it again to deploy your project.
Every time you deploy with now
it’ll create a unique Deployment URL for the app. It has a format of xxx.now.sh
, and is shown while you are deploying the app.
3. Using Torch Script
and PyTorch C++ API
These are some of the most recent changes will come with the official 1.0 version of PyTorch. You can follow the instructions in this article from the Documentation or check out the last chapter of the Intro to Deep Learning with PyTorch class in Udacity where they go through this step by step. This is just an overview of the main steps.
import torch import torchvision # An instance of your model. model = torchvision.models.resnet18() # An example input you would normally provide to your model's forward() method. example = torch.rand(1, 3, 224, 224) # Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing. traced_script_module = torch.jit.trace(model, example) # Save the model traced_script_module.save("model-resnet18-jit.pt")
Build a minimal C++ application
- Follow these steps and build
example-app.cpp
andCMakeLists.txt
. - Install Anaconda and get CMAKE to run on your machine. You can install it through their binaries or, if you are running MacOS, type
brew install cmake
(installhomebrew
through these instructions). If you have some problems with CMAKE, remember to download the X-Code command line tools as a.dmg
directly from here (MacOS 10.14 in my case). - Install Caffe2 from here and run
conda install pytorch-nightly-cpu -c pytorch
- Build the application
PyTorch, Libtorch, C++, and NodeJS
Conclusion
I tried my best to summarize some of the options available to deploy your recently trained PyTorch models. Hope this is useful and looking forward to reading your comments.
Bio: Nicolás Metallo is an award-winning entrepreneur with nearly 10 years of professional experience. He graduated from New York University with an MSc in Technology Management & Innovation and he works as a management consultant and freelance deep learning engineer. Nicolas is also the co-founder of INVIP Labs Inc., a social enterprise that helps blind and low vision individuals understand their environment better through computer vision. His particular interest in data science is focused on providing cities with data to make them more connected, efficient, resilient, vibrant, and prosperous.
Original. Reposted with permission.
Related:
- How to build an API for a machine learning model in 5 minutes using Flask
- Introduction to PyTorch for Deep Learning
- Deep Learning Cheat Sheets