Create a PyTorch Docker image ready for production

Given a PyTorch model, how should we put it in a Docker image, with all the related dependencies, ready to be deployed?

Cover photo by Michael Dziedzic on Unsplash

You know the drill: your Data Science team has created an amazing PyTorch model, and now they want you to put it in production. They give you a .pt file and some preprocessing script. What now?

Luckily, AWS and Facebook have created a project, called TorchServe, to serve PyTorch models in production, similarly to TensorFlow Serving. It provides a well-crafted Docker image into which you can load your models. In this tutorial, we will see how to customize the Docker image to include your model, how to install other dependencies inside it, and which configuration options are available.

We include the PyTorch model directly inside the Docker image, instead of loading it at runtime; while loading it at runtime has some advantages and makes sense in some scenarios (such as testing labs where you want to try a lot of different models), I don’t think it is suitable for production. Including the model directly in the Docker image has several advantages:

  • if you use CI/CD you can achieve reproducible builds;

  • to spawn a new instance serving your model, you only need your Docker registry to be available, and not also a storage solution holding the model;

  • you need to authenticate only to your Docker registry, and not to the storage solution;

  • it makes it easier to keep track of what has been deployed, because you only have to check the Docker image version, and not the model version. This is especially important if you have a cluster of instances serving your model.

Let’s now get our hands dirty and dive into what is necessary to get the Docker image running!

Building the model archive

The TorchServe Docker image needs a model archive to work: it’s a file that packages the model together with its configuration files. To create it, first install TorchServe, and have a PyTorch model available somewhere on the PC.
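
Both TorchServe and the archiver tool are distributed on PyPI, so a typical installation (a minimal sketch, assuming a working Python environment with pip) looks like this:

pip install torchserve torch-model-archiver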

To create this model archive, we need only one command:

torch-model-archiver --model-name <MODEL_NAME> --version <MODEL_VERSION> --serialized-file <MODEL> --export-path <WHERE_TO_SAVE_THE_MODEL_ARCHIVE>

There are four options we need to specify in this command:

  • MODEL_NAME is an identifier to recognize the model; we can use whatever we want here. It’s useful when we include multiple models inside the same Docker image, a nice feature of TorchServe that we won’t cover for now;

  • MODEL_VERSION is used to identify, as the name implies, the version of the model;

  • MODEL is the path, on the local PC, to the .pt file acting as the model;

  • WHERE_TO_SAVE_THE_MODEL_ARCHIVE is a local directory where TorchServe will put the model archive it generates;

Putting it all together, the command should look something like this:

torch-model-archiver --model-name predict_the_future --version 1.0 --serialized-file ~/models/predict_the_future --export-path model-store/

After running it, we have a file with the .mar extension, the first step toward putting our PyTorch model in production!
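
Before building any Docker image, we can optionally smoke-test the archive with a local TorchServe instance (a quick sketch, assuming torchserve is installed as shown above; /ping is TorchServe's health check endpoint):

# Start TorchServe locally, loading the archive we just created
torchserve --start --ncs --model-store model-store --models predict_the_future.mar

# Check that the server is up, then stop it
curl http://localhost:8080/ping
torchserve --stop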

Some pre-processing is probably necessary before invoking the model. If this is the case, we can create a handler file where we put all the necessary instructions. This file can have external dependencies, so we can code an entire application in front of our model.

To include the handler file in the model archive, we only need to add the --handler flag to the command above, like this:

torch-model-archiver --model-name predict_the_future --version 1.0 --serialized-file ~/models/predict_the_future --export-path model-store/ --handler handler.py

Create the Docker image

Now that we have the model archive, we can include it in the TorchServe Docker image. Other than the model archive, we need to create a configuration file as well, to tell TorchServe which model to load automatically at startup.

We need a config.properties file similar to the following. Later in this tutorial we will see what these lines mean, and what other options are available.

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store

Docker image with just the model

If we only need to include the model archive and the config file, the Dockerfile is quite straightforward: we just have to copy the files, and everything else is managed by TorchServe itself. Our Dockerfile will thus be:

FROM pytorch/torchserve as production

COPY config.properties /home/model-server/config.properties
COPY predict_the_future.mar /home/model-server/model-store

TorchServe already includes torch, torchvision, torchtext, and torchaudio, so there is no need to add them. To see the current versions of these libraries, check the TorchServe requirements file on GitHub.
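
To verify the image locally, we can build it, run it, and query it. This is only a sketch: the image tag, port mappings, and input.json are arbitrary examples of mine, while /models and /predictions are TorchServe's standard management and inference endpoints.

docker build -t predict-the-future .
docker run -d --rm -p 8080:8080 -p 8081:8081 predict-the-future

# If the model is not loaded automatically at startup, register it through the management API
curl -X POST "http://localhost:8081/models?url=predict_the_future.mar&initial_workers=1"

# Send an inference request (input.json stands for whatever input the handler expects)
curl http://localhost:8080/predictions/predict_the_future -T input.json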

Docker image with the model and external dependencies

What if we need different Python Dependencies for our Python handler?

In this case, we want to use a two-stage Docker build: in the first stage we build our dependencies, and then we copy them over to the final image. We list our dependencies in a file called requirements.txt, and we use pip to install them. Pip is the package installer for Python, and its documentation about the format of the requirements file is very complete.
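
As a purely hypothetical example (the package names and version pins below are placeholders, not taken from any real handler), a requirements.txt could look like this:

# requirements.txt (hypothetical)
numpy>=1.19
pillow>=8.0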

The Dockerfile is now something like this:

ARG BASE_IMAGE=ubuntu:18.04

# Compile image loosely based on pytorch compile image
FROM ${BASE_IMAGE} AS compile-image
ENV PYTHONUNBUFFERED TRUE

# Install Python and pip, and build-essentials if some requirements need to be compiled
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    python3-dev \
    python3-distutils \
    python3-venv \
    curl \
    build-essential \
    && rm -rf /var/lib/apt/lists/* \
    && cd /tmp \
    && curl -O https://bootstrap.pypa.io/get-pip.py \
    && python3 get-pip.py

RUN python3 -m venv /home/venv

ENV PATH="/home/venv/bin:$PATH"

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1

# The part above is cached by Docker for future builds
# We can now copy the requirements file from the local system
# and install the dependencies
COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

FROM pytorch/torchserve as production

# Copy dependencies after having built them
COPY --from=compile-image /home/venv /home/venv

# We use curl for health checks on AWS Fargate
USER root
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    curl \
    && rm -rf /var/lib/apt/lists/*

USER model-server

COPY config.properties /home/model-server/config.properties
COPY predict_the_future.mar /home/model-server/model-store
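
Since the production stage installs curl precisely for health checks, the same check can be tried locally after building and running the image (the image tag and port mapping are again arbitrary examples; /ping is TorchServe's health check endpoint on the inference port):

docker build -t predict-the-future .
docker run -d --rm -p 8080:8080 -p 8081:8081 predict-the-future

# Health check, e.g. the command a container orchestrator like AWS Fargate would run
curl --fail http://localhost:8080/ping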

If PyTorch is among the dependencies, we should change the line that installs the requirements from

RUN pip install --no-cache-dir -r requirements.txt

to

RUN pip install --no-cache-dir -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html

In this way, we will use the pre-built Python packages for PyTorch instead of building them from scratch: it is faster and requires fewer resources, making it suitable also for small CI/CD systems.

Configuring the Docker image

We created a configuration file above, but what does it do? Of course, going through all the possible configurations would be impossible, so I leave here the link to the documentation. Among the other things explained there, there is a way to configure Cross-Origin Resource Sharing (necessary to use the model as an API over the web), a guide on how to enable SSL, and much more.

There is one set of configuration parameters in particular I’d like to focus on: the ones related to logging. First, for production environments, I suggest setting async_logging to true: it could delay the output a bit, but it allows a higher throughput. Then, it’s important to notice that by default TorchServe captures every message, including those with severity DEBUG. In production we probably don’t want this, especially because it can become quite verbose.
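
In config.properties, enabling asynchronous logging is a single line (the property name comes from the TorchServe configuration documentation):

async_logging=true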

To override this default verbosity, we need to create a new file called log4j.properties. For more information on all the possible options, I suggest familiarizing yourself with the official guide. To start, copy the default TorchServe configuration, and increase the severity of the printed messages. In particular, change

log4j.logger.org.pytorch.serve = DEBUG, ts_log
log4j.logger.ACCESS_LOG = INFO, access_log

to

log4j.logger.org.pytorch.serve = WARN, ts_log
log4j.logger.ACCESS_LOG = WARN, access_log

We also need to copy this new file into the Docker image, so copy the logging config just after the config file:

COPY config.properties /home/model-server/config.properties
COPY log4j.properties /home/model-server/log4j.properties

We need to inform TorchServe about this new config file, and we do so by adding a line to config.properties:

vmargs=-Dlog4j.configuration=file:///home/model-server/log4j.properties

We now have a fully functional TorchServe Docker image, with our custom model, ready to be deployed!

For any question, comment, feedback, criticism, or suggestion on how to improve my English, leave a comment below, or drop an email at [email protected].

Ciao,
R.
