[Feature] Support torchserve (open-mmlab#979)
* Support torchserve

* add torchserve model conversion tool and model handler
* add test script for model server
* modify inference interface

* fix requirements

* fix bugs

* fix unittest

* add docs for model serving

* fix NOTE in docs

* fix doc format
ly015 authored Oct 22, 2021
1 parent 400def1 commit 46fd62f
Showing 18 changed files with 561 additions and 330 deletions.
6 changes: 2 additions & 4 deletions .github/CONTRIBUTING.md
@@ -12,12 +12,10 @@ All kinds of contributions are welcome, including but not limited to the following
1. Commit your changes
1. Create a PR

:::{note}

```{note}
- If you plan to add some new features that involve large changes, it is encouraged to open an issue for discussion first.
- If you are the author of some papers and would like to include your method in MMPose, please contact us. We would much appreciate your contribution.

:::
```

## Code style

47 changes: 47 additions & 0 deletions docker/serve/Dockerfile
@@ -0,0 +1,47 @@
ARG PYTORCH="1.6.0"
ARG CUDA="10.1"
ARG CUDNN="7"
FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel

ENV PYTHONUNBUFFERED TRUE

RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
ca-certificates \
g++ \
openjdk-11-jre-headless \
# MMPose requirements
ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libxrender-dev \
&& rm -rf /var/lib/apt/lists/*

ENV PATH="/opt/conda/bin:$PATH"
# Use ENV so FORCE_CUDA persists across build steps (a RUN export would not)
ENV FORCE_CUDA="1"


# MMLAB
ARG PYTORCH
ARG CUDA
RUN ["/bin/bash", "-c", "pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu${CUDA//./}/torch${PYTORCH}/index.html"]
RUN pip install mmpose

# TORCHSERVE
RUN pip install torchserve torch-model-archiver

RUN useradd -m model-server \
&& mkdir -p /home/model-server/tmp

COPY entrypoint.sh /usr/local/bin/entrypoint.sh

RUN chmod +x /usr/local/bin/entrypoint.sh \
&& chown -R model-server /home/model-server

COPY config.properties /home/model-server/config.properties
RUN mkdir /home/model-server/model-store && chown -R model-server /home/model-server/model-store

EXPOSE 8080 8081 8082

USER model-server
WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["serve"]
49 changes: 49 additions & 0 deletions docker/serve/Dockerfile_mmcls
@@ -0,0 +1,49 @@
ARG PYTORCH="1.6.0"
ARG CUDA="10.1"
ARG CUDNN="7"
FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel

ARG MMCV="1.3.8"
ARG MMCLS="0.16.0"

ENV PYTHONUNBUFFERED TRUE

RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
ca-certificates \
g++ \
openjdk-11-jre-headless \
# MMClassification requirements
ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libxrender-dev \
&& rm -rf /var/lib/apt/lists/*

ENV PATH="/opt/conda/bin:$PATH"
# Use ENV so FORCE_CUDA persists across build steps (a RUN export would not)
ENV FORCE_CUDA="1"

# TORCHSERVE
RUN pip install torchserve torch-model-archiver

# MMLAB
ARG PYTORCH
ARG CUDA
RUN ["/bin/bash", "-c", "pip install mmcv-full==${MMCV} -f https://download.openmmlab.com/mmcv/dist/cu${CUDA//./}/torch${PYTORCH}/index.html"]
RUN pip install mmcls==${MMCLS}

RUN useradd -m model-server \
&& mkdir -p /home/model-server/tmp

COPY entrypoint.sh /usr/local/bin/entrypoint.sh

RUN chmod +x /usr/local/bin/entrypoint.sh \
&& chown -R model-server /home/model-server

COPY config.properties /home/model-server/config.properties
RUN mkdir /home/model-server/model-store && chown -R model-server /home/model-server/model-store

EXPOSE 8080 8081 8082

USER model-server
WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["serve"]
5 changes: 5 additions & 0 deletions docker/serve/config.properties
@@ -0,0 +1,5 @@
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
model_store=/home/model-server/model-store
load_models=all
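
With these settings, TorchServe binds the Inference, Management, and Metrics APIs to ports 8080-8082 and loads every model found in the model store at startup. As a quick sanity check (assuming a server started with this config is reachable on localhost), the standard ping endpoint can be queried:

```shell
# Expect {"status": "Healthy"} once the server is up.
curl http://localhost:8080/ping
```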
12 changes: 12 additions & 0 deletions docker/serve/entrypoint.sh
@@ -0,0 +1,12 @@
#!/bin/bash
set -e

if [[ "$1" = "serve" ]]; then
shift 1
torchserve --start --ts-config /home/model-server/config.properties
else
eval "$@"
fi

# prevent docker exit
tail -f /dev/null
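
Because the fallback branch simply `eval`s its arguments, any other command passed to the container replaces the default `serve` behavior. A hypothetical example, assuming the `mmpose-serve:latest` image built later in this document:

```shell
# Start an interactive shell instead of launching TorchServe.
docker run --rm -it mmpose-serve:latest bash
```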
9 changes: 4 additions & 5 deletions docs/install.md
@@ -47,9 +47,9 @@ b. Install PyTorch and torchvision following the [official instructions](https:/
conda install pytorch torchvision -c pytorch
```

:::{note}
```{note}
Make sure that your compilation CUDA version and runtime CUDA version match.
:::
```

You can check the supported CUDA version for precompiled packages on the [PyTorch website](https://pytorch.org/).

@@ -124,8 +124,7 @@ d. Install optional modules
- [pyrender](https://pyrender.readthedocs.io/en/latest/install/index.html) (to run 3d mesh demos)
- [smplx](https://github.com/vchoutas/smplx) (to run 3d mesh demos)

:::{note}

```{note}
1. The git commit id will be written to the version number with step c, e.g. 0.6.0+2e7045c. The version will also be saved in trained models.
It is recommended that you run step d each time you pull some updates from GitHub. If C++/CUDA code is modified, then this step is compulsory.
@@ -140,7 +139,7 @@ d. Install optional modules
To use optional dependencies like `smplx`, either install them with `pip install -r requirements/optional.txt`,
or specify the desired extras when calling `pip`, e.g. `pip install -v -e .[tests,build]`
(valid keys for the extras field are `all`, `tests`, `build`, and `optional`).
:::
```

## Install with CPU only

6 changes: 2 additions & 4 deletions docs/tutorials/6_customize_runtime.md
@@ -209,14 +209,12 @@ In such a case, we can set the workflow as

so that 1 epoch for training and 1 epoch for validation will be run iteratively.

:::{note}

```{note}
1. The parameters of the model will not be updated during the val epoch.
1. Keyword `total_epochs` in the config only controls the number of training epochs and will not affect the validation workflow.
1. Workflows `[('train', 1), ('val', 1)]` and `[('train', 1)]` will not change the behavior of `EpochEvalHook`, because `EpochEvalHook` is called by `after_train_epoch`, while the validation workflow only affects hooks called through `after_val_epoch`.
Therefore, the only difference between `[('train', 1), ('val', 1)]` and `[('train', 1)]` is that the runner will calculate losses on the validation set after each training epoch.

:::
```

## Customize Hooks

104 changes: 98 additions & 6 deletions docs/useful_tools.md
@@ -1,14 +1,15 @@
# Useful Tools Link
# Useful Tools

Apart from training/testing scripts, we provide lots of useful tools under the `tools/` directory.

<!-- TOC -->

- [Log Analysis](#log-analysis)
- [Model Complexity (experimental)](#model-complexity)
- [Model Complexity (experimental)](#model-complexity-experimental)
- [Model Conversion](#model-conversion)
- [MMPose model to ONNX (experimental)](#mmpose-model-to-onnx--experimental-)
- [MMPose model to ONNX (experimental)](#mmpose-model-to-onnx-experimental)
- [Prepare a model for publishing](#prepare-a-model-for-publishing)
- [Model Serving](#model-serving)
- [Miscellaneous](#miscellaneous)
- [Evaluating a metric](#evaluating-a-metric)
- [Print the entire config](#print-the-entire-config)
@@ -67,7 +68,7 @@ python tools/analysis/analyze_logs.py cal_train_time ${JSON_LOGS} [--include-out
average iter time: 0.8406 s/iter
```

## Model Complexity
## Model Complexity (Experimental)

`/tools/analysis/get_flops.py` is a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch) to compute the FLOPs and params of a given model.

Expand All @@ -86,9 +87,9 @@ Params: 28.04 M
==============================
```

:::{note}
```{note}
This tool is still experimental and we do not guarantee that the number is absolutely correct.
:::
```

You may use the result for simple comparisons, but double check it before you adopt it in technical reports or papers.

Expand Down Expand Up @@ -129,6 +130,97 @@ python tools/publish_model.py work_dirs/hrnet_w32_coco_256x192/latest.pth hrnet_

The final output filename will be `hrnet_w32_coco_256x192-{hash id}_{time_stamp}.pth`.

## Model Serving

MMPose supports model serving with [`TorchServe`](https://pytorch.org/serve/). You can serve an MMPose model via the following steps:

### 1. Install TorchServe

Please follow the official installation guide of TorchServe: https://github.com/pytorch/serve#install-torchserve-and-torch-model-archiver

### 2. Convert model from MMPose to TorchServe

```shell
python tools/deployment/mmpose2torchserve.py \
${CONFIG_FILE} ${CHECKPOINT_FILE} \
--output-folder ${MODEL_STORE} \
--model-name ${MODEL_NAME}
```

**Note**: `${MODEL_STORE}` needs to be an absolute path to a folder.

A model file `${MODEL_NAME}.mar` will be generated and placed in the `${MODEL_STORE}` folder.
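
For instance, to package the HRNet model used in the test example below (the local checkpoint filename and the `/home/user/model-store` output folder are illustrative; download the checkpoint first):

```shell
python tools/deployment/mmpose2torchserve.py \
    configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py \
    hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth \
    --output-folder /home/user/model-store \
    --model-name hrnet

# Expected result: /home/user/model-store/hrnet.mar
```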

### 3. Deploy model serving

We introduce the following two approaches to deploy the model service.

#### Use TorchServe API

```shell
torchserve --start \
--model-store ${MODEL_STORE} \
--models ${MODEL_PATH1} [${MODEL_NAME}=${MODEL_PATH2} ... ]
```

Example:

```shell
# serve one model
torchserve --start --model-store /models --models hrnet=hrnet.mar

# serve all models in model-store
torchserve --start --model-store /models --models all
```

After executing the `torchserve` command above, TorchServe runs on your host, listening for inference requests. Check the [official docs](https://github.com/pytorch/serve/blob/master/docs/server.md) for more information.
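
For example, once the server is running you can confirm that a model was registered through the Management API (assuming the default port 8081):

```shell
# List registered models; "hrnet" should appear in the output.
curl http://localhost:8081/models
```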

#### Use `mmpose-serve` docker image

**Build `mmpose-serve` docker image:**

```shell
docker build -t mmpose-serve:latest docker/serve/
```

**Run `mmpose-serve`:**

Check the official docs for [running TorchServe with docker](https://github.com/pytorch/serve/blob/master/docker/README.md#running-torchserve-in-a-production-docker-environment).

To run on GPU, you need to install [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). To run on CPU, simply omit the `--gpus` argument.

Example:

```shell
docker run --rm \
--cpus 8 \
--gpus device=0 \
-p8080:8080 -p8081:8081 -p8082:8082 \
--mount type=bind,source=$MODEL_STORE,target=/home/model-server/model-store \
mmpose-serve:latest
```

[Read the docs](https://github.com/pytorch/serve/blob/072f5d088cce9bb64b2a18af065886c9b01b317b/docs/rest_api.md/) about the Inference (8080), Management (8081) and Metrics (8082) APIs.
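
For example, an inference request can be sent to the Inference API with `curl` (the model name `hrnet` matches the earlier examples; the exact response format depends on the MMPose handler):

```shell
curl -X POST http://localhost:8080/predictions/hrnet -T tests/data/coco/000000000785.jpg
```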

### 4. Test deployment

You can use `tools/deployment/test_torchserver.py` to test the model serving. It will compare and visualize the results of TorchServe and PyTorch.

```shell
python tools/deployment/test_torchserver.py ${IMAGE_PATH} ${CONFIG_PATH} ${CHECKPOINT_PATH} ${MODEL_NAME} --out-dir ${OUT_DIR}
```

Example:

```shell
python tools/deployment/test_torchserver.py \
tests/data/coco/000000000785.jpg \
configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w48_coco_256x192.py \
https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_256x192-b9e0b3ab_20200708.pth \
hrnet \
--out-dir vis_results
```

## Miscellaneous

### Print the entire config