feat: TorchServe support #34
Conversation
#### Motivation

The Triton runtime can be used with model-mesh to serve PyTorch TorchScript models, but it does not support arbitrary PyTorch models, i.e. eager mode. KServe "classic" has integration with TorchServe, but it would be good to have integration with model-mesh too so that these kinds of models can be used in distributed multi-model serving contexts.

#### Modifications

- Add adapter logic to implement the model-mesh management SPI using the TorchServe gRPC management API
- Build and include the new adapter binary in the docker image
- Add a mock server and basic unit tests

Implementation notes:

- Model size (memory usage) is not returned from the `LoadModel` RPC but rather reported separately by the `ModelSize` RPC (so that the model is available for use slightly sooner)
- TorchServe's `DescribeModel` RPC is used to determine the model's memory usage. If that isn't successful, the adapter falls back to using a multiple of the model's size on disk, similar to other runtimes (a sketch of this fallback follows this description)
- The adapter writes the config file for TorchServe to consume

TorchServe does not yet support the KServe V2 gRPC prediction API (only REST), which means that API can't currently be used with model-mesh. The native TorchServe gRPC inference interface can be used instead for the time being.

A smaller PR to the main modelmesh-serving controller repo will be opened to enable use of TorchServe, which will include the ServingRuntime specification.

#### Result

TorchServe can be used seamlessly with ModelMesh Serving to serve PyTorch models, including eager mode.

Resolves #4
Contributes to kserve/modelmesh-serving#63

Signed-off-by: Nick Hill <[email protected]>
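For context, here is a minimal Go sketch of the size-estimation fallback described in the implementation notes above. It is illustrative only: the function and constant names are hypothetical, the 1.25 multiplier is an assumed placeholder, and the actual adapter code may differ.

```go
// Minimal sketch of the fallback described above: prefer the memory
// figure reported by the runtime (e.g. via TorchServe's DescribeModel
// RPC); if that isn't available, estimate from the model's on-disk size.
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// defaultDiskSizeMultiplier is an assumed placeholder; the real
// adapter's multiplier (and its name) may differ.
const defaultDiskSizeMultiplier = 1.25

// estimateModelSizeBytes returns reportedBytes when the runtime supplied
// one, otherwise a multiple of the total size of files under modelPath.
func estimateModelSizeBytes(reportedBytes int64, modelPath string) (int64, error) {
	if reportedBytes > 0 {
		return reportedBytes, nil
	}
	var totalOnDisk int64
	err := filepath.WalkDir(modelPath, func(_ string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		totalOnDisk += info.Size()
		return nil
	})
	if err != nil {
		return 0, fmt.Errorf("estimating model size from disk: %w", err)
	}
	return int64(float64(totalOnDisk) * defaultDiskSizeMultiplier), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: sizeest <model-dir>")
		os.Exit(1)
	}
	size, err := estimateModelSizeBytes(0, os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("estimated size: %d bytes\n", size)
}
```

Reporting the size separately from `LoadModel` matches the note above: the model becomes available for inference without waiting on the size calculation.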
#### Motivation

The Triton runtime can be used with model-mesh to serve PyTorch TorchScript models, but it does not support arbitrary PyTorch models, i.e. eager mode. KServe "classic" has integration with TorchServe, but it would be good to have integration with model-mesh too so that these kinds of models can be used in distributed multi-model serving contexts.

#### Modifications

The bulk of the required changes are to the adapter image, covered by PR kserve/modelmesh-runtime-adapter#34. This PR contains the minimal controller changes needed to enable the support:

- TorchServe ServingRuntime spec
- Add "torchserve" to the list of supported built-in runtime types
- Add an "ID extraction" entry for TorchServe's gRPC Predictions RPC so that model-mesh will automatically extract the model name from corresponding request messages (see the sketch after this description)

Note the supported model format is advertised as "pytorch-mar" to distinguish it from the existing "pytorch" format, which refers to raw TorchScript .pt files as supported by Triton.

#### Result

TorchServe can be used seamlessly with ModelMesh Serving to serve PyTorch models, including eager mode.

Resolves #63

Signed-off-by: Nick Hill <[email protected]>
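To make the "ID extraction" bullet concrete, here is a hedged Go sketch that scans a serialized request message for a given string field on the protobuf wire format. It assumes the model name is field 1 of TorchServe's Predictions request; that field number and all names below are illustrative, and model-mesh's actual extraction is configuration-driven rather than hand-written like this.

```go
// Sketch: extract a string field from a serialized protobuf message by
// scanning the wire format, without needing generated message types.
package main

import (
	"errors"
	"fmt"

	"google.golang.org/protobuf/encoding/protowire"
)

// extractStringField returns the first occurrence of the given
// length-delimited field in msg, e.g. the model name in a request.
func extractStringField(msg []byte, field protowire.Number) (string, error) {
	for len(msg) > 0 {
		num, typ, n := protowire.ConsumeTag(msg)
		if n < 0 {
			return "", protowire.ParseError(n)
		}
		msg = msg[n:]
		if num == field && typ == protowire.BytesType {
			val, n := protowire.ConsumeBytes(msg)
			if n < 0 {
				return "", protowire.ParseError(n)
			}
			return string(val), nil
		}
		// Skip any other field, whatever its wire type.
		if n = protowire.ConsumeFieldValue(num, typ, msg); n < 0 {
			return "", protowire.ParseError(n)
		}
		msg = msg[n:]
	}
	return "", errors.New("field not found")
}

func main() {
	// Hand-encoded message: tag 0x0a = field 1, wire type 2; length 5; "mnist".
	raw := append([]byte{0x0a, 0x05}, []byte("mnist")...)
	name, err := extractStringField(raw, 1) // assumes model name is field 1
	fmt.Println(name, err)                  // -> mnist <nil>
}
```

Scanning the wire format this way avoids deserializing the full request just to route it by model name.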
Looks like the new adapter is not part of unit tests, https://github.com/kserve/modelmesh-runtime-adapter/blob/main/scripts/run_tests.sh. Wonder if it is intentional.
@chinhuang007 that links to the …
Looks good, thanks, @njhill !
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chinhuang007, njhill

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/lgtm
#### Motivation

Support for TorchServe was added in #250 and kserve/modelmesh-runtime-adapter#34. A test should be added for it as well.

#### Modifications

- Adds a basic FVT for load/inference with a TorchServe MAR model using the native TorchServe gRPC API (see the sketch below)
- Disables the OVMS runtime and its tests to allow TorchServe to be tested, due to resource constraints

#### Result

Closes #280

Signed-off-by: Rafael Vasquez <[email protected]>
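For a sense of what such an FVT exercises, here is a hedged Go sketch that sends a Predictions call over TorchServe's native gRPC inference API via the model-mesh endpoint. The `inferencepb` import stands in for stubs generated from TorchServe's inference proto; the stub names, the `localhost:8033` endpoint, the model name, and the input payload are all assumptions for illustration, not the FVT's actual code.

```go
// Hedged FVT-style sketch: call TorchServe's native gRPC Predictions API
// through the model-mesh service and check that a prediction comes back.
package fvt

import (
	"context"
	"testing"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	// Hypothetical package of stubs generated from TorchServe's
	// inference proto; the real import path and names may differ.
	inferencepb "example.com/torchserve/inference"
)

func TestTorchServeMARPredict(t *testing.T) {
	// Assumed model-mesh gRPC endpoint; verify the port in your deployment.
	conn, err := grpc.NewClient("localhost:8033",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		t.Fatal(err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	client := inferencepb.NewInferenceAPIsServiceClient(conn)
	resp, err := client.Predictions(ctx, &inferencepb.PredictionsRequest{
		ModelName: "mnist-mar",                       // assumed model id
		Input:     map[string][]byte{"data": {0x00}}, // placeholder payload
	})
	if err != nil {
		t.Fatal(err)
	}
	if len(resp.GetPrediction()) == 0 {
		t.Fatal("expected a non-empty prediction")
	}
}
```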