diff --git a/docs/source/index.rst b/docs/source/index.rst index c0250bf99f7ae..65bfbbabf8be1 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -73,6 +73,7 @@ Documentation serving/run_on_sky serving/deploying_with_kserve serving/deploying_with_triton + serving/deploying_with_bentoml serving/deploying_with_docker serving/serving_with_langchain serving/metrics diff --git a/docs/source/serving/deploying_with_bentoml.rst b/docs/source/serving/deploying_with_bentoml.rst new file mode 100644 index 0000000000000..4b9d19f5bdb72 --- /dev/null +++ b/docs/source/serving/deploying_with_bentoml.rst @@ -0,0 +1,8 @@ +.. _deploying_with_bentoml: + +Deploying with BentoML +====================== + +`BentoML `_ allows you to deploy a large language model (LLM) server with vLLM as the backend, which exposes OpenAI-compatible endpoints. You can serve the model locally or containerize it as an OCI-complicant image and deploy it on Kubernetes. + +For details, see the tutorial `vLLM inference in the BentoML documentation `_. \ No newline at end of file