From 155d9f9d3661d08255576e10732ed8703b2b07b9 Mon Sep 17 00:00:00 2001
From: Sherlock113
Date: Tue, 12 Mar 2024 12:19:36 +0800
Subject: [PATCH 1/2] Add BentoML deployment doc

Signed-off-by: Sherlock113
---
 docs/source/index.rst                          | 1 +
 docs/source/serving/deploying_with_bentoml.rst | 8 ++++++++
 2 files changed, 9 insertions(+)
 create mode 100644 docs/source/serving/deploying_with_bentoml.rst

diff --git a/docs/source/index.rst b/docs/source/index.rst
index c0250bf99f7ae..65bfbbabf8be1 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -73,6 +73,7 @@ Documentation
    serving/run_on_sky
    serving/deploying_with_kserve
    serving/deploying_with_triton
+   serving/deploying_with_bentoml
    serving/deploying_with_docker
    serving/serving_with_langchain
    serving/metrics

diff --git a/docs/source/serving/deploying_with_bentoml.rst b/docs/source/serving/deploying_with_bentoml.rst
new file mode 100644
index 0000000000000..abc11f9422241
--- /dev/null
+++ b/docs/source/serving/deploying_with_bentoml.rst
@@ -0,0 +1,8 @@
+.. _deploying_with_bentoml:
+
+Deploying with BentoML
+======================
+
+BentoML allows you to deploy a large language model (LLM) server with vLLM as the backend, which exposes OpenAI-compatible endpoints. You can serve the model locally or containerize it as an OCI-complicant image and deploy it on Kubernetes.
+
+For details, see the tutorial `vLLM inference in the BentoML documentation `_.
\ No newline at end of file

From 9f444e2823448003cb275eb608d18e15862ea1e4 Mon Sep 17 00:00:00 2001
From: Sherlock113
Date: Tue, 12 Mar 2024 12:23:39 +0800
Subject: [PATCH 2/2] Add link

Signed-off-by: Sherlock113
---
 docs/source/serving/deploying_with_bentoml.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/serving/deploying_with_bentoml.rst b/docs/source/serving/deploying_with_bentoml.rst
index abc11f9422241..4b9d19f5bdb72 100644
--- a/docs/source/serving/deploying_with_bentoml.rst
+++ b/docs/source/serving/deploying_with_bentoml.rst
@@ -3,6 +3,6 @@
 Deploying with BentoML
 ======================
 
-BentoML allows you to deploy a large language model (LLM) server with vLLM as the backend, which exposes OpenAI-compatible endpoints. You can serve the model locally or containerize it as an OCI-complicant image and deploy it on Kubernetes.
+`BentoML `_ allows you to deploy a large language model (LLM) server with vLLM as the backend, which exposes OpenAI-compatible endpoints. You can serve the model locally or containerize it as an OCI-compliant image and deploy it on Kubernetes.
 
 For details, see the tutorial `vLLM inference in the BentoML documentation `_.
\ No newline at end of file
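
The doc added above says the vLLM-backed BentoML server exposes OpenAI-compatible endpoints. As a minimal sketch of what "OpenAI-compatible" means for a client, the snippet below builds a standard chat-completion request body; the host, port (`3000`), and model id are illustrative assumptions, not values taken from this patch or the linked tutorial.

```python
import json

# Hypothetical base URL for a locally served vLLM-backed BentoML server;
# adjust host/port to wherever the server actually listens.
BASE_URL = "http://localhost:3000/v1"
ENDPOINT = BASE_URL + "/chat/completions"

# Standard OpenAI-style chat-completion payload.
payload = {
    "model": "meta-llama/Llama-2-7b-chat-hf",  # placeholder model id
    "messages": [
        {"role": "user", "content": "Summarize what BentoML does."},
    ],
    "max_tokens": 128,
}

# Serialize the body; send it with any HTTP client, for example:
#   curl -X POST $ENDPOINT -H "Content-Type: application/json" -d "$BODY"
body = json.dumps(payload)
print(ENDPOINT)
```

Because the endpoint follows the OpenAI wire format, existing OpenAI SDK clients can also be pointed at the server by overriding their base URL.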