[Doc] Update documentation on Tensorizer (vllm-project#5471)

neuralmagic · Jun 23, 2024 · 33edc9b · 33edc9b
1 parent d0a3026
commit 33edc9b
Show file tree

Hide file tree

Showing 3 changed files with 14 additions and 1 deletion.
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -81,6 +81,7 @@ Documentation
    serving/env_vars
    serving/usage_stats
    serving/integrations
+   serving/tensorizer
 
 .. toctree::
    :maxdepth: 1

diff --git a/docs/source/serving/tensorizer.rst b/docs/source/serving/tensorizer.rst
@@ -0,0 +1,12 @@
+.. _tensorizer:
+
+Loading Models with CoreWeave's Tensorizer
+==========================================
+vLLM supports loading models with `CoreWeave's Tensorizer <https://docs.coreweave.com/coreweave-machine-learning-and-ai/inference/tensorizer>`_.
+vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or S3 endpoint can be deserialized
+at runtime extremely quickly directly to the GPU, resulting in significantly
+shorter Pod startup times and CPU memory usage. Tensor encryption is also supported.
+
+For more information on CoreWeave's Tensorizer, please refer to
+`CoreWeave's Tensorizer documentation <https://github.com/coreweave/tensorizer>`_. For more information on serializing a vLLM model, as well a general usage guide to using Tensorizer with vLLM, see
+the `vLLM example script <https://docs.vllm.ai/en/stable/getting_started/examples/tensorize_vllm_model.html>`_.
diff --git a/vllm/engine/arg_utils.py b/vllm/engine/arg_utils.py
@@ -234,7 +234,7 @@ def add_cli_args(
             '* "dummy" will initialize the weights with random values, '
             'which is mainly for profiling.\n'
             '* "tensorizer" will load the weights using tensorizer from '
-            'CoreWeave. See the Tensorize vLLM Model script in the Examples'
+            'CoreWeave. See the Tensorize vLLM Model script in the Examples '
             'section for more information.\n'
             '* "bitsandbytes" will load the weights using bitsandbytes '
             'quantization.\n')