[Doc] Update documentation on Tensorizer (#5471)

This commit is contained in:
Sanger Steel
2024-06-14 14:27:57 -04:00
committed by GitHub
parent cdab68dcdb
commit 6e2527a7cb
3 changed files with 14 additions and 1 deletion


@ -81,6 +81,7 @@ Documentation
serving/env_vars
serving/usage_stats
serving/integrations
serving/tensorizer
.. toctree::
:maxdepth: 1


@ -0,0 +1,12 @@
.. _tensorizer:

Loading Models with CoreWeave's Tensorizer
==========================================

vLLM supports loading models with `CoreWeave's Tensorizer <https://docs.coreweave.com/coreweave-machine-learning-and-ai/inference/tensorizer>`_.
vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or an S3 endpoint can be deserialized
at runtime extremely quickly, directly to the GPU, resulting in significantly
shorter Pod startup times and reduced CPU memory usage. Tensor encryption is also supported.
For more information on CoreWeave's Tensorizer, please refer to
`CoreWeave's Tensorizer documentation <https://github.com/coreweave/tensorizer>`_. For more information on serializing a vLLM model, as well as a general guide to using Tensorizer with vLLM, see
the `vLLM example script <https://docs.vllm.ai/en/stable/getting_started/examples/tensorize_vllm_model.html>`_.
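
As a rough sketch of the loading path this page describes, deserializing a previously serialized model at engine startup might look like the following. The model name, S3 URI, and ``TensorizerConfig`` fields shown here are illustrative assumptions, not values taken from this commit; consult the linked example script for the serialization side and the exact configuration.

.. code-block:: python

    # Sketch: loading tensorized weights with vLLM (URI and model are hypothetical).
    from vllm import LLM
    from vllm.model_executor.model_loader.tensorizer import TensorizerConfig

    # Point the loader at previously serialized tensors; a local path,
    # HTTP/HTTPS endpoint, or S3 URI can be used.
    tensorizer_config = TensorizerConfig(
        tensorizer_uri="s3://my-bucket/opt-125m/model.tensors",  # hypothetical URI
    )

    llm = LLM(
        model="facebook/opt-125m",
        load_format="tensorizer",  # the EngineArgs option documented below
        model_loader_extra_config=tensorizer_config,
    )

This corresponds to passing ``--load-format tensorizer`` on the command line; the deserialization happens at runtime, directly to the GPU.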


@ -230,7 +230,7 @@ class EngineArgs:
'* "dummy" will initialize the weights with random values, '
'which is mainly for profiling.\n'
'* "tensorizer" will load the weights using tensorizer from '
'CoreWeave. See the Tensorize vLLM Model script in the Examples'
'CoreWeave. See the Tensorize vLLM Model script in the Examples '
'section for more information.\n'
'* "bitsandbytes" will load the weights using bitsandbytes '
'quantization.\n')