mirror of
https://github.com/vllm-project/vllm.git
synced 2025-10-20 14:53:52 +08:00
[Doc] Update documentation on Tensorizer (#5471)
@@ -81,6 +81,7 @@ Documentation
    serving/env_vars
    serving/usage_stats
    serving/integrations
+   serving/tensorizer
 
 .. toctree::
    :maxdepth: 1
12
docs/source/serving/tensorizer.rst
Normal file
@@ -0,0 +1,12 @@
+.. _tensorizer:
+
+Loading Models with CoreWeave's Tensorizer
+==========================================
+vLLM supports loading models with `CoreWeave's Tensorizer <https://docs.coreweave.com/coreweave-machine-learning-and-ai/inference/tensorizer>`_.
+vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or an S3 endpoint can be deserialized
+at runtime extremely quickly directly to the GPU, resulting in significantly
+shorter Pod startup times and lower CPU memory usage. Tensor encryption is also supported.
+
+For more information on CoreWeave's Tensorizer, please refer to
+`CoreWeave's Tensorizer documentation <https://github.com/coreweave/tensorizer>`_. For more information on serializing a vLLM model, as well as a general usage guide to using Tensorizer with vLLM, see
+the `vLLM example script <https://docs.vllm.ai/en/stable/getting_started/examples/tensorize_vllm_model.html>`_.
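As a rough illustration of how the loader described in this new page might be selected at serving time, the sketch below only assembles a command line and does not import vLLM. It assumes the ``--load-format`` and ``--model-loader-extra-config`` engine arguments and a ``tensorizer_uri`` key in the extra config. The helper name ``tensorizer_server_args``, the model name, and the S3 path are hypothetical placeholders, not values taken from this commit.

```python
from __future__ import annotations

import json
import shlex


def tensorizer_server_args(model: str, tensorizer_uri: str) -> list[str]:
    """Assemble argv for the vLLM OpenAI-compatible server (hypothetical helper).

    Assumption: the server accepts --load-format and
    --model-loader-extra-config, and the extra config is a JSON object
    containing a "tensorizer_uri" key.
    """
    extra_config = {"tensorizer_uri": tensorizer_uri}
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--load-format", "tensorizer",
        # The extra config is passed as a single JSON string argument.
        "--model-loader-extra-config", json.dumps(extra_config),
    ]


args = tensorizer_server_args(
    "facebook/opt-125m",  # placeholder model name
    "s3://my-bucket/vllm/opt-125m/model.tensors",  # hypothetical location
)

# shlex.join quotes the JSON argument so the line can be pasted into a shell.
print(shlex.join(args))
```

The tensors at that location must already have been serialized, for example with the example script linked from the page; this sketch does not perform that step.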
@@ -230,7 +230,7 @@ class EngineArgs:
 '* "dummy" will initialize the weights with random values, '
 'which is mainly for profiling.\n'
 '* "tensorizer" will load the weights using tensorizer from '
-'CoreWeave. See the Tensorize vLLM Model script in the Examples'
+'CoreWeave. See the Tensorize vLLM Model script in the Examples '
 'section for more information.\n'
 '* "bitsandbytes" will load the weights using bitsandbytes '
 'quantization.\n')
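The only change in the ``EngineArgs`` hunk is a trailing space inside one string literal. A minimal sketch of why it matters: Python joins adjacent string literals with no separator, so without the space the help text runs two words together. The variable names ``old_help`` and ``new_help`` are illustrative and do not appear in the source.

```python
# Adjacent string literals are concatenated at compile time with no separator.
old_help = (
    'CoreWeave. See the Tensorize vLLM Model script in the Examples'
    'section for more information.\n'
)
new_help = (
    'CoreWeave. See the Tensorize vLLM Model script in the Examples '
    'section for more information.\n'
)

# Before the fix the two words are fused; after it they are separated.
print("Examplessection" in old_help)   # True
print("Examples section" in new_help)  # True
```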