mirror of
https://github.com/vllm-project/vllm.git
synced 2025-10-20 14:53:52 +08:00
[doc][faq] add warning to download models for every nodes (#5783)
This commit is contained in:
@ -35,4 +35,7 @@ To scale vLLM beyond a single machine, install and start a `Ray runtime <https:/
|
||||
$ # On worker nodes
|
||||
$ ray start --address=<ray-head-address>
|
||||
|
||||
After that, you can run inference and serving on multiple machines by launching the vLLM process on the head node by setting :code:`tensor_parallel_size` to the number of GPUs to be the total number of GPUs across all machines.
|
||||
After that, you can run inference and serving on multiple machines by launching the vLLM process on the head node by setting :code:`tensor_parallel_size` to the number of GPUs to be the total number of GPUs across all machines.
|
||||
|
||||
.. warning::
|
||||
Please make sure you downloaded the model to all the nodes, or the model is downloaded to some distributed file system that is accessible by all nodes.
|
||||
|
Reference in New Issue
Block a user