Mirror of https://github.com/vllm-project/vllm.git (synced 2025-10-20 14:53:52 +08:00)
Support to serve vLLM on Kubernetes with LWS (#4829)
Signed-off-by: kerthcet <kerthcet@gmail.com>
This commit adds:

docs/source/serving/deploying_with_lws.rst (new file, 12 lines)
@@ -0,0 +1,12 @@
.. _deploying_with_lws:

Deploying with LWS
============================

LeaderWorkerSet (LWS) is a Kubernetes API that aims to address common deployment patterns of AI/ML inference workloads.
A major use case is for multi-host/multi-node distributed inference.

vLLM can be deployed with `LWS <https://github.com/kubernetes-sigs/lws>`_ on Kubernetes for distributed model serving.

Please see `this guide <https://github.com/kubernetes-sigs/lws/tree/main/docs/examples/vllm>`_ for more details on
deploying vLLM on Kubernetes using LWS.
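
The linked guide contains the full working manifests. As a rough, hypothetical sketch of what an LWS deployment looks like, a group of one leader plus one worker can be declared as below. The image tag, resource values, and container commands are placeholders, not taken from the guide; only the API group/kind and the `leaderWorkerTemplate` structure are LWS API fields.

```yaml
# Illustrative sketch only -- see the vLLM example in the LWS repo for a
# complete, tested manifest.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm
spec:
  replicas: 1                        # number of leader+worker groups
  leaderWorkerTemplate:
    size: 2                          # pods per group: 1 leader + 1 worker
    restartPolicy: RecreateGroupOnPodRestart
    leaderTemplate:
      spec:
        containers:
          - name: vllm-leader
            image: vllm/vllm-openai:latest   # placeholder tag
            ports:
              - containerPort: 8000          # OpenAI-compatible API port
    workerTemplate:
      spec:
        containers:
          - name: vllm-worker
            image: vllm/vllm-openai:latest   # placeholder tag
```

In the LWS model, each replica is a group of pods scheduled together; the leader typically runs the serving frontend while the workers join it for multi-node tensor/pipeline parallelism, so the group as a whole serves one model instance.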
|
@@ -8,4 +8,5 @@ Integrations
deploying_with_kserve
deploying_with_triton
deploying_with_bentoml
deploying_with_lws
serving_with_langchain