Mirror of https://github.com/vllm-project/vllm.git (synced 2025-10-20 14:53:52 +08:00)
Support to serve vLLM on Kubernetes with LWS (#4829)
Signed-off-by: kerthcet <kerthcet@gmail.com>
This commit adds:

docs/source/serving/deploying_with_lws.rst (new file, 12 lines)
@@ -0,0 +1,12 @@
.. _deploying_with_lws:

Deploying with LWS
============================

LeaderWorkerSet (LWS) is a Kubernetes API that aims to address common deployment patterns of AI/ML inference workloads.
A major use case is for multi-host/multi-node distributed inference.

vLLM can be deployed with `LWS <https://github.com/kubernetes-sigs/lws>`_ on Kubernetes for distributed model serving.

Please see `this guide <https://github.com/kubernetes-sigs/lws/tree/main/docs/examples/vllm>`_ for more details on
deploying vLLM on Kubernetes using LWS.
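
The linked guide contains the full working manifests. As a rough, hypothetical sketch of what an LWS deployment looks like, a group of one leader plus one worker can be declared as below. The image tag, resource values, and container commands are placeholders, not taken from the guide; only the API group/kind and the `leaderWorkerTemplate` structure are LWS API fields.

```yaml
# Illustrative sketch only -- see the vLLM example in the LWS repo for a
# complete, tested manifest.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm
spec:
  replicas: 1                        # number of leader+worker groups
  leaderWorkerTemplate:
    size: 2                          # pods per group: 1 leader + 1 worker
    restartPolicy: RecreateGroupOnPodRestart
    leaderTemplate:
      spec:
        containers:
          - name: vllm-leader
            image: vllm/vllm-openai:latest   # placeholder tag
            ports:
              - containerPort: 8000          # OpenAI-compatible API port
    workerTemplate:
      spec:
        containers:
          - name: vllm-worker
            image: vllm/vllm-openai:latest   # placeholder tag
```

In the LWS model, each replica is a group of pods scheduled together; the leader typically runs the serving frontend while the workers join it for multi-node tensor/pipeline parallelism, so the group as a whole serves one model instance.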
|
@@ -8,4 +8,5 @@ Integrations
deploying_with_kserve
deploying_with_triton
deploying_with_bentoml
deploying_with_lws
serving_with_langchain