[Docs] Fix typos in EP deployment doc (#24669)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
This commit is contained in:
Harry Mellor
2025-09-11 17:07:23 +01:00
committed by GitHub
parent 4984a291d5
commit 51d41265ad


@@ -158,10 +158,10 @@ vllm serve Qwen/Qwen3-30B-A3B \
### Memory Footprint Overhead
-EPLB uses redundant experts to that need to fit in GPU memory. This means that EPLB may not be a good fit for memory constrained environments or when KV cache space is at a premium.
+EPLB uses redundant experts that need to fit in GPU memory. This means that EPLB may not be a good fit for memory constrained environments or when KV cache space is at a premium.
This overhead equals `NUM_MOE_LAYERS * BYTES_PER_EXPERT * (NUM_TOTAL_EXPERTS + NUM_REDUNDANT_EXPERTS) ÷ NUM_EP_RANKS`.
-For DeepSeekV3, this is approximately `2.4 GB` for one redundant expert per rank.
+For DeepSeekV3, this is approximately `2.4 GB` for one redundant expert per EP rank.
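
The per-rank overhead above can be sanity-checked with a short back-of-the-envelope calculation. This is a hedged sketch, not part of the doc: the DeepSeek-V3 shapes (58 MoE layers, hidden size 7168, per-expert intermediate size 2048, FP8 weights) and the 8-rank deployment are assumptions chosen for illustration, and the extra memory from redundant experts reduces to `NUM_MOE_LAYERS * BYTES_PER_EXPERT * NUM_REDUNDANT_EXPERTS / NUM_EP_RANKS` when comparing against a deployment with no redundancy.

```python
# Hedged sketch: estimate the EPLB memory overhead per EP rank.
# All model shapes below are assumptions for illustration, not vLLM config values.

NUM_MOE_LAYERS = 58        # assumed: DeepSeek-V3 has 61 layers, the first 3 dense
HIDDEN_SIZE = 7168         # assumed hidden size
MOE_INTERMEDIATE = 2048    # assumed per-expert intermediate size
BYTES_PER_PARAM = 1        # assumed FP8 expert weights

# Each routed expert holds gate, up, and down projection matrices.
params_per_expert = 3 * HIDDEN_SIZE * MOE_INTERMEDIATE
BYTES_PER_EXPERT = params_per_expert * BYTES_PER_PARAM

NUM_EP_RANKS = 8                        # example deployment size (assumption)
NUM_REDUNDANT_EXPERTS = NUM_EP_RANKS    # one redundant expert per EP rank

# Extra memory each rank pays for the redundant experts.
overhead_per_rank = (NUM_MOE_LAYERS * BYTES_PER_EXPERT
                     * NUM_REDUNDANT_EXPERTS / NUM_EP_RANKS)
print(f"{overhead_per_rank / 2**30:.2f} GiB per EP rank")
```

With one redundant expert per rank, the `NUM_REDUNDANT_EXPERTS / NUM_EP_RANKS` factor cancels to 1, so the overhead is one full expert per MoE layer per rank, which lands near the `2.4 GB` figure quoted above.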
### Example Command