mirror of
https://github.com/vllm-project/vllm.git
synced 2025-10-20 14:53:52 +08:00
[docs] add SYS_NICE cap & security-opt for docker/k8s (#24017)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -194,3 +194,35 @@ vLLM CPU supports data parallel (DP), tensor parallel (TP) and pipeline parallel
- Both of them require the `amx` CPU flag.
- `VLLM_CPU_MOE_PREPACK` can provide better performance for MoE models.
- `VLLM_CPU_SGL_KERNEL` can provide better performance for MoE models and small-batch scenarios.

### Why do I see `get_mempolicy: Operation not permitted` when running in Docker?

In some container environments (like Docker), NUMA-related syscalls used by vLLM (e.g., `get_mempolicy`, `migrate_pages`) are denied by the runtime's default seccomp and capability settings. This may lead to warnings like `get_mempolicy: Operation not permitted`. Functionality is not affected, but NUMA memory binding and migration optimizations may not take effect, so performance can be suboptimal.

To enable these optimizations inside Docker with the least privilege, add the following options:

```bash
docker run ... --cap-add SYS_NICE --security-opt seccomp=unconfined ...

# 1) `--cap-add SYS_NICE` addresses the `get_mempolicy` EPERM issue.

# 2) `--security-opt seccomp=unconfined` enables `migrate_pages` for `numa_migrate_pages()`.
# Note that `seccomp=unconfined` disables seccomp filtering for the container.
# If that is unacceptable, you can write your own seccomp profile
# based on the Docker/runtime default.json and add `migrate_pages` to the `SCMP_ACT_ALLOW` list.

# Reference: https://docs.docker.com/engine/security/seccomp/
```
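
The custom-profile route mentioned in the comments above could be sketched roughly as follows. This is only an illustration: it assumes `jq` is installed and that you already have a local copy of your runtime's default seccomp profile. The function name `allow_migrate_pages` and the file names are hypothetical.

```bash
# Sketch: derive a custom seccomp profile that additionally allows migrate_pages.
# Assumes jq is available; function and file names are hypothetical.

# $1: local copy of the runtime's default seccomp profile (JSON)
# $2: path of the customized profile to write
allow_migrate_pages() {
  local src="$1" dst="$2"
  # Append one extra allow rule rather than editing the existing rule entries.
  jq '.syscalls += [{"names": ["migrate_pages"], "action": "SCMP_ACT_ALLOW"}]' \
    "$src" > "$dst"
}

# Usage (not executed here):
#   allow_migrate_pages default.json vllm-seccomp.json
#   docker run ... --cap-add SYS_NICE --security-opt seccomp=vllm-seccomp.json ...
```

Appending a separate rule keeps the rest of the default profile untouched, so the result is easy to diff against the original.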

Alternatively, running with `--privileged=true` also works, but it grants much broader privileges and is not generally recommended.

In Kubernetes, the following configuration can be added to the workload YAML to achieve the same effect:

```yaml
securityContext:
  seccompProfile:
    type: Unconfined
  capabilities:
    add:
    - SYS_NICE
```
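
To check from inside a running container (Docker or Kubernetes) whether the capability was actually granted, one option is to inspect the effective capability mask of the current process. The sketch below is not part of vLLM. The helper names `has_sys_nice` and `current_cap_eff` are hypothetical, and it relies on `CAP_SYS_NICE` being capability number 23 on Linux.

```bash
# CAP_SYS_NICE is capability number 23 on Linux.
CAP_SYS_NICE_BIT=23

# Return 0 if the given hex capability mask (as shown in the CapEff field)
# has the CAP_SYS_NICE bit set, non-zero otherwise.
has_sys_nice() {
  local mask_hex="$1"
  (( (16#$mask_hex >> CAP_SYS_NICE_BIT) & 1 ))
}

# Print the effective capability mask of the current process.
current_cap_eff() {
  awk '/^CapEff:/ {print $2}' /proc/self/status
}

# Usage inside the container (not executed here):
#   if has_sys_nice "$(current_cap_eff)"; then
#     echo "CAP_SYS_NICE is present"
#   else
#     echo "CAP_SYS_NICE is missing"
#   fi
```

This only covers the capability side. Whether `migrate_pages` is permitted depends on the seccomp profile and is not visible in the capability mask.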
@@ -48,6 +48,10 @@ docker run --rm \
    --dtype=bfloat16 \
    other vLLM OpenAI server arguments
```

!!! tip
    An alternative to `--privileged=true` is `--cap-add SYS_NICE --security-opt seccomp=unconfined`.

# --8<-- [end:build-image-from-source]
# --8<-- [start:extra-information]
# --8<-- [end:extra-information]
@@ -89,6 +89,9 @@ docker run --rm \
    other vLLM OpenAI server arguments
```

!!! tip
    An alternative to `--privileged true` is `--cap-add SYS_NICE --security-opt seccomp=unconfined`.

# --8<-- [end:build-image-from-source]
# --8<-- [start:extra-information]
# --8<-- [end:extra-information]
@@ -44,6 +44,7 @@ docker build -f docker/Dockerfile.cpu \
# Launching OpenAI server
docker run --rm \
    --security-opt seccomp=unconfined \
    --cap-add SYS_NICE \
    --shm-size=4g \
    -p 8000:8000 \
    -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \