vllm-ascend/examples at 03ca2b26ca9ab6b9a12f021b0595a726ee35e223

frozenleaves/vllm-ascend

mirror of https://github.com/vllm-project/vllm-ascend.git synced 2025-10-20 13:43:53 +08:00

Chao Lei 03ca2b26ca [P/D] Mooncake Connector for v1 distributed (#1568)

### What this PR does / why we need it?
This PR adopts Mooncake TransferEngine for KV cache registration and a
`pull_blocks`-style disaggregated prefill implementation.

### Does this PR introduce any user-facing change?
No

### Dependencies
1. CANN dependencies
Using Mooncake TransferEngine with Ascend Transport requires CANN
version 8.2.RC1 or higher (for details, see Mooncake
[#502](https://github.com/kvcache-ai/Mooncake/pull/502)).

2. vllm-ascend
This PR depends on changes introduced by #950 (modifications to
`model_runner_v1`) and #1361 (updates to `schedule`), both of which have
been merged into the `v0.9.1-dev` branch and are expected to land in
`main` shortly.
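
The CANN requirement in dependency 1 can be checked before launching a Mooncake-based setup. The following is a minimal sketch, not part of this PR: the helper names `parse_cann_version` and `meets_minimum` are hypothetical, and it assumes version strings of the form `MAJOR.MINOR`, `MAJOR.MINOR.PATCH`, or `MAJOR.MINOR.RCn`, with a release candidate sorting before the final release of the same `MAJOR.MINOR`. Other suffixes are rejected rather than guessed at. How the installed version string is obtained depends on the environment and is left to the caller.

```python
import re

# Minimum version stated in the dependency note above.
MIN_CANN = "8.2.RC1"


def parse_cann_version(version: str) -> tuple:
    """Convert a version string such as '8.2.RC1' into a sortable tuple.

    Assumed format (not confirmed by the PR): MAJOR.MINOR[.PATCH | .RCn].
    """
    match = re.fullmatch(
        r"(\d+)\.(\d+)(?:\.(RC)?(\d+))?", version.strip(), re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unrecognized CANN version string: {version!r}")
    major, minor, rc_tag, number = match.groups()
    number = int(number) if number is not None else 0
    # Assumption: RC builds sort before the final release of the same MAJOR.MINOR.
    is_final = 0 if rc_tag else 1
    return (int(major), int(minor), is_final, number)


def meets_minimum(installed: str, minimum: str = MIN_CANN) -> bool:
    """Return True if `installed` is at least `minimum`."""
    return parse_cann_version(installed) >= parse_cann_version(minimum)
```

For example, `meets_minimum("8.1.RC1")` is false under these assumptions, while `meets_minimum("8.2.RC1")` and `meets_minimum("8.3.RC1")` are true.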

### How was this patch tested?


- vLLM version: v0.10.0
- vLLM main: 1c859a1387

---------

Signed-off-by: leichao.lc <leichao139636@163.com>
Co-authored-by: jianzs <zheng.shoujian@outlook.com>
Co-authored-by: zzy-ContiLearn <1831242919@qq.com>
Co-authored-by: fems14 <1804143737@qq.com>
Co-authored-by: Dreamerleader <2270923832@qq.com>
Co-authored-by: chris668899 <15105191595@126.com>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>

2025-08-18 14:30:07 +08:00

| Name | Last commit | Date |
|---|---|---|
| `disaggregated_prefill_v1` | [P/D] Mooncake Connector for v1 distributed (#1568) | 2025-08-18 14:30:07 +08:00 |
| | [Misc][V0 Deprecation] Add `__main__` guard to all offline examples (#1837) | 2025-07-17 14:13:30 +08:00 |
| `offline_data_parallel.py` | [Misc] Add extra checking to torchair_graph_config. (#1939) | 2025-08-01 09:24:11 +08:00 |
| `offline_disaggregated_prefill_npu.py` | [BugFix] update the kv transfer config (#2121) | 2025-08-01 08:56:55 +08:00 |
| `offline_dualbatch_overlap_npu.py` | [main][refactor] Refactoring forward_context and model_runner_v1 (#1979) | 2025-07-28 14:06:20 +08:00 |
| `offline_embed.py` | [Misc][V0 Deprecation] Add `__main__` guard to all offline examples (#1837) | 2025-07-17 14:13:30 +08:00 |
| `offline_external_launcher.py` | ut: add example and e2e test for sleepmode in external_launcher (#2152) | 2025-08-06 11:11:53 +08:00 |
| `offline_inference_audio_language.py` | Change retrieving remote files to local retrieval. (#2141) | 2025-08-02 16:51:22 +08:00 |
| `offline_inference_npu_tp2.py` | [Misc][V0 Deprecation] Add `__main__` guard to all offline examples (#1837) | 2025-07-17 14:13:30 +08:00 |
| `offline_inference_npu.py` | [Misc][V0 Deprecation] Add `__main__` guard to all offline examples (#1837) | 2025-07-17 14:13:30 +08:00 |
| `offline_inference_sleep_mode_npu.py` | [Misc][V0 Deprecation] Add `__main__` guard to all offline examples (#1837) | 2025-07-17 14:13:30 +08:00 |
| `prompt_embedding_inference.py` | [Misc][V0 Deprecation] Add `__main__` guard to all offline examples (#1837) | 2025-07-17 14:13:30 +08:00 |
| `run_dp_server.sh` | [Bug] Fix run bug in run_dp_server.sh (#2139) | 2025-08-02 16:52:12 +08:00 |