vllm-ascend/vllm_ascend at 0740d10211fcdcaa00d29e8659d7de892f5ed4d8 - vllm-ascend - Gitea: Git for Me

frozenleaves/vllm-ascend

mirror of https://github.com/vllm-project/vllm-ascend.git synced 2025-10-20 13:43:53 +08:00


rjg-lyh 47eaf622fe [v0.9.1][bugfix] disable the chunked prefill feature in Non-MLA LLMs (#2659 )

### What this PR does / why we need it?
This PR forcibly disables the chunked prefill feature for non-MLA models,
because the performance of the operators that support this functionality
is currently suboptimal.
At the same time, in engine v1 mode, the Ascend scheduler is forcibly
enabled, and any `enable_chunked_prefill` value the user specifies in
`additional_config` is overridden (disabled).
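The rule described above can be summarized in a small, self-contained Python sketch. It is not the code from `platform.py`: the function `resolve_scheduler_settings`, the `EffectiveSchedulerSettings` container, and the `ascend_scheduler_config` / `enabled` / `enable_chunked_prefill` key layout of `additional_config` are hypothetical. The sketch also assumes, reading "at the same time" literally, that the engine v1 overrides apply only to non-MLA models, while MLA models keep whatever the user configured.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class EffectiveSchedulerSettings:
    # Hypothetical container for the settings the platform ends up using.
    use_ascend_scheduler: bool
    enable_chunked_prefill: bool


def resolve_scheduler_settings(
    use_mla: bool,
    use_v1_engine: bool,
    additional_config: Dict[str, Any],
) -> EffectiveSchedulerSettings:
    """Sketch of the override rule described in #2659 (names are hypothetical)."""
    # Assumed layout: {"ascend_scheduler_config": {"enabled": ..., "enable_chunked_prefill": ...}}
    sched_cfg = dict(additional_config.get("ascend_scheduler_config", {}))
    use_ascend_scheduler = bool(sched_cfg.get("enabled", False))
    chunked = bool(sched_cfg.get("enable_chunked_prefill", False))

    if not use_mla:
        # Non-MLA models: chunked prefill is always turned off.
        chunked = False
        if use_v1_engine:
            # Engine v1: the Ascend scheduler is forced on, and the user's
            # enable_chunked_prefill has already been overridden above.
            use_ascend_scheduler = True

    return EffectiveSchedulerSettings(use_ascend_scheduler, chunked)


settings = resolve_scheduler_settings(
    use_mla=False,
    use_v1_engine=True,
    additional_config={"ascend_scheduler_config": {"enable_chunked_prefill": True}},
)
print(settings)  # EffectiveSchedulerSettings(use_ascend_scheduler=True, enable_chunked_prefill=False)
```

Here a user who asks for chunked prefill on a non-MLA model under engine v1 still ends up with it disabled and with the Ascend scheduler enabled, which is the behavior the PR description states.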

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with newly added and existing tests.

Signed-off-by: rjg-lyh <1318825571@qq.com>

2025-09-03 15:27:43 +08:00


[0.9.1][BUGFIX] FIX FIA input when mtp is enabled in pd Disaggregation scenario (#2509 )

2025-08-25 16:37:40 +08:00

[0.9.1][2/N][Feat] Restore paged attention kernel in Full Graph for performence (#1677 )

2025-07-11 15:51:50 +08:00

[bugfix] ascend schedule encountered an incorrect req block length in… (#2394 )

2025-08-16 18:32:29 +08:00

device_allocator

[bugfix] Improve log level and info for custom ops build (#937 )

2025-05-23 10:05:57 +08:00

[0.9.1-DEV][BUGFIX] BugFix: Resolve the issue of waiting queue accumulation when requests are canceled. (#2502 )

2025-08-23 19:36:25 +08:00

[0.9.1]eplb support qwen3-moe (#2000 )

2025-08-25 09:40:26 +08:00

[Bugfix] fix import error (#600 )

2025-04-22 08:57:25 +08:00

[0.9.1][Bugfix][Aclgraph] Fix qwen3-moe + aclgraph + tp (#2647 )

2025-09-01 11:38:54 +08:00

[0.9.1][BUGFIX] fix error info and adapt attn_metedata refactor (#2402 )

2025-08-19 09:25:49 +08:00

[0.9.1]eplb support qwen3-moe (#2000 )

2025-08-25 09:40:26 +08:00

rm ptach run_engine_core in dense case (#2665 )

2025-09-01 14:05:57 +08:00

[0.9.1][bugfix] Address abnormal VRAM increase in quantized models with floating-point MTP (#2554 )

2025-08-27 10:44:11 +08:00

[v0.9.1][RejectSampler][Perf] Optimize greedy reject sampler with vectorization. (#2002 )

2025-07-26 18:12:28 +08:00

[0.9.1][BUGFIX] [mtp][pd] FIX mtp torchair bug (#2610 )

2025-08-29 07:42:42 +08:00

__init__.py

[V0.9.1][BugFix] Fix the bug in decoraotor patch (#2199 )

2025-08-05 17:19:57 +08:00

ascend_config.py

[0.9.1][BUGFIX] fix mtp config bug (#2412 )

2025-08-18 16:49:34 +08:00

ascend_forward_context.py

[BUGFIX][v0.9.1] ep_group is not equal to word_size in some cases. (#1862 )

2025-07-18 15:37:58 +08:00

cpu_binding.py

[0.9.1][Feature] Add cpu binding for 091 (#2031 )

2025-07-31 09:59:57 +08:00

envs.py

[0.9.1-DEV][BUGFIX] BugFix: Resolve the issue of waiting queue accumulation when requests are canceled. (#2502 )

2025-08-23 19:36:25 +08:00

platform.py

[v0.9.1][bugfix] disable the chunked prefill feature in Non-MLA LLMs (#2659 )

2025-09-03 15:27:43 +08:00

soc_info.py

Disaggregate prefill for kv cache register style （merge into v0.9.1-dev） (#1296 )

2025-06-19 19:19:37 +08:00

utils.py

[0.9.1][bugfix] fix file not found error with shutil.rmtree (#2506 )

2025-08-23 19:38:29 +08:00