vllm-ascend/vllm_ascend at c55d99d13eb5705836e51c1e19f463c17b5addaa

frozenleaves/vllm-ascend

mirror of https://github.com/vllm-project/vllm-ascend.git synced 2025-10-20 13:43:53 +08:00

linfeng-yuan c55d99d13e [bugfix][torchair] fix missing weight nz cast for w13_weight in torchair_w8a8_dynamic.py (#3446)

### What this PR does / why we need it?
Fix the missing NZ format conversion for quantized weights used by GMM
after the moe_dispatch operator in the torchair scenario. The aclgraph
and single scenarios are not involved.

### How was this patch tested?
vLLM serving passed with lower latency (~5 ms TPOT with bs_per_rank=28 and
ep_size=32).

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: linfeng-yuan <1102311262@qq.com>

2025-10-14 21:11:05 +08:00

[Feat] Unquantized Linear to nz and control all nz-cast (#3356)

2025-10-14 17:39:26 +08:00

fix pagedattention to support fullgraph. (#3436)

2025-10-14 16:10:09 +08:00

[BugFix] Fix ascend scheduler assert error (#3191)

2025-09-28 18:22:08 +08:00

device_allocator

[Misc] Clean up useless import from vllm (#2049)

2025-07-28 16:01:59 +08:00

[BugFix] Fix the port conflict bug of running external dp with disaggregated-prefill. (#3416)

2025-10-14 16:37:10 +08:00

Bugfix: Expose the user policy type interface (#3336)

2025-10-11 16:28:57 +08:00

[Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153)

2025-09-28 17:30:50 +08:00

[Feat] Unquantized Linear to nz and control all nz-cast (#3356)

2025-10-14 17:39:26 +08:00

[Quickfix] update CachedRequestState as NewRequestData changed (#2367)

2025-08-15 07:35:27 +08:00

[2/N][Feat] Attention and MoE weight prefetch in Qwen3MoE models (#3203)

2025-10-14 20:16:33 +08:00

[feat] support customized and separated hccl_buffer_size for process group initialization (#3073)

2025-10-11 15:55:22 +08:00

[Feat] Unquantized Linear to nz and control all nz-cast (#3356)

2025-10-14 17:39:26 +08:00

Drop 0.10.2 (#3284)

2025-10-09 10:28:38 +08:00

bugfix for mtp (#3300)

2025-10-09 19:22:46 +08:00

[bugfix][torchair] fix missing weight nz cast for w13_weight in torchair_w8a8_dynamic.py (#3446)

2025-10-14 21:11:05 +08:00

[Feat] Unquantized Linear to nz and control all nz-cast (#3356)

2025-10-14 17:39:26 +08:00

__init__.py

【bugfix】fix connector register failed (#3335)

2025-10-09 21:09:54 +08:00

ascend_config.py

[2/N][Feat] Attention and MoE weight prefetch in Qwen3MoE models (#3203)

2025-10-14 20:16:33 +08:00

ascend_forward_context.py

[2/N][Feat] Attention and MoE weight prefetch in Qwen3MoE models (#3203)

2025-10-14 20:16:33 +08:00

envs.py

[Feat] Unquantized Linear to nz and control all nz-cast (#3356)

2025-10-14 17:39:26 +08:00

meta_registration.py

Fix the bugs about operator registration by PyTorch Dispatcher (#2786)

2025-09-13 11:58:52 +08:00

platform.py

[Feat][Graph] Support FULL_DECODE_ONLY mode for MLA models (#3125)

2025-10-10 16:31:20 +08:00

utils.py

[Feat] Unquantized Linear to nz and control all nz-cast (#3356)

2025-10-14 17:39:26 +08:00