Introduce W4A4 Flat Quantization for better model compression and inference efficiency on Ascend devices.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
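
For readers unfamiliar with the term, W4A4 means both weights and activations are quantized to 4-bit integers. The sketch below is only an illustration of plain symmetric int4 fake quantization, not the FlatQuant transform or the vllm-ascend kernels added by this change; the function name, the per-channel/per-tensor scale choices, and the shapes are assumptions made for the example.

```python
# Illustrative sketch of W4A4-style symmetric int4 fake quantization.
# NOT the vllm-ascend implementation; names and granularity are assumptions.
from typing import Optional, Tuple

import torch


def quantize_symmetric_int4(
    x: torch.Tensor, dim: Optional[int] = None
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Quantize a tensor to the signed 4-bit range [-8, 7] with a symmetric scale."""
    qmax = 7  # positive bound of the int4 symmetric range
    if dim is None:
        # Per-tensor scale (a common choice for activations).
        amax = x.abs().max()
    else:
        # Per-channel scale along `dim` (a common choice for weights).
        amax = x.abs().amax(dim=dim, keepdim=True)
    scale = amax.clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -8, 7)
    return q.to(torch.int8), scale  # int4 values stored in int8 containers


# Usage: quantize weights per output channel and activations per tensor,
# then dequantize to approximate the original GEMM at 4-bit precision.
w = torch.randn(128, 256)                    # hypothetical weight [out, in]
a = torch.randn(16, 256)                     # hypothetical activation [batch, in]
wq, w_scale = quantize_symmetric_int4(w, dim=1)
aq, a_scale = quantize_symmetric_int4(a)
y_approx = (aq.float() * a_scale) @ (wq.float() * w_scale).t()
```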