**Background:**
There are two principles about operator registration in PyTorch
- The same namespace can be only registered once by `TORCH_LIBRARY`
- The operator signatures can be only registered once by `def`
Considering that all custom operators defined in the current repo are
only used by Ascend, instead of defining a common operator schema by
vLLM, all accelerators then follow this operator schema and complete the
implementation based on their respective hardware, which is conducive to
functional abstraction.
Therefore, we can rename the operator registration namespace to an
Ascend-specific namespace(**_C_ascend**).
Related ISSUE: https://github.com/vllm-project/vllm-ascend/issues/2742
- vLLM version: main
- vLLM main:
f592b3174b
Signed-off-by: FFFrog <ljw1101.vip@gmail.com>
### What this PR does / why we need it?
vLLM now names the process with VLLM prefix after
https://github.com/vllm-project/vllm/pull/21445, we should kill the
correct process name after one iteration benchmark to avoid OOM issue
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main:
e599e2c65e
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
This PR enabled pytest and yaml style accuracy test, users now can
enable accuracy test by running:
```bash
cd ~/vllm-ascend
pytest -sv ./tests/e2e/singlecard/models/test_lm_eval_correctness.py \
--config ./tests/e2e/singlecard/models/configs/Qwen3-8B-Base.yaml \
--report_output ./benchmarks/accuracy/Qwen3-8B-Base.md
pytest -sv ./tests/e2e/singlecard/models/test_lm_eval_correctness.py \
--config-list-file ./tests/e2e/singlecard/models/configs/accuracy.txt
```
Closes: https://github.com/vllm-project/vllm-ascend/issues/1970
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
- vLLM version: v0.10.0
- vLLM main:
2836dd73f1
---------
Signed-off-by: Icey <1790571317@qq.com>
### What this PR does / why we need it?
Currently our workflow run time takes about 3 hours in total, which
seriously affects the developer experience, so it is urgent to have a
optimization, after this pr, It is expected that the running time of the
full CI can be shortened to 1h40min.
- Enable linux-aarch64-a2 (64GB) to replace linux-arm64-npu (32GB)
- Change TP4 ---> TP2 * 2 max-parallel
- Move DeepSeek-V2-Lite-W8A8 to single card test
### Does this PR introduce _any_ user-facing change?
No
- vLLM version: v0.10.0
- vLLM main:
a2480251ec
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
1. Enable pymarkdown check
2. Enable python `__init__.py` check for vllm and vllm-ascend
3. Make clean code
### How was this patch tested?
- vLLM version: v0.9.2
- vLLM main:
29c6fbe58c
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
This pr purpose to do the following things:
1. Remove `benchmark_datasets.py` patch
2. Increase the scheduler frequency to 2 times per day, due to the
recent large number of daily submissions, we need to increase the
default test time(6h)
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.9.2
- vLLM main:
247102f07f
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
Follow vllm-project/vllm lint way:
https://github.com/vllm-project/vllm/blob/main/.pre-commit-config.yaml
Enable pre-commit to avoid some low level error AMAP.
This pr is one step of #1241, The purpose is make linting system more
clear and convenient, on this step, Mainly did the following things:
yapf, actionlint, ruff, typos, isort, mypy, png-lint, signoff-commit,
enforce-import-regex-instead-of-re.
TODO:
- clang-format(check for csrc with google style)
need clean code, disable for now
- pymarkdown
need clean code, disable for now
- shellcheck
need clean code, disable for now
### Does this PR introduce _any_ user-facing change?
Only developer UX change:
https://vllm-ascend--1256.org.readthedocs.build/en/1256/developer_guide/contributing.html#run-lint-locally
```
pip install -r requirements-lint.txt && pre-commit install
bash format.sh
```
### How was this patch tested?
CI passed with new added/existing test.
Co-authored-by: Yikun [yikunkero@gmail.com](mailto:yikunkero@gmail.com)
Co-authored-by: wangli
[wangli858794774@gmail.com](mailto:wangli858794774@gmail.com)
- vLLM version: v0.9.1
- vLLM main:
5358cce5ff
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
Update accuracy test
1. remove accuarcy report on V0
2. add parallel and execution mode
3. add Qwen/Qwen3-30B-A3B and remove Qwen/Qwen2.5-7B-Instruct
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
Unify Model Usage via ModelScope
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
Since, `vllm bench` cli has optimized enough for production use(support
more datasets), we are now do not need to copy vllm codes, now , with
vllm installed, we can easily use the benchmark cli
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
Sometimes the performance benchmark workflow may fail. We hope to add a
prompt when the operation fails and not upload the dirty data of the
failed operation.
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
This PR add custom ascendc kernel vocabparallelembedding support in
vllm-ascend, related CMakeLists and setuptools is also added in this PR.
pytest -s benchmarks/ops/ben_vocabparallelembedding.py
pytest -s tests/ops/test_vocabparallelembedding.py
---------
Signed-off-by: ttanzhiqiang <389825161@qq.com>
### What this PR does / why we need it?
Make accuarcy CI and report work
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manaully review
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
This is a post patch of #1014, for some convenience optimization
- Set cached dataset path for speed
- Use pypi to install escli-tool
- Add benchmark results convert script to have a developer-friendly
result
- Patch the `benchmark_dataset.py` to disable streaming load for
internet
- Add more trigger ways for different purpose, `pr` for debug,
`schedule` for daily test, `dispatch` and `pr-labled` for manual testing
of a single(current) commit
- Disable latency test for `qwen-2.5-vl`, (This script does not support
multi-modal yet)
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
This PR enable accuracy test for PR labeled with "*accuracy-test" and
workflow_dispatch.
Only one model test running for each type test to reduce excution time.
- The dense test costs about `25mins` to complete (gsm8k 7mins, ~mmlu
3h24mins,~ cEval 18mins)
- The vl test costs about `40mins` to complete
In futute, we might consider enable all job test as nightly schedule
job.
Below is mainly changes:
- the dense/vl accuracy test will be triggered by lableling
`accuracy-test` and `ready-for-test`
- the dense accuracy test will be triggered by lableling
`dense-accuracy-test` and `ready-for-test`
- the vl accuracy test will be triggered by lableling `vl-accuracy-test`
and `ready-for-test`
- accuracy test will also be triggered by workflow_dispatch
- Support V1 and V0 for qwen and V0 for VL
For PR test we also generate summary in test summary.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- CI passed with accuracy-test label
- Preview:
https://github.com/vllm-project/vllm-ascend/actions/runs/15407628722?pr=1040
Closes: https://github.com/vllm-project/vllm-ascend/pull/953
---------
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
Add benchmark workflows
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Run locally
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
- Add qwen2.5-7b test
- Optimize the documentation to be more developer-friendly
Signed-off-by: xuedinge233 <damow890@gmail.com>
Co-authored-by: xuedinge233 <damow890@gmail.com>
### What this PR does / why we need it?
1. Provide accuracy test report for development branch release.
2. Models and datasets for accuracy test:
| Model | datasets |
|---------------------------- | --------------------------- |
| Qwen2.5-7B-Instruct | ceval-val, gsm8k, mmlu |
| Qwen3-8B | ceval-val, gsm8k, mmlu |
| Llama-3.1-8B-Instruct | ceval-val, gsm8k, mmlu |
| Qwen2.5-VL-7B-Instruct | mmmu_val |
### Does this PR introduce _any_ user-facing change?
This PR will display the accuracy test report of the release versionin
docs/source/developer_guide/accuracy_report。
Qwen2.5-7B-Instruct.md
Qwen3-8B.md
Llama-3.1-8B-Instruct.md
Qwen2.5-VL-7B-Instruct .md
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
The purpose of this PR is to add benchmark scripts for npu, developers
can easily run performance tests on their own machines with one line of
code .
---------
Signed-off-by: wangli <wangli858794774@gmail.com>