vllm-ascend

mirror of https://github.com/vllm-project/vllm-ascend.git synced 2025-10-20 21:53:54 +08:00

yupeng 9f1e054fe3 [Bugfix][LoRA][Operator] Fix LoRA custom operators accuracy issue (#2672)

### What this PR does / why we need it?
Fix the LoRA accuracy issue introduced by the custom AscendC operators
"bgmv_shrink, sgmv_shrink, bgmv_expand, sgmv_expand".

The bug details are:
- In a kernel function, GlobalTensor.GetSize returns a valid value only
if the second parameter, bufferSize, was passed to the preceding
GlobalTensor.SetGlobalBuffer call.
- Otherwise, GlobalTensor.GetSize returns a random value.
- For details, refer to [this
doc](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/81RC1alpha002/apiref/ascendcopapi/atlasascendc_api_07_00024.html).
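
The rule above can be sketched in plain host C++ (C++17). This is only an illustration under stated assumptions: `MockGlobalTensor` and the helper functions are hypothetical stand-ins, not the real AscendC `GlobalTensor` class and not the code changed by this PR. The mock returns an empty `std::optional` where the real `GetSize` would return an unspecified value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for AscendC's GlobalTensor (illustration only).
// It models one rule: the size is known only if bufferSize was passed
// to SetGlobalBuffer.
template <typename T>
class MockGlobalTensor {
public:
    // Overload without a size: the element count stays unknown.
    void SetGlobalBuffer(T* buffer) {
        buffer_ = buffer;
        size_.reset();
    }

    // Overload with a size: the element count is recorded.
    void SetGlobalBuffer(T* buffer, std::uint64_t bufferSize) {
        buffer_ = buffer;
        size_ = bufferSize;
    }

    // Empty optional here stands for "unspecified value" in the real API.
    std::optional<std::uint64_t> GetSize() const { return size_; }

private:
    T* buffer_ = nullptr;
    std::optional<std::uint64_t> size_;
};

// Buggy pattern: no bufferSize, so GetSize cannot be trusted.
inline bool SizeKnownWithoutBufferSize() {
    static float data[8] = {};
    MockGlobalTensor<float> tensor;
    tensor.SetGlobalBuffer(data);
    return tensor.GetSize().has_value();
}

// Fixed pattern: bufferSize is passed, so GetSize is well defined.
inline std::uint64_t SizeWithBufferSize() {
    static float data[8] = {};
    MockGlobalTensor<float> tensor;
    tensor.SetGlobalBuffer(data, 8);
    return tensor.GetSize().value();
}

// Checks both patterns against the rule described above.
inline void CheckMock() {
    assert(!SizeKnownWithoutBufferSize());
    assert(SizeWithBufferSize() == 8);
}
```

In a real kernel the same idea applies: pass the element count as the second argument to SetGlobalBuffer before relying on GetSize.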

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
pytest -sv tests/e2e/singlecard/test_ilama_lora.py
pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py

- vLLM version: v0.10.1.1
- vLLM main: a344a5aa0a

---------

Signed-off-by: paulyu12 <paulyu0307@gmail.com>
Signed-off-by: paulyu12 <507435917@qq.com>
Co-authored-by: paulyu12 <paulyu0307@gmail.com>

2025-09-02 11:46:59 +08:00

bgmv_expand.cpp

[Bugfix][LoRA][Operator] Fix LoRA custom operators accuracy issue (#2672)

2025-09-02 11:46:59 +08:00

bgmv_shrink.cpp

[Bugfix][LoRA][Operator] Fix LoRA custom operators accuracy issue (#2672)

2025-09-02 11:46:59 +08:00

get_masked_input_and_mask_kernel.cpp

[Platform] Add initial experimental support for Atlas 300I series (#1333)

2025-06-21 09:00:16 +08:00

pos_encoding_kernels.cpp

[Bugfix] Fix header include issue in rope (#2397)