[BugFix] Explicitly set the tensor shape of the otp output (#3027)

When MTP and oprojTP are both enabled, the torchair graph is recompiled, which degrades performance. This PR fixes the issue by explicitly setting the shape of the otp output after reduce_scatter, viewing it as (input_.shape[0], self.layer.output_size).

- vLLM version: v0.10.2
- vLLM main: 486c5599e3

---------

Signed-off-by: zzhx1 <zzh_201018@outlook.com>
2025-11-11 22:57:57 +08:00 · 2025-09-24 18:44:15 +08:00
parent 360a736dfa
commit 4ee58e213b
1 changed file with 1 addition and 0 deletions
--- a/vllm_ascend/ops/linear_op.py
+++ b/vllm_ascend/ops/linear_op.py
@@ -299,6 +299,7 @@ class OProjRowParallelOp(CustomRowParallelOp):

        # otp-specific: Combine partial results across devices
        output = self.comm_group.reduce_scatter(output_parallel, dim=0)
+        output = output.view(input_.shape[0], self.layer.output_size)

        # Handle bias return based on configuration
        output_bias = self.bias if self.skip_bias_add else None
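To make the two steps in the hunk concrete, here is a minimal pure-Python model (no torch, no HCCL) of the data movement only. It assumes that reduce_scatter along dim 0 sums the per-rank partial results element-wise and hands each rank an equal, contiguous block of rows, and, as the added view implies, that each rank's block holds input_.shape[0] * self.layer.output_size elements. The helpers reduce_scatter_dim0 and view and the nested-list "tensors" are hypothetical stand-ins for comm_group.reduce_scatter and torch.Tensor.view; the sketch does not model torchair graph compilation, only the shapes involved.

```python
# Hypothetical pure-Python model of "reduce_scatter along dim 0, then view".
# Tensors are nested lists of shape (rows, cols); no torch or HCCL is used.
from typing import List

Matrix = List[List[float]]


def reduce_scatter_dim0(partials: List[Matrix], rank: int) -> List[float]:
    """Sum the per-rank partial results element-wise, then return the
    row-major flat buffer for the block of rows owned by `rank`.
    Rows are assumed to split evenly across ranks."""
    world_size = len(partials)
    rows = len(partials[0])
    cols = len(partials[0][0])
    assert rows % world_size == 0, "dim 0 must divide evenly across ranks"
    chunk = rows // world_size
    flat: List[float] = []
    for r in range(rank * chunk, (rank + 1) * chunk):
        for c in range(cols):
            # Reduction step: combine this element across all ranks.
            flat.append(sum(p[r][c] for p in partials))
    return flat


def view(flat: List[float], rows: int, cols: int) -> Matrix:
    """Reshape a flat buffer to (rows, cols), failing loudly on a size
    mismatch, analogous to torch.Tensor.view."""
    if rows * cols != len(flat):
        raise ValueError(f"cannot view {len(flat)} elements as ({rows}, {cols})")
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]


# Two ranks, each holding a partial (4, 3) result.
# Hypothetical sizes: num_tokens stands in for input_.shape[0],
# output_size for self.layer.output_size.
partials = [
    [[1.0, 2.0, 3.0]] * 4,
    [[10.0, 20.0, 30.0]] * 4,
]
num_tokens, output_size = 2, 3
out = view(reduce_scatter_dim0(partials, rank=1), num_tokens, output_size)
print(out)  # [[11.0, 22.0, 33.0], [11.0, 22.0, 33.0]]
```

Returning a flat buffer from the reduce step is a deliberate simplification: it forces the caller to state the (num_tokens, output_size) shape, loosely mirroring how the patch pins the output shape with an explicit view instead of leaving it to whatever shape the collective produces.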