8a8eed2a7b
updated
...
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com >
2025-11-10 20:14:15 +00:00
9c84ca8293
[FA/Chore] Bump FA version for FP8 two-level accumulation ( #27889 )
...
Signed-off-by: Jonas Kuebler <kuebj@amazon.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2025-11-10 12:06:04 -08:00
6d54336ae5
[Bugfix] Fix llguidance backend, rollback when EOS was encountered ( #25905 )
...
Signed-off-by: Rémi Delacourt <remi@mistral.ai >
Signed-off-by: remi <remi@mistral.ai >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-11-10 14:53:32 -05:00
34553b9d27
[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next ( #27492 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
2025-11-10 12:34:57 -05:00
b039bfda8f
[Bugfix] Fix persistent_masked_m_silu_mul_quant tests ( #28366 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-10 09:21:52 -08:00
d0e186c16f
[V0 Deprecation] Remove unused context_len and seq_len from M-RoPE ( #28395 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-11 00:30:06 +08:00
f080a83511
[RFC][ROCm][AITER] Keep all AITER kernels in _aiter_ops class like _custom_ops and _ipex_ops ( #24490 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-11-10 08:20:53 -08:00
40e2eeeb92
[Kernel] Optimization of the mm_k operator. ( #28280 )
...
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-10 16:03:46 +00:00
b06b9470ca
[Rocm][fused_moe][fp4] view weight to torch.float4_e2m1fn_x2 when running aiter fused moe for fp4 model ( #27474 )
...
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com >
2025-11-10 10:38:56 -05:00
4673e465ff
Add @tjtanaa to codeowner for ROCm and multi-modal ( #28360 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-10 21:39:17 +08:00
912744d066
[Fix] optimize visual token mask with caching and multi-token support ( #28374 )
...
Signed-off-by: Ferrebo <itachi971009@gmail.com >
Signed-off-by: kebo01 <kebo01@baidu.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 13:23:49 +00:00
15be507c86
[bugfix] fix siglip batch text output error ( #28365 )
...
Signed-off-by: piood <2477084691@qq.com >
2025-11-10 21:21:15 +08:00
6f7de33bed
[Metrics] Refactor LoRA state tracking ( #26801 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-10 16:34:36 +08:00
a98cc35c34
Restore PlaMo2 unit test as pfnet/plamo-2-1b now supports transformers >=4.56 ( #28019 )
...
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com >
2025-11-10 06:50:02 +00:00
e8697faf03
[V0 deprecation] Remove no longer used get_metadata_cls ( #28370 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-10 14:32:09 +08:00
03fa4d3fb3
[Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X ( #28373 )
...
Signed-off-by: Xiake Sun <xiake.sun@amd.com >
Signed-off-by: Xiake Sun <xisun@amd.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-10 04:53:40 +00:00
6b2b9fd934
[CI] lora/test_mixtral.py : Add additional expected outputs due to flakiness ( #28322 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-10 10:45:29 +08:00
c5f685b3ae
[ROCm][Platform] Add RX7900XTX device id in _ROCM_DEVICE_ID_NAME_MAP ( #28279 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
2025-11-09 23:09:36 +00:00
c4768dcf47
[Kernel] Fix fused_gdn_gating ( #28343 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-09 14:26:35 -07:00
a65a934ebe
[CI/Build] Temporary fix to LM Eval Small Models ( #28324 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-09 21:08:38 +00:00
4a8d6bd168
Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method ( #28214 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-11-09 19:11:46 +00:00
636efd10a5
[Core] Separate out attention metadata building logic from prepare inputs ( #26764 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-09 13:51:43 -05:00
289eb6c537
[Core] Simplify async KV output aggregation ( #28327 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-09 09:44:13 -08:00
19d91ece4b
[CI] Fix flaky test_eagle_correctness test ( #28364 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-11-09 16:04:59 +00:00
7ae5a5fb11
[Misc] Add some comments in qwen3-next ( #28267 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-08 23:59:24 -08:00
de2b78305f
[ROCm] Add env to enable/disable aiter triton gemm ( #28321 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-11-08 22:27:00 -08:00
e5e9067e61
[Misc] fix typo and add detailed log ( #28178 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-11-09 05:33:46 +00:00
3a7d580343
fix: close issue 28338 by fixed python version ( #28339 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-11-09 05:07:26 +00:00
05f8d69077
[chore] Move some wikimedia images to S3 ( #28351 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2025-11-09 01:58:26 +00:00
404d7a9d14
[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 ( #28345 )
...
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu >
2025-11-08 15:50:10 -07:00
171133f929
[Bugfix] Fix test fused quant layernorm tests ( #27865 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-08 14:31:33 -08:00
32787d0644
Remove setuptools upper bound constraint (<80) ( #28337 )
...
Signed-off-by: Cole Murray <colemurray.cs@gmail.com >
2025-11-08 22:30:18 +00:00
975676d174
[Feat] Drop-in Torch CUDA Profiler ( #27841 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-08 14:07:37 -08:00
77d702a22b
Enhance run_cluster.sh for multi-NIC support ( #28328 )
...
Signed-off-by: Ev Lacey <elacey@nvidia.com >
2025-11-08 22:04:16 +00:00
2108a571d7
[DCP] Support dcp kv_cache interleave size > 1 ( #26696 )
...
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com >
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com >
Signed-off-by: Qiu <qiuchunshuo@huawei.com >
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com >
2025-11-09 04:45:27 +09:00
47604137a2
[Bugfix] Spec decode + structured output + spec model max len edge case ( #28298 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2025-11-08 19:44:25 +00:00
26990d25dc
[Bugfix] Update device name for H200 detection ( #28349 )
...
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-11-08 19:01:11 +00:00
d9ab1ad9d1
reasoning_content -> reasoning ( #27752 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-08 12:15:08 +00:00
608bb14462
[Attention] Remove max cudagraph size limit of 992 ( #27840 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-11-07 22:33:27 -08:00
4a36681f85
[flashinfer][fix] do not check nvcc availability when using pre-downloaded cubins ( #27990 )
...
Signed-off-by: Xiaozhu <mxz297@gmail.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2025-11-07 22:25:21 -08:00
d15afc1fd0
Refactor CPU/GPU extension targets for CMake build ( #28026 )
...
Signed-off-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com >
2025-11-08 14:17:35 +08:00
934a9c3b79
[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 ( #28101 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-08 05:01:27 +00:00
70af44fd10
[bugfix] support eagle with lora cudagraph specialization ( #28318 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
2025-11-08 03:25:45 +00:00
781f5ebf52
Bump arctic-inference requirement ( #28174 )
...
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-07 18:31:18 -08:00
0852527647
[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM ( #28124 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-11-07 18:20:55 -08:00
61d25dc44b
Update gpu.rocm.inc.md to add support for AMD Ryzen AI MAX / AI 300 Series (gfx1151, gfx1150) ( #28308 )
...
Signed-off-by: Hamid Mukhtar <15519013+hammmmy@users.noreply.github.com >
2025-11-08 02:09:21 +00:00
d0c7792004
[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding ( #21068 )
...
Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Danielle Robinson <dcmaddix@gmail.com >
Co-authored-by: Haipeng Li <li2haipeng@gmail.com >
Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com >
2025-11-08 01:58:22 +00:00
b158df2813
remove resolve_op_overloads and use splitting_ops directly ( #28081 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-11-08 01:13:13 +00:00
1aaecda078
[XPU] Enable Expert parallel for MoE models ( #28263 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-08 00:33:11 +00:00
811df41ee9
Update Flashinfer from v0.4.1 to v0.5.2 ( #27952 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-07 16:24:42 -08:00
67a2da890e
[PerfFix] Avoid separate thread for MP executor shm spin (take 2) ( #28319 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-07 22:11:03 +00:00
da786e339e
[Core] Rework handling of async scheduling config ( #28250 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-07 20:01:23 +00:00
18903216f5
[Bugfix] Fix and add tests for GptOss reasoning parser ( #28000 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-07 19:28:04 +00:00
d0ceb38ae8
[Build] Fix release pipeline failing annotation ( #28272 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-07 10:06:45 -08:00
155ad56d7b
[doc] add guide about the provided PTX was compiled with an unsupported toolchain ( #28305 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-11-08 00:26:34 +08:00
5fb4137c99
[README] Add Arm CPUs to the list of supported targets ( #28290 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-11-07 15:41:47 +00:00
68a72a5cc1
Revert "[PerfFix] Avoid separate thread for MP executor shm spin ( #28012 )" ( #28289 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-11-07 15:07:01 +00:00
0f872b7977
[Log] update shm wait time msg ( #28255 )
2025-11-07 09:43:30 -05:00
4b1ff13221
[Feature] Default ignore_eos True for random dataset ( #28227 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-07 07:35:33 -05:00
e0d6b4a867
[CLI] add --max-tokens to vllm complete ( #28109 )
...
Signed-off-by: Iceber Gu <caiwei95@hotmail.com >
2025-11-07 12:21:40 +00:00
72b1c2ae2c
[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes ( #27439 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-11-07 04:18:39 -08:00
e0919f331d
[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU ( #28168 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-07 12:14:29 +00:00
8e19d470af
[fix] Revert "fixing mm placeholder replacement issue with gemma3" ( #28285 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com >
2025-11-07 12:09:09 +00:00
1958bda9b4
[Misc][Model][Refactor] Pass the prefix into Linear layers ( #28259 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-11-07 19:38:38 +08:00
7bdb42b2f2
[CPU]Avoid repeated random sample compile ( #28260 )
...
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com >
2025-11-07 11:03:57 +00:00
315068eb4a
[FixBug]Aeala/ShareGPT_Vicuna_unfiltered marked as multimodal benchmark ( #28265 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com >
2025-11-07 09:35:22 +00:00
ccd98b59c1
[Perf] Introduce FlattenLogprobs to store logprobs results to reduce GC overhead ( #28171 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-11-07 00:27:12 -08:00
21b82f4ea2
[Kernel] LoRA triton kernels support PDL ( #27402 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-07 08:05:48 +00:00
a736e5ff77
[CI] Reduce Blackwell Fusion test runtime by filtering tests and only run all tests in nightly ( #28074 )
2025-11-07 15:58:16 +08:00
9da9208b20
[Bug] Fix missing token_ids for reasoning parser models in chat completions #28246 ( #28256 )
2025-11-07 07:31:58 +00:00
11fd69dd54
[amd][gptoss] Perf gain because of block alignment ( #28024 )
...
Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com >
Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com >
2025-11-07 05:27:42 +00:00
c0a4b95d64
Fix issues from #28242 ( #28257 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-07 04:23:17 +00:00
a47d94f18c
Add runai model streamer e2e test for GCS ( #28079 )
...
Signed-off-by: Alexis MacAskill <amacaskill@google.com >
2025-11-07 03:07:54 +00:00
e70fbc599b
[CI/Build] Loosen STT LoRA Translate Check (Flaky Test) ( #28247 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
Signed-off-by: Alex Brooks <alex.brooks@ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-11-07 02:51:27 +00:00
4bf56c79cc
[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile ( #28242 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2025-11-07 00:16:03 +00:00
59b453eaa2
Speed up mm processor kwargs per request by spliting dynamic and static kwargs ( #26483 )
...
Signed-off-by: Junhong <liujunhong11@huawei.com >
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com >
Co-authored-by: Junhong <liujunhong11@huawei.com >
2025-11-07 07:51:28 +08:00
827e4237bc
Fix failing test for CRadio ( #27738 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: wang.yuqi <noooop@126.com >
2025-11-06 15:32:25 -08:00
ca6f755d24
[BugFix] Fix FusedMoELoRA + ModularKernel Integration ( #28237 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-11-06 22:53:30 +00:00
ca90f50304
[Test] Add non-MoE DP test coverage ( #28235 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-06 20:59:57 +00:00
da855b42d2
[Doc]: Make extraInit containers fully configurable in helm chart ( #27497 )
...
Signed-off-by: Fang Han <fhan0520@gmail.com >
2025-11-06 20:27:16 +00:00
449de9001a
[ROCm] triton fp8 kernel ( #27058 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
2025-11-06 14:46:44 -05:00
d4aa65c998
[Chore] eliminate duplicated and unconditional object serialization in anthropic messages api ( #27792 )
...
Signed-off-by: Vico Chu <vico24826@gmail.com >
2025-11-06 19:09:19 +00:00
7a8375f8a0
Add llama 4 scaling support ( #28145 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-11-06 18:55:17 +00:00
5e0c1fe69c
[Structured outputs] Upgrade llguidance to 1.3.0 ( #28039 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-11-06 10:24:47 -08:00
4507a6dae4
CODEOWNERS: Add myself as reviewer on security docs ( #28216 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-11-06 17:39:42 +00:00
d1dd5f53e4
[Frontend] Fix logging format when enable response logging ( #28049 )
...
Signed-off-by: esmeetu <jasonailu87@gmail.com >
2025-11-06 16:25:39 +00:00
e52e4da971
[HARDWARE][CPU] Add Option for Disabling Binding to Specific CPU Cores ( #27953 )
...
Signed-off-by: Stan Hatko <stan_hatko@live.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2025-11-06 23:47:11 +08:00
2176778cd3
[Doc] Add Arm CPUs are on the list of supported targets in vLLM ( #26018 )
...
Signed-off-by: Milos Puzovic <milos.puzovic@arm.com >
2025-11-06 15:30:26 +00:00
0370679ce9
[Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 ( #28200 )
...
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com >
2025-11-06 07:29:46 -08:00
8816e375d3
[Docs] Switch to directory style URLs ( #28058 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-06 07:06:33 -08:00
f32229293e
Disable nm-testing models with issues in CI ( #28206 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-11-06 06:19:07 -08:00
c757a15f0f
[CPU]Improve cpu fused moe perf ( #27244 )
...
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com >
2025-11-06 11:04:18 +00:00
59a50afa08
[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony ( #26874 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-06 10:40:03 +00:00
981cadb35c
[Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty ( #28181 )
...
Signed-off-by: courage17340 <courage17340@163.com >
2025-11-06 17:52:13 +08:00
c3ee80a01a
[V0 deprecation]clean up is_v1_supported_oracle ( #28116 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-06 16:05:32 +08:00
3755c14532
[CPU] Enable torch profiling ( #28130 )
...
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com >
2025-11-06 07:32:05 +00:00
201dc98acc
Fix hard-coded parameter name in gemma3n.py ( #27946 )
...
Signed-off-by: Seungduk Kim <seungduk.kim@yanolja.com >
Signed-off-by: Biswa Panda <biswa.panda@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-11-05 23:07:36 -08:00
a404e2c0f1
Patch Mistral Tokenizer ( #28146 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-11-06 06:43:16 +00:00
e31946f86e
[flashinfer] fix FI all2all with FI cutlass moe ( #28166 )
...
Signed-off-by: Xiaozhu <mxz297@gmail.com >
2025-11-06 05:52:16 +00:00
bde5039325
[CI] Add compile/test_multimodal_compile.py to CI ( #28151 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-06 05:41:47 +00:00
d72299d47b
Make the cv2 dependency optional ( #27780 )
...
Signed-off-by: Jacob <cmpute@qq.com >
2025-11-06 05:08:55 +00:00
80679f108f
[Core][MM] Use non-blocking CPU-GPU copy of multimodal data ( #28141 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-11-06 04:05:12 +00:00
43ecd0a900
[Chore] Clean up deepseek v2/v3 config copy ( #28055 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-06 03:46:30 +00:00
07d614511f
[Misc] Remove the duplicate code ( #28111 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 21:07:47 -05:00
f948ab6945
[CI Failure] nm-testing/Qwen2-0.5B-Instruct-FP8-SkipQKV was removed from HF. Skip it in tests ( #28170 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-06 01:22:13 +00:00
d71af5f502
[Feature] Enable TP + EP shared_experts overlap with router, 3.7% E2E performance improvement ( #28164 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-05 17:21:08 -08:00
90189c71a9
[Bug] Fix env string "0" same to True ( #28159 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-05 17:04:20 -08:00
d79d9f0780
[Bug] Fix cpu disable shared_experts VLLM_DISABLE_SHARED_EXPERTS_STREAM ( #28157 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-11-05 17:03:09 -08:00
b6a248bdd7
[PERF] Decouple projections from GDN custom op. Attempt 2 ( #28083 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-05 17:01:12 -08:00
1767658559
[Debugging] Add annotation for easier trace analysis ( #22496 )
2025-11-05 16:52:52 -08:00
efe73e9b57
[Core][Hybrid allocator + connector 2/n] Unify remove_skipped_blocks by get_last_useful_token ( #25431 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-11-06 00:12:00 +00:00
0b8e871e5e
[CI/Build] Fix test_defaults_with_usage_context in AMD CI ( #27926 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-05 15:40:24 -08:00
5ee93a5956
[CI/Build] Update checking logic in cutlass_group_gemm_supported ( #27948 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-05 15:40:10 -08:00
e15601789b
[Feature]: Add corrupted request metric to V1 metrics system. ( #27306 )
...
Signed-off-by: atalhens <sneh.lata@nutanix.com >
2025-11-05 13:45:29 -08:00
65ac8d8dc4
[Docs] Add guide to debugging vLLM-torch.compile integration ( #28094 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-11-05 21:31:46 +00:00
ffb08379d8
[Chore] Remove Nemotron-Nano-VL config copy ( #28126 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-05 20:06:45 +00:00
e04492449e
[Hardware][IBM Z] Optimize s390x Dockerfile ( #28023 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com >
2025-11-05 11:25:44 -08:00
518ec6b722
[Docs] Clean up README_TUNING.md ( #28088 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-11-05 19:01:34 +00:00
802748bddb
[Bugfix] Fix Qwen3-Reranker-8B load ( #28117 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-11-05 18:33:50 +00:00
faedbb4d4f
[Feature] Extend batch invariant torch.compile to B200 ( #27856 )
...
Signed-off-by: PaulZhang12 <paulzhan@fb.com >
2025-11-05 10:04:49 -08:00
40db194446
[CI]: Add LMCacheConnector Unit Tests ( #27852 )
...
Signed-off-by: Samuel Shen <slshen@uchciago.edu >
Co-authored-by: Samuel Shen <slshen@uchciago.edu >
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu >
2025-11-05 09:45:57 -08:00
c765f0b443
[FlashInfer] Avoid FlashInfer block_size 16 + head_size 256 on blackwell ( #27994 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-11-05 09:25:32 -08:00
002b07c4b2
[Bugfix] vLLM should check Inductor config for compile cache enablement status ( #27637 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
2025-11-05 12:22:44 -05:00
752ddeacaa
[Core] add support for reasoning parser plugins ( #28075 )
...
Signed-off-by: walter beller-morales <walter.beller.morales@gmail.com >
2025-11-06 01:15:06 +08:00
c18f88c6ca
[Kernel] Fuse computation of g and beta for Gated Delta Net ( #28095 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-11-05 09:14:55 -08:00
6fd0df8132
[misc] add vLLM Beijing Meetup ( #28127 )
...
Signed-off-by: Jiaju Zhang <jjzhang@redhat.com >
2025-11-05 17:12:59 +00:00
3f5a4b6473
[Bugfix] Validate custom logits processor xargs for online serving ( #27560 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-05 16:53:33 +00:00
6cae1e5332
[ROCm][MLA] Support block-size > 1 for AITER MLA backend ( #27224 )
...
Signed-off-by: ganyi <ygan@amd.com >
Co-authored-by: wuhuikx <hattie.wu@amd.com >
2025-11-05 10:43:02 -05:00
80c9275348
Enabling cooperative multi-gpu tests on multi-gpu nodes ( #27986 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-11-05 10:35:49 -05:00
e50c454672
[BugFix] Support EP/DP + EPLB with MTP ( #25311 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2025-11-05 15:22:17 +00:00
5d16d0fa62
[DCP] check return_lse for all layers in dcp ( #27929 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-11-05 22:27:25 +08:00
0606bea2b6
add kimi reasoning parser ( #28128 )
...
Signed-off-by: wangzhengtao <wangzhengtao@msh.team >
Co-authored-by: wangzhengtao <wangzhengtao@msh.team >
2025-11-05 21:48:33 +08:00
6e97eccf5d
[XPU] Enable custom routing functions in IPEX for Llama4 ( #28004 )
...
Signed-off-by: frost-intel <frost.mitchell@intel.com >
2025-11-05 13:39:57 +00:00
6ab183813c
[Graph Partition][Cache] Use inductor partition ops config ( #27702 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-11-05 13:04:48 +00:00
6b7a81185d
Bugfix: Cutlass FP8 FusedMoE bad scaling factors ( #27255 )
...
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-05 06:06:06 -05:00
b57789b62b
Fix excessive logging noise by reducing the log level of the MinimaxM2ToolParser import success message ( #27635 )
...
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com >
2025-11-05 19:03:51 +08:00
377061d481
[Misc] fix import error for DeepSeekR1ReasoningParser ( #28114 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 19:02:32 +08:00
86dca07d9b
[Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator ( #28011 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-11-05 10:36:31 +00:00
16b37f3119
[bugfix] fix wrong dcp_local_seq_lens calc ( #27518 )
...
Signed-off-by: Qiu <qiuchunshuo@huawei.com >
2025-11-05 17:58:13 +08:00
0976711f3b
[Refactor] to simplify and extract the shared logic between chat completion and responses ( #27961 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 15:46:39 +08:00
e261d37c9a
[Refactor] Lazy-loaded reasoning_parser ( #28092 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-05 15:37:02 +08:00
b7cbc25416
[Model, Core] Support Granite Speech & LoRA for STT ( #24455 )
2025-11-05 08:33:48 +01:00
d43ad5a757
[BugFix] Fix DCP Assert (AssertionError: DCP not support reorder_batch_threshold > 1 now.) ( #28100 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-05 14:54:43 +08:00
0ff05e3770
[Bugfix] Fix encoder-only model support for transformers backend ( #28021 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-04 22:24:41 -08:00
428bc7bf1c
[V0 deprecation] Remove VLLM_USE_V1 usage in most modules ( #27955 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-11-04 20:51:16 -08:00
878fd5a16f
[CI/Build] Enable some fixed tests in AMD CI ( #28078 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-05 03:15:59 +00:00
18b39828d9
[XPU] Add gpt-oss model support for Intel GPU ( #27786 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-05 02:17:23 +00:00
4ea62b77f5
[Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 ( #27740 )
2025-11-05 09:25:09 +08:00
d4e547bb7e
Revert "[PERF] Decouple projections from GDN custom op" ( #28080 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-04 15:58:23 -08:00
2d977a7a9e
[ROCm] gemm_a16w16 upstreaming ( #26969 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-11-04 16:01:00 -05:00
1fb4217a05
[Multimodal] Make MediaConnector extensible. ( #27759 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
2025-11-04 18:28:01 +00:00
611c86ea3c
Added disable rule to track files under benchmarks/lib ( #28048 )
...
Signed-off-by: Nadav Kluger <nadav.k@fmr.ai >
2025-11-04 18:18:43 +00:00
dc937175d4
[ROCm][Perf] New design on ROCm AITER MHA backend Implementation ( #25763 )
...
Signed-off-by: ganyi <ygan@amd.com >
2025-11-04 18:05:33 +00:00
2f1cc8cef1
Remove deprecated --rope-scaling and --rope-theta ( #28006 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-04 18:01:56 +00:00
938a81692e
[AsyncScheduling] Don't schedule past request max_tokens ( #27922 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-04 17:06:28 +00:00
c9f66da8fd
[PerfFix] Avoid separate thread for MP executor shm spin ( #28012 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-04 08:33:55 -08:00
05cae69f0f
[model] Add support for openPangu_Ultra_MoE ( #27521 )
...
Signed-off-by: yuantao <2422264527@qq.com >
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-04 08:17:20 -08:00
5fd8f02ea9
[PERF] Decouple projections from GDN custom op ( #27512 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-11-04 08:11:41 -08:00
97e3dda84b
[Perf] SM100 - add swap AB optimization to CUTLASS FP8 GEMM ( #27284 )
...
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com >
Co-authored-by: Faqin Zhong <zhofaqin@amazon.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-04 07:49:25 -08:00
5a0a6dfd55
[BugFix] Fix incorrect preallocated sampled_token_ids tensor size ( #28025 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-04 07:38:16 -08:00
938772af03
[Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. ( #27123 )
2025-11-04 21:59:45 +08:00
e4ee658672
[Model] add optimal triton fused moe configs for NemotronH MoE ( #27967 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-11-04 12:59:43 +00:00
77f8001f53
[Model][Bugfix] fix pipeline parallelism support for NemotronH ( #27968 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-11-04 12:28:36 +00:00
300a265978
[Core] Enable StatLogger in LLMEngine ( #28020 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-11-04 04:13:35 -08:00
03c4c4aa9d
Support using Int4PreshuffledTensor after loading ( #26066 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-11-04 06:00:57 -05:00
2ec401bc39
Load tuned fused_moe_lora shrink and expand kernel configs separately ( #27435 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-04 18:27:35 +08:00
4022a9d279
[BugFix][Performance] Restore flashinfer autotuning for all scenarios ( #27904 )
2025-11-04 15:56:21 +08:00
53f6e81dfd
[CI/Build] Fix OpenAI API correctness on AMD CI ( #28022 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-04 07:20:50 +00:00
43a6acfb7d
[Model] fix ernie45 reasoning_parser ( #27973 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-11-04 07:16:46 +00:00
58279c60b5
[KV Connector] Make KVCacheConfig an explicit constructor argument ( #27887 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-03 23:00:49 -08:00
2f84ae1f27
[CI/Build] Update LM Eval Version in AMD CI ( #27944 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-11-04 06:36:40 +00:00
f32cbc9a0c
[CPU]Improve dynamic 4bit moe performance ( #27240 )
...
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com >
2025-11-04 06:33:23 +00:00
7e4be74104
[Bug] Batch invariant: Fix flash attn MLA RuntimeError: scheduler_metadata must have shape (metadata_size) ( #27884 )
2025-11-04 14:05:55 +08:00
380ba6816d
[Metrics] Enable sleep state metric outside of dev mode ( #27867 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-11-03 20:35:36 -08:00
14a125a06d
[NIXL][XPU] Pin NIXL version to 0.7.0 ( #27849 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-11-04 03:28:35 +00:00
c02fccdbd2
[Refactor] Lazy import tool_parser ( #27974 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-11-04 10:10:10 +08:00
6ddae74054
[LoRA] Lora shrink swizzle ( #27694 )
...
Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com >
Signed-off-by: Haipeng Li <li2haipeng@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-04 09:30:20 +08:00
b13a447546
[Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm ( #27748 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-11-03 17:12:19 -08:00
7956b0c0bc
Remove the tpu docker image nightly build. ( #27997 )
...
Signed-off-by: Qiliang Cui <derrhein@gmail.com >
2025-11-04 00:35:54 +00:00
3758757377
[Bugfix] Fix MoE Routing Simulation ( #28002 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-11-03 22:26:49 +00:00
ccd3e55e51
[Bugfix][plugin] fla crash on plugin ( #27322 )
2025-11-04 05:27:03 +08:00
01baefe674
Add TP parameter to attention tests ( #27683 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-03 13:04:40 -08:00
786030721e
[Docs] add runai_streamer_sharded to LoadConfig ( #27937 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-11-03 20:35:16 +00:00
145c00a4d3
[Bugfix] change FlashMLA reorder_batch_threshold ( #27777 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-03 15:17:10 -05:00
55011aef24
[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile ( #27764 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2025-11-03 11:12:15 -08:00
a4398fbb5e
[Feature][Benchmarks] Support inf burstiness ( #26941 )
...
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com >
2025-11-03 18:33:17 +00:00
2c19d96777
[Spec Decode] Integrate Suffix Decoding from Arctic Inference ( #25784 )
...
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com >
2025-11-03 09:23:31 -08:00
4bc400f47e
[CI/Testing] Add basic single node dual batch overlap test ( #27235 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-03 17:00:46 +00:00
cac4c10ef0
[BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile ( #27616 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-11-03 11:13:51 -05:00
f7d2946e99
[Bugfix] Skip gs:// model paths for speculator detection ( #27846 )
...
Signed-off-by: Peter Schuurman <psch@google.com >
2025-11-03 14:31:03 +00:00
294c805f1d
Early exit for MoE LoRA kernels ( #27131 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-03 20:22:17 +08:00
40b69e33e7
[Model] Add PaddleOCR-VL Model Support ( #27758 )
...
Signed-off-by: zhangyue <zhangyue66@baidu.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: zhangyue66 <zhangyue66@baidu.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-03 19:04:22 +08:00
32257297dd
[CI/Build] Remove the flaky gpt-oss lora test ( #27966 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-03 16:50:06 +08:00
ba464e6ae2
Add ORCA endpoint load metrics support ( #24905 )
...
Signed-off-by: Misha Efimov <mef@google.com >
2025-11-03 08:21:31 +00:00
7f4bdadb92
[XPU]Refine Dockerfile.xpu, avoid oneccl dependency issue ( #27964 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-03 07:36:59 +00:00
cec7c28833
[Bugfix] Padded Eagle Specdec with Chunked Prefill ( #26263 )
...
Signed-off-by: Rémi Delacourt <remi@mistral.ai >
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com >
Signed-off-by: remi <remi@mistral.ai >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2025-11-03 02:22:46 -05:00
18961c5ea6
[Hybrid] Pass kernel block size to builders ( #27753 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-11-03 05:48:03 +00:00
470ad118b6
[Frontend] Align finish_reason when tool is called with OpenAI ( #25054 )
...
Signed-off-by: Sungyoon Jeong <sungyoon.jeong@furiosa.ai >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-11-03 04:21:18 +00:00
1bf43ae35d
[BugFix][LoRA] use adapter_id instead of id field of lora_request ( #27728 )
...
Signed-off-by: Biswa Panda <biswa.panda@gmail.com >
2025-11-03 10:08:08 +08:00
0ce743f4e1
Fix(llm): Abort orphaned requests when llm.chat() batch fails Fixes #26081 ( #27420 )
...
Signed-off-by: vensenmu <vensenmu@gmail.com >
2025-11-02 16:24:01 +00:00
6c317a656e
[Misc] Provide Siglip2 chat template ( #27939 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-02 13:42:38 +00:00
00b31a36a2
[V1] [Hybrid] Mamba1 Automatic Prefix Caching ( #26377 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-11-02 04:16:23 -08:00
73444b7b56
Performance fix MistralTokenizer: cache special ids and tokens ( #27925 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com >
2025-11-02 08:48:33 +00:00
853a8eb53b
[Bugfix] Fix Qwen Omni audio inference ( #27920 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-02 05:06:05 +00:00
758ea2e980
[CI/Build] Fix flaky test_transcription_validation.py::test_basic_audio_gemma ( #27924 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-11-02 03:45:02 +00:00
685c99ee77
[KV offload] Offloading connector async scheduling support ( #27648 )
...
Signed-off-by: KevinCheung2259 <2651309292@qq.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-11-01 21:08:56 +00:00
1e88fb751b
Adds anthropic /v1/messages endpoint to openai api_server ( #27882 )
...
Signed-off-by: bbartels <benjamin@bartels.dev >
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev >
2025-11-01 12:45:42 -07:00
c2ed069b32
[BugFix] Fix mixed penalties batch with async scheduling ( #27910 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-01 10:51:24 -07:00
af6e19f50f
[Core][TPU] Support TPU Data Parallelism ( #27365 )
...
Signed-off-by: wenxindongwork <wenxindong@google.com >
2025-11-01 17:14:44 +00:00
99d69af9ec
[Bugfix] Python 3.10 compatibility for Self ( #27918 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-01 15:28:54 +00:00
d811b442d3
[Bugfix] DeepSeek V3.2 MTP metadata & CUDA graph issues ( #26779 )
...
Signed-off-by: xiaohajiayou <923390377@qq.com >
2025-11-01 10:52:43 -04:00
30a14b034f
[V0 deprecation] Remove VLLM_USE_V1 usage in platform and v1 module ( #27798 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-01 10:17:45 +00:00
799ce45cc1
[Docs] Mock all imports for docs ( #27873 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-01 10:02:23 +00:00
2c0c7c39bd
feat(benchmarks): support HF model names in multi-turn benchmark ( #27850 )
2025-11-01 08:04:52 +00:00
e675118849
[Add] cmdline argument parsing for KV cache offloading modules ( #27621 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-01 07:17:07 +00:00
e2347dbf58
[Bugfix] [Model] Missing MRoPE function definition from KeyeForConditionalGeneration ( #27895 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-11-01 13:45:23 +08:00
879a06579e
[CI/Build] Bump transformers version ( #27528 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-31 22:11:07 -07:00
29de3cdee4
Adding SplitK in fused_moe_lora kernel ( #27818 )
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-01 12:55:46 +08:00
7e2729b57e
[Multimodal][XPU]Enable vision attn backend for xpu platform ( #27525 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: Yejing Lai <yejing.lai@intel.com >
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
2025-11-01 04:45:02 +00:00
3a5de7d2d6
[Bugfix] Fix KDA output ( #27905 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-01 11:54:36 +08:00
bc4486d609
[Kernel] Enable FusedMoEModularKernel support bias ( #27754 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-01 02:05:12 +00:00
0cdbe7b744
[Core] Async scheduling + structured outputs compatibility ( #26866 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-11-01 00:35:04 +00:00
df334868ca
[Hybrid] A simpler algorithm to find kernel_block_size ( #26476 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-31 21:30:28 +00:00
0e0a638c3b
Batch invariance doc ( #27839 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-31 17:22:19 -04:00
f29aeb5a25
Add FLASHINFER_MLA to test_mla_backends and add B200 CI run ( #27663 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-31 11:12:19 -07:00
5e8862e9e0
[Feature] Pydantic validation for scheduler.py and structured_outputs.py ( #26519 )
...
Signed-off-by: Vinay Damodaran <vrdn@hey.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-31 18:05:50 +00:00
9e5bd3076e
[Cleanup] Remove no-longer-used SpeculativeConfig.enable_chunked_prefill ( #27826 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-31 10:57:45 -07:00
fc16f1c477
Flashinfer_CUTLASS_MOE fuses quantization for TP ( #27223 )
...
Signed-off-by: Shu Wang. <shuw@nvidia.com >
2025-10-31 17:54:29 +00:00
bc306fe5e9
fix incorrect type annotation in KimiMLP ( #27885 )
...
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
2025-10-31 17:38:02 +00:00
103a468bbf
[bugfix] Missing cached item in beam search ( #27874 )
...
Signed-off-by: fake0fan <645327136@qq.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-31 17:34:27 +00:00
70bfbd7b16
Docs update tpu install instructions ( #27824 )
...
Signed-off-by: Rob Mulla <rob.mulla@gmail.com >
Signed-off-by: Rob Mulla <RobMulla@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-31 10:29:55 -07:00
d6517be3cd
[Bugfix] Missing NIXL metadata for handshake initialization if instance spans multi-node ( #26338 )
...
Signed-off-by: Guan Luo <gluo@nvidia.com >
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com >
Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-10-31 10:16:00 -07:00
7e06c40e63
[Bugfix] Fix broken MRoPE for GLM-4.1V/GLM-4.5V ( #27860 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-31 17:04:51 +00:00
675704ac01
[Bugfix] Allow 64-bit integer values for LoRA IDs to avoid overflow/truncation ( #27876 )
...
Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com >
2025-10-31 16:58:42 +00:00
0384aa7150
[CI/Build] Add gpt-oss LoRA test ( #27870 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-31 22:17:21 +08:00
3857eb8725
[Perf] Decouple torch op from GDA to leverage torch.compile ( #27871 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-31 21:35:52 +08:00
933cdea440
[BugFix] Don’t compute reorder threshold when there are no attention groups ( #27861 )
2025-10-31 11:36:18 +00:00
3933f18a5e
[Bugfix] Avoid too small block m/n for FlexAttention kernel option ( #27853 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-31 19:33:12 +08:00
e5ef4dfc11
[Kimi-Linear] Correct prefixes and add compatibility to AWQ quants ( #27834 )
...
Signed-off-by: toncao <cpatonn@gmail.com >
Co-authored-by: toncao <cpatonn@gmail.com >
2025-10-31 17:36:37 +08:00
36960501d3
[Hardware][Powerpc] Fix VLLM_CPU_OMP_THREADS_BIND="auto" low CPU utilization for Power ( #27734 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-31 07:45:26 +00:00
b2e65cb4a7
[benchmark] Make request IDs unique across clients by default ( #27723 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2025-10-30 17:40:35 -07:00
2bf0bcc1fc
[CI Test] Add Scheduled Integration Test ( #27765 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 17:29:26 -07:00
697f507a8e
[CI/Build][Intel] Enable performance benchmarks for Intel Gaudi 3 ( #26919 )
...
Signed-off-by: jakub-sochacki <jakub.sochacki@wp.pl >
2025-10-31 07:57:22 +08:00
d5d2a0fe74
[Misc] Make all tool scripts executable ( #27831 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-30 23:46:02 +00:00
c9791f1813
[BugFix] Fix broken import in initialize_ray_cluster() ( #27838 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-30 16:26:13 -07:00
e7acb20076
[Feature] Batch invariant torch.compile ( #27660 )
...
Signed-off-by: PaulZhang12 <paulzhan@fb.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-30 13:11:29 -07:00
4b68c4a55b
[Core][Perf] Only invoke save_new_computed_blocks when computed blocks are not empty ( #27799 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-30 19:47:30 +00:00
a8141fa649
[Refactor] Remove VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK ( #27750 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 15:32:39 -04:00
4917002523
[Fix] Skip record_sleep_state logic in PrometheusStatsLogger if not in dev mode ( #27789 )
...
Signed-off-by: SumanthRH <sumanthrh99@gmail.com >
2025-10-30 19:26:27 +00:00
a2981c4272
[EP/DP][API Server] Enable DP-aware routing in OpenAI API requests ( #24945 )
...
Co-authored-by: Cong Chen <prowindy@gmail.com >
2025-10-30 12:10:16 -07:00
4574d48bab
[Core][Bookkeeping] Update cu_num_accepted_tokens for all req_index ( #27629 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-30 11:52:36 -07:00
ab98f6556f
[Bugfix] Fix 2 precommit issues - (mamba_block_size, kv_cache_config) ( #27811 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-10-30 11:52:18 -07:00
2918c1b49c
[Model] Use the same fused_moe configs for all H200 devices ( #23642 )
...
Signed-off-by: Roger Meier <r.meier@siemens.com >
2025-10-30 17:36:56 +00:00
1004205795
[MTP] Refactor mtp predictor to avoid d2h operation ( #27643 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-30 17:27:39 +00:00
ba33e8830d
Reapply "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" ( #27768 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-10-30 10:22:30 -07:00
33a0ea5f32
[Docs] add Shanghai Meetup - 2025/10 ( #27545 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: esmeetu <jasonailu87@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: esmeetu <jasonailu87@gmail.com >
2025-10-31 00:33:13 +08:00
60f76baa66
[Misc] Replace CUDA_VISIBLE_DEVICES in DP with torch.cuda.set_device for device selection on cuda-like devices ( #27564 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-30 11:41:44 -04:00
e5e076cad7
[BugFix] Stopgap - Flashinfer Autotuner + GPT-OSS + DP/TP ( #27762 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-30 08:24:31 -07:00
eebf00cb0c
[Bugfix][CPU] Fix MRoPE dispatch on the CPU backend ( #27800 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-30 15:12:05 +00:00
9956aae4ea
[Model][Ouro] Support Ouro Model ( #27794 )
...
Signed-off-by: yinfan.1024 <yinfan.1024@bytedance.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-30 22:34:41 +08:00
0fe0140408
[KV offload] Enable CPU KV offload on CUDA alike Platforms ( #27770 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-30 22:10:29 +08:00
4e68cc9b6a
[Model] Introduce Kimi Linear to vLLM ( #27809 )
...
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn >
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com >
2025-10-30 21:02:27 +08:00
1994de99ea
[CI Failure] Fix test_kv_cache_model_load_and_run ( #27717 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 12:27:53 +00:00
4464723f22
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. ( #25524 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-30 12:13:05 +00:00
74374386e2
[Bugfix] Improve GPU validation logging in Ray fallback scenarios ( #25775 )
...
Signed-off-by: Sairam Pillai <sairam.pillai61@gmail.com >
2025-10-30 11:57:59 +00:00
c01f6e525f
[CI] Fix mypy for vllm/v1/core and vllm/v1/engine ( #27108 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 11:32:17 +00:00
c7d2a554ba
[CI Failure] fix test_default_mm_loras ( #27795 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 18:13:03 +08:00
af826e0820
[V0 deprecation] Remove VLLM_USE_V1 usage in config module ( #27784 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-30 09:42:49 +00:00
e806178d2a
[BugFix][VL] Fix FA selection on Qwen2.5-VL ( #27790 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-30 07:54:44 +00:00
5be1bed790
[CI/Build]Add eval config for Qwen3-235B-A22B-Instruct-2507-FP8 ( #27113 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 07:50:56 +00:00
31b55ffc62
use stringData in secret yaml to store huggingface token ( #25685 )
...
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io >
2025-10-30 00:47:36 -07:00
ded8ada86a
Add more dims for batch invariant shims ( #27489 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-30 05:28:45 +00:00
8bff831f0a
[Benchmark] Cleanup deprecated nightly benchmark and adjust the docstring for performance benchmark ( #25786 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-10-30 04:43:37 +00:00
b5d70751d8
[BugFix] Reordering extend logic fix ( #27739 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-29 21:39:34 -07:00
b8c48c5d72
kernels/moe test pruning ( #27053 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-30 12:10:34 +08:00
17d055f527
[Feat] Adds runai distributed streamer ( #27230 )
...
Signed-off-by: bbartels <benjamin@bartels.dev >
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev >
Co-authored-by: omer-dayan <omdayan@nvidia.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-29 21:09:10 -07:00
2ce5c5d3d6
[BugFix] Handle unscheduled requests properly when async scheduling ( #27756 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-29 21:04:25 -07:00
b5bae42f91
[XPU] Update latest IPEX 2.8 release ( #27735 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-10-30 11:17:13 +08:00
d7fb10c574
[Bugfix] mamba-block-size is set for vision language model ( #27773 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-29 19:39:57 -07:00
b798e39f93
[XPU][bugfix] fix rope for llama4 and deepseek ( #25145 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2025-10-30 09:43:13 +08:00
48eb8eba58
[Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. ( #27760 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 23:17:48 +00:00
b5d90f7400
[Bug] Fix DBO IMA issue for DeepEPHT ( #27666 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 16:28:27 -04:00
d4aa144343
[BugFix] Fix handling of resumed reqs in SharedStorageConnector ( #27719 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-29 20:16:52 +00:00
fcb1d570bb
[Bug] Fix DeepEP low latency assert self.batched_router_logits.size(-1) == full_router_logits.size(-1) Bug ( #27682 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 14:50:39 -04:00
accb8fab07
[KVConnector] Add metrics to Prometheus-Grafana dashboard ( #26811 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-10-29 18:44:49 +00:00
5b0448104f
[Bug] Raise error explicitly if using incompatible backend ( #27424 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 13:29:20 -04:00
f7a6682872
[CI/Build] Test torchrun with 8 cards ( #27548 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-29 10:26:06 -07:00
a9fe0793f2
use_aot_compile should respect VLLM_DISABLE_COMPILE_CACHE ( #27698 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-29 17:08:54 +00:00
7568a282b9
[FIXBUG] Qwen3VL hallucinations without Contiguous on Torch.SDPA ( #27744 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-29 16:55:35 +00:00
1da3309ace
[Core] Exposing engine sleep & wake_up state as prometheus metrics ( #24176 )
...
Signed-off-by: Braulio Dumba <Braulio.Dumba@ibm.com >
2025-10-29 09:32:01 -07:00
5522fb274b
[Chore] Optimize P2PNCCLEngine http_address ( #27488 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 00:05:09 +08:00
0f95a1c3f2
[CI] Fix flaky test_two_responses_with_same_prev_id test ( #27745 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-29 15:10:35 +00:00
ded24e3e54
[ROCm][Platform] Add MI308X device id in _ROCM_DEVICE_ID_NAME_MAP ( #27623 )
...
Signed-off-by: Xiake Sun <xiake.sun@amd.com >
2025-10-29 14:44:03 +00:00
d6704dd099
Fix MiniMax-M2 rmsnorm precision and remove useless code ( #27627 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-29 21:01:05 +08:00
ecca3fee76
[Frontend] Add vllm bench sweep to CLI ( #27639 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-29 05:59:48 -07:00
9a0d2f0d92
[CI/Build] Skip cpu offloading test on AMD ( #27690 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 12:55:51 +00:00
ad3ec89532
[VLM] Add Qwen3-VL generation test ( #25185 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 12:19:37 +00:00
3481e40743
[chore] Remove models weight on S3 logic ( #27725 )
...
Signed-off-by: kevin <kevin@anyscale.com >
2025-10-29 10:29:49 +00:00
5e72216d17
Feature/video support in random mm dataset ( #25963 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Eugene Khvedchenya <ekhvedchenia@nvidia.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 18:24:52 +08:00
1a33aacf82
[Misc] Raise error for missing video metadata in MultiModalDataParser ( #27664 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-29 10:06:42 +00:00
7ba6aa8f56
[Fix] import get_kv_cache_torch_dtype error in LMCacheConnector integration ( #27670 )
...
Signed-off-by: KevinCheung2259 <2651309292@qq.com >
2025-10-29 10:03:54 +00:00
ab2eb27b74
[Frontend] [gpt-oss] Mcp type bug ( #27689 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-29 10:01:32 +00:00
3c7fefdeba
[Frontend] [gpt-oss] Tool json call parsing error retry ( #27675 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-29 09:42:44 +00:00
1891cf605a
[Bugfix] Fix modular kernel tests ( #27707 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-29 16:14:33 +08:00
8df98c2161
[perf] Enable concurrent execution of "shared_experts" and "selected_experts" in qwen3-next ( #27578 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-29 08:12:54 +00:00
4fb8771cc0
[CI/Build] Move pre-commit only scripts to tools/pre_commit ( #27657 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-29 08:04:33 +00:00
413ef7a3b4
[Speculators] Move tests + fix integration ( #27308 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Signed-off-by: rahul-tuli <rtuli@redhat.com >
Co-authored-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-10-29 00:54:21 -07:00
8b62495076
[Bugfix] Fix non-contiguous tensor error in rocm_unquantized_gemm_impl ( #27605 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 00:00:15 -07:00
83fd49b1fc
[CI/Build][Bugfix]Fix Quantized Models Test on AMD ( #27712 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 06:27:30 +00:00
a4a4f0f617
[KV Connector] Update lmcache connector with latest compatibility ( #27681 )
...
Signed-off-by: Samuel Shen <slshen@uchicago.edu >
Co-authored-by: Samuel Shen <slshen@uchicago.edu >
2025-10-29 05:38:37 +00:00
0d8161b075
[Model] Fix Qwen3VL and Qwen3Omni after torch.compile changes ( #27705 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 05:28:20 +00:00
d2c33c397a
[NIXL][XPU] update name of nixl wheel ( #27631 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-10-29 12:43:29 +08:00
f6d5f5888c
[Build] Revert triton_kernels requirements ( #27659 )
2025-10-28 21:07:09 -07:00
9007bf57e6
Revert "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" ( #27714 )
2025-10-28 20:58:01 -07:00
f257544709
Install pre-built xformers-0.0.32.post2 built with pt-2.9.0 ( #27598 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-28 19:39:15 -07:00
0b51c9bd8b
[Core] Early return in SlidingWindowManager.remove_skipped_blocks ( #27673 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-29 01:32:33 +00:00
d3ab240f39
[Bug] Fix deepep low latency use nvlink by default ( #27677 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-28 23:53:12 +00:00
94666612a9
[Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model ( #23207 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
Signed-off-by: Lucas Kabela <lucasakabela@gmail.com >
2025-10-28 22:36:43 +00:00
4fe5895361
[AsyncScheduling] Make async overlap work with logprobs ( #27615 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-28 22:35:54 +00:00
111faf1118
[Core] Scheduler: Publish connector events after output ( #25875 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2025-10-28 21:01:33 +00:00
6afc28a9ba
[Test] Batch Invariant: Unit test using parameterized backend ( #27478 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-28 13:51:35 -07:00
141e6a0505
[Misc] Make reorder batch also separate extends ( #27367 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-28 10:55:10 -07:00
130aa8cbcf
Add load pattern configuration guide to benchmarks ( #26886 )
...
Signed-off-by: Matvei Pashkovskii <mpashkov@amd.com >
Signed-off-by: Matvei Pashkovskii <matvei.pashkovskii@amd.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-28 10:49:15 -07:00
e3d8186666
[compile] Add fallback path to AOT compile when serialization fails. ( #27350 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 12:54:26 -04:00
f5710ef02a
[Misc] Make LayerBlockType a Literal instead of Enum ( #27658 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-28 16:23:35 +00:00
a8c02fb5bf
[Bugfix][CI] Fix v1 attention backend tests and add CI coverage ( #26597 )
...
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu >
Signed-off-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-28 11:42:05 -04:00
02af36df36
[Bugfix] Fix allocation & free logic of SingleWriterShmRingBuffer ( #27117 )
...
Signed-off-by: Kero Liang <kerorek@outlook.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: donglu <donglu@cohere.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-28 15:01:24 +00:00
e88bdd60d9
[FLA] Introduce Kimi Delta Attention(KDA) to VLLM ( #27654 )
...
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn >
2025-10-28 22:56:28 +08:00
05e034f085
[nit]: Fix import for the lmcache integration ( #27600 )
...
Signed-off-by: Samuel Shen <slshen@uchicago.edu >
Co-authored-by: Samuel Shen <slshen@uchicago.edu >
2025-10-28 14:40:55 +00:00
936643a868
[BugFix] Also consider RAY_EXPERIMENTAL_NOSET_* when storing compilation cache ( #27294 )
...
Signed-off-by: Hollow Man <hollowman@opensuse.org >
2025-10-28 10:22:28 -04:00
b186149e8e
[Bugfix][Frontend] validate arg priority in frontend LLM class before add request ( #27596 )
...
Signed-off-by: Junpu Fan <junpufan@gmail.com >
2025-10-28 14:02:43 +00:00
2abbd351ef
[Core] Enable async scheduling for external_launcher mode ( #27394 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com >
2025-10-28 13:52:47 +00:00
446912d1cb
fix: allow HuggingFace standard chat template params via **kwargs ( #27622 )
...
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-28 21:12:34 +08:00
a00d6254e9
[compile] Disable dynamo guards check for AOT compilation. ( #27288 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 12:58:12 +00:00
05181cc57f
[Hybrid] Add mamba_block_size to Engine Args ( #27289 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-10-28 12:54:24 +00:00
259504e147
[compile] Add enable_prompt_embeds to compile hash. ( #27285 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 20:46:03 +08:00
0484b64248
[Bug] Fix shape issue for eplb expert weights ( #27589 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 20:44:05 +08:00
f58d9b6404
[Misc] Separate out utils.counter and move utils.Device to engine ( #27588 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-28 12:20:46 +00:00
44b5ce956d
[Bugfix] In LongRoPE, decide short vs long based on max_model_len ( #27431 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-28 12:00:56 +00:00
7a865f2325
[V0 Deprecation] Remove vestigial V0 logits_processors.py file ( #27601 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-28 19:17:45 +08:00
2fa90bda27
Fix a robust parsing issue in KimiK2ToolParser that causes IndexError ( #27565 )
...
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
2025-10-28 11:11:50 +00:00
0291fbf65c
[CI/Build] Fix amd model executor test ( #27612 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-28 08:58:11 +00:00
b46e4a06f1
[Core][Bookkeeping Optimization] Update against numpy view of is_token_ids tensor ( #27618 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-28 08:13:10 +00:00
d34f5fe939
[Bugfix][CPU] Fallback oneDNN linear to torch linear to fix half gemm support on legacy platforms ( #27526 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-27 23:25:44 -07:00
bdb01a38fe
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.6 for MI300X ( #27323 )
...
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com >
2025-10-27 22:58:06 -07:00
5b3c35a68e
[ROCm] [Doc] Update ROCm installation docs ( #27327 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-28 13:00:50 +08:00
61fbfe5274
[Bugfix] fixed inconsistent finish_reason handling between V0 and V1 engines ( #27555 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-28 02:18:08 +00:00
255e34ca50
[Stability fix] turn off HMA allocator when connector is set ( #27592 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
Signed-off-by: Kuntai Du <kuntai@uchicago.edu >
2025-10-27 18:32:23 -07:00
a8d2e326ec
[Bugfix][CI] Fix config resolving logic with remote models ( #27610 )
2025-10-28 00:48:32 +00:00
53a56e658b
[gpt-oss][2/N] Support input_messages in responsesRequest ( #26962 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-10-27 23:15:49 +00:00
69f064062b
Code quality improvements: version update, type annotation enhancement, and enum usage simplification ( #27581 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-10-27 17:50:22 +00:00
921e78f4bb
[ROCm] Update AITER branch for ROCm base docker ( #27586 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-10-27 17:22:33 +00:00
6ebffafbb6
[Misc] Clean up more utils ( #27567 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 15:30:38 +00:00
3b96f85c36
[Chore]: Stream tokens vs characters in tool call parser tests ( #26513 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-10-27 23:06:25 +08:00
23ad820553
fixing mm placeholder replacement issue with gemma3 ( #27538 )
...
Signed-off-by: tingtingtang1992 <streamttt@gmail.com >
2025-10-27 14:34:01 +00:00
5d3be3ba4c
[Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement ( #27487 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-27 07:32:50 -07:00
4f882be4a0
[Model] Siglip2 Model Support ( #27566 )
...
Signed-off-by: piood <2477084691@qq.com >
2025-10-27 06:57:37 -07:00
9273754222
[Hybrid] Added supports_mamba_prefix_caching Protocol ( #27339 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-10-27 13:05:20 +00:00
f4e8154076
[Kernel] Enable moe LoRA kernel support FP16 ( #27468 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-27 19:48:37 +08:00
a663f6ae64
[cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 ( #27415 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-10-27 11:14:55 +00:00
a4fc21895e
[Bugfix] Fixed when return_token_ids=False, the first event still contains prompt_token_ids. ( #27561 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-27 11:06:43 +00:00
a3e8611da5
[Bugfix] Limit the default value of max_model_len when it is not specified by users ( #27556 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-10-27 10:16:20 +00:00
7c2bdb83dc
[Misc] Clean up utils ( #27552 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 09:05:40 +00:00
9932ed6a83
[Kernel] Adding split_K implementation for fused_moe_lora ( #27291 )
...
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com >
Signed-off-by: Danielle Robinson <dcmaddix@gmail.com >
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-27 02:05:24 -07:00
2d631d28c6
[Doc] Slight improvement to M2 and beyond ( #27554 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-27 09:02:10 +00:00
b368382964
[Model] Deprecate merge_by_field_config=False ( #27551 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 16:43:00 +08:00
a806c14cc7
[Performance][LoRA] add context varying params to 'do_not_specialize' in fused moe lora ( #27445 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
2025-10-27 06:31:55 +00:00
181bf5bbde
[Docs] remove the incorrect enable_reasoning parameter ( #27550 )
...
Signed-off-by: zxw <1020938856@qq.com >
2025-10-26 23:17:19 -07:00
cbd5e07a51
[Model] Use merge_by_field_config for MM models (Qwen series) ( #27546 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 05:38:05 +00:00
63b22e0dbb
[Model][Bugfix] fix ernie45 moe 300B SharedFusedMoE output tuple ( #27316 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-26 20:53:31 -07:00
5980604c44
Fix MiniMax-M2 copyright ( #27537 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-27 03:29:51 +00:00
361a7463d3
fix m2 test ( #27536 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-27 01:04:36 +08:00
720af6ab79
[Model][MiniMax-M2] Support MiniMax-M2 Model ( #27535 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-27 00:59:11 +08:00
55cba4a05c
[CI/Build] Update causal-conv1d installation ( #27529 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 22:14:22 +08:00
c7abff2990
Revert "[CI/Build] Use CPU for mm processing test on CI ( #27522 )" ( #27531 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 04:44:27 -07:00
71b1c8b667
[Chore]:Extract math and argparse utilities to separate modules ( #27188 )
...
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com >
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com >
Signed-off-by: yeshsurya <yeshsurya@gmail.com >
2025-10-26 04:03:32 -07:00
8fb7b2fab9
[Doc] Fix links to GH projects ( #27530 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 17:55:51 +08:00
be7b55a83d
[Doc] Remove Molmo warning ( #27527 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 16:22:52 +08:00
315b860abe
[bugfix]fix empty prompts for async-engine mode in benchmark throughput ( #27494 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-10-26 08:16:35 +00:00
87c41c26ad
[Bugfix] Fix processor initialization for model from modelscope instead of HF ( #27461 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-26 07:44:31 +00:00
65d2cf9511
[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA ( #27190 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-10-26 15:08:52 +08:00
d63cd9ff10
[CI/Build] Use CPU for mm processing test on CI ( #27522 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-26 13:09:18 +08:00
66a168a197
[CI/Build] Refactor processing tests ( #27470 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-25 16:14:30 +00:00
a99564ac5b
[Attention] Add missing kv cache scale setup ( #27490 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-25 00:12:49 -07:00
4c5f632165
[Misc] Simplify max tokens in multimodal registry ( #27500 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-24 23:56:01 -07:00
b853540388
[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector ( #25712 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
Signed-off-by: Kuntai Du <kuntai@uchicago.edu >
2025-10-24 23:34:18 -07:00
56ed7609a9
Revert "[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selectio… ( #27502 )
2025-10-25 05:31:43 +00:00
29c9cb8007
[CI] Add tests for cudagraph ( #27391 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-25 02:37:33 +00:00
83f478bb19
[KVConnector] Migrate the LMCache integration code to be vLLM native ( #25542 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
2025-10-25 00:23:53 +00:00
269c4db0a4
[Misc][DP] Guard mxfp4 implementation selection ( #27484 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-24 23:29:24 +00:00
52efc34ebf
[Log] Optimize Startup Log ( #26740 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-24 19:27:04 -04:00
d95d0f4b98
[Distributed] Basic set of configuration for large EP deployment on GB200 ( #27328 )
...
Signed-off-by: Pengchao Wang <wpc@fb.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2025-10-24 14:16:44 -07:00
0402428200
[Perf][Async Scheduling] Remove CPU->GPU sync in dummy_run ( #27455 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com >
2025-10-24 20:45:36 +00:00
17af6aa0da
[Document] Add ms-swift library to rlhf.md ( #27469 )
2025-10-24 20:31:50 +00:00
fc168c33f3
[CI/Build] Fix test_torch_utils in AMD CI ( #27317 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-24 12:26:00 -07:00
acc78aeb88
[Bugfix] Fix interns1-vit qk norm code path ( #27480 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-24 17:43:45 +00:00
0f67d4d962
[Attention] Add MLA prefill backend: trtllm_ragged_attention_deepseek ( #26397 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-10-24 10:24:08 -07:00
7e1d697b56
[Bugfix] Fix MultiConnector stats reconstruction across process boundaries ( #27366 )
...
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-10-24 17:08:05 +00:00
699d62e6cf
[NIXL][BUGFIX] delay done_recving queue cleanup to bottom of get_finished ( #27297 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-24 17:01:41 +00:00
cd390b609d
[compile] Turn standalone_compile back on ( #27460 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-10-24 16:30:27 +00:00
2080b05099
[cpu][fix] Fix onednn_mm crash on consecutive matmuls with same M,K,N and different dtype ( #27472 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-10-24 15:57:48 +00:00
6454afec90
[Doc] Fix minor issues in docs/design/metrics.md ( #27436 )
...
Signed-off-by: Lifan Shen <lifans@meta.com >
2025-10-24 05:40:54 -07:00
41a62564a7
Fix test named tool use ( #27458 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-24 20:27:45 +08:00
284cc92275
[MISC] cudagraph_capture_sizes related improvements ( #26016 )
...
Signed-off-by: fhl <2410591650@qq.com >
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-24 05:11:05 -07:00
435be10db9
Fix AArch64 CPU Docker pipeline ( #27331 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com >
2025-10-24 05:11:01 -07:00
b7030d962b
[Benchmark] Enable benchmark to run with encoding_format="bytes" ( #27467 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-24 11:16:50 +00:00
3567816932
[Refactor] move tool parsing logic from protocol.py to the tool parser ( #27383 )
...
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
2025-10-24 09:53:23 +00:00
e0ef8a2920
[BugFix] Fix torchrun DP with LLM class ( #27395 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-24 08:11:37 +00:00
42efe609ba
[MM][Bugfix] Replace PatchEmbed's conv3d to linear layer ( #27418 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-24 07:32:47 +00:00
88d3141ec6
[Docs] remove v1 column for embedding models ( #27446 )
...
Signed-off-by: piood <2477084691@qq.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-23 23:55:03 -07:00
09a6a49eaf
[Misc] Avoid "PyTorch non-writable tensors" warning in RayPPCommunicator ( #27443 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-24 14:53:09 +08:00
074475541a
[Bugfix] Fix Pydantic union resolution for ResponseFunctionToolCall in Responses API ( #26706 )
...
Signed-off-by: Shai Trinczer <strinczer@icloud.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-23 22:53:42 -07:00
d4c574c39f
[Chore] remove structural tags logging lines ( #27451 )
2025-10-24 05:35:45 +00:00
c528b9006a
Fix EventPublisherFactory logic for disabled KV cache events ( #27419 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-10-24 05:00:01 +00:00
85fee74b33
[Bugfix][CI] Move resolving cudagraph_mode before initializing attn_metadata_builder ( #27427 )
...
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
2025-10-23 20:31:14 -07:00
8dbe0c527f
[Misc] Add TPU usage report when using tpu_inference. ( #27423 )
...
Signed-off-by: Hongmin Fan <fanhongmin@google.com >
2025-10-23 20:29:37 -07:00
5cc6bddb6e
[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm ( #26092 )
2025-10-23 23:26:13 -04:00
1f9460c4c1
Fix pooling adapters for Transformers backend ( #27338 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-23 20:23:55 -07:00
70022ffc00
Granite 4.0 quark quantization support ( #26944 )
...
Signed-off-by: Xiao YU <Xiao.YU@xilinx.com >
Signed-off-by: Xiao Yu <xiao.yu.dc@outlook.com >
Co-authored-by: Xiao YU <Xiao.YU@xilinx.com >
2025-10-24 02:14:03 +00:00
f417746ad7
[Hardware][POWERPC] Disable oneDNN path in vllm/model_executor/layers/utils.py for Powerpc ( #27422 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-23 21:21:36 +00:00
0552cfb195
[Model] Siglip Embedding Support ( #27324 )
...
Signed-off-by: piood <2477084691@qq.com >
2025-10-23 20:19:48 +00:00
51dd14ac2b
[Bugfix][DP] Fix creating too many DP Placement Groups ( #26880 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Co-authored-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-23 20:16:51 +00:00
dbfbf9f324
[Attention] Fix FlashMLA metadata builder arguments for q_len > 1 ( #27368 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-23 15:58:15 -04:00
ca76486a16
[Chore] Separate out vllm.utils.platform_utils.py ( #27374 )
...
Signed-off-by: Jonathan <chenleejonathan@gmail.com >
2025-10-23 19:08:06 +00:00
a9f55dc588
[Misc] Add triton_kernels dependency ( #27370 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-23 12:04:14 -07:00
81d5bb765a
[Bugfix] Fix AWQ marlin layer skipping ( #27416 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-23 18:30:28 +00:00
0825197bee
[Bugfix][ROCm][DeepSeek] Fix for forward_hip in rope for DeepSeek ( #27373 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-10-23 17:43:53 +00:00
9ef3d5b875
[Bugfix] Fix dp_chunking enablement logic in FusedMoE layer ( #27220 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-10-24 00:03:14 +08:00
295c7f0267
Mirroring the test definitions (2025-10-22) ( #27362 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-24 00:02:26 +08:00
3fa2c12185
[Frontend][4/N] Improve all pooling task | Add plugin pooling task ( #26973 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Christian Pinto <christian.pinto@ibm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com >
2025-10-23 14:46:18 +00:00
fe2016de2d
[CI/Build] Remove unnecessary flags from test registry ( #27353 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-23 14:42:40 +00:00
237cf6d32a
[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selection (fix DP slow startup time &c) ( #26709 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-23 20:58:39 +08:00
faee3ccdc2
[Feature] Pydantic validation for speculative.py ( #27156 )
...
Signed-off-by: Navya Srivastava <navya.srivastava1707@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-23 12:19:33 +00:00
570c3e1cd4
[Bugfix] Honor --mm_encoder_attn_backend when used ( #27124 )
...
Co-authored-by: Bradley D <4551889+bradleyhd@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-23 20:09:52 +08:00
3a4255c7c4
Run mypy on the lowest supported Python version instead of system Python ( #27048 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-23 05:07:44 -07:00
61089465a6
[Model] Add MoE support for NemotronH ( #25863 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-10-23 10:27:23 +00:00
88afa11010
[Metrics] [KVConnector] Add connector prefix cache hit rate stats ( #26245 )
...
Signed-off-by: tovam <tovam@pliops.com >
2025-10-23 12:21:08 +02:00
d00ce29d89
[CI] Reorganize entrypoints tests ( #27403 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-23 10:10:06 +00:00
3b7bdf983b
add SLA information into comparison graph for vLLM Benchmark Suite ( #25525 )
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
Signed-off-by: louie-tsai <louie.tsai@intel.com >
Signed-off-by: Louie Tsai <louie.tsai@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-23 08:04:59 +00:00
50b788a17a
[CI/Build] Fix AMD CI: test_cpu_gpu.py ( #27388 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-23 07:55:00 +00:00
fc059c7061
[Bugfix] Fix args settings for guided decoding args ( #27375 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-10-23 07:34:06 +00:00
bfb240cc49
[CI/Build] Fix Prithvi plugin test ( #27393 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-23 07:30:44 +00:00
e255d92990
[Chore] Remove duplicate has_ functions in vllm.utils ( #27372 )
...
Signed-off-by: Jonathan <chenleejonathan@gmail.com >
2025-10-23 06:11:59 +00:00
3729ed00ba
[Model] Add num_cached_tokens for PoolingRequestOutput ( #27378 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-23 14:03:42 +08:00
6644796bf4
[V1][spec decode] return logprobs for spec decoding ( #26060 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-10-22 22:59:59 -07:00
ff93cc8c84
[CORE] Support Prefix Caching with Prompt Embeds ( #27219 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
2025-10-22 22:18:07 -07:00
243ed7d32e
[Bugfix][Core] running queue index leakage exception ( #26754 )
...
Signed-off-by: CLFutureX <chenyongqyl@163.com >
2025-10-22 21:40:12 -07:00
7e0941055f
[Bugfix] Fix incorrect kv cache metrics in grafana.json ( #27133 )
...
Signed-off-by: Fangping Shi <fangping_shi@apple.com >
Co-authored-by: Fangping Shi <fangping_shi@apple.com >
2025-10-22 20:58:36 -07:00
6738e4a093
[Bugfix] Fix SLA tuner initialization ( #27355 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-22 20:43:04 -07:00
2566dca2a9
[Bugfix] Fix deepseek-ocr multi-image inference and add merge_by_field_config=True with tensor schema support ( #27361 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-22 17:15:38 -07:00
b4fda58a2d
[MLA] Bump FlashMLA ( #27354 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-22 15:48:37 -07:00
a0003b56b0
[Chore] Separate out system utilities from vllm.utils ( #27201 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-22 20:25:25 +00:00
5beacce2ea
[BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 ( #27128 )
...
Signed-off-by: qqma <qqma@amazon.com >
Co-authored-by: qqma <qqma@amazon.com >
2025-10-22 19:36:39 +00:00
8669c69afa
[Feature] publisher default set zmq in kv_event config ( #26915 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-22 19:19:33 +00:00
1651003c35
[Prefix Cache] Use LoRA name for consistent KV-cache block hashing ( #27211 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2025-10-22 18:13:03 +00:00
1cb8c6c5fe
[Doc] Fix numbering sequence in prefix caching ( #27357 )
...
Signed-off-by: William Song <jinwook@umich.edu >
2025-10-22 17:35:47 +00:00
e05a6754a8
[Model] Revert PR #26715 : Restore custom PaliGemma and Gemma3-MM impl… ( #27309 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com >
2025-10-22 10:05:34 -07:00
084a9dae80
[Bugfix] Disable FlexAttention direct block mask building for encoder-only models ( #27344 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-22 16:39:08 +00:00
c9461e05a4
Support Anthropic API /v1/messages Endpoint ( #22627 )
...
Signed-off-by: liuli <ll407707@alibaba-inc.com >
Co-authored-by: liuli <ll407707@alibaba-inc.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-22 09:13:18 -07:00
4dfdb821c8
[P/D] Dynamic kv_output_aggregator collect size ( #26734 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-22 18:07:58 +02:00
58fab50d82
[Frontend] Require flag for loading text and image embeds ( #27204 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-22 15:52:02 +00:00
db6f28d898
[Bugfix] Fix HF format InternVL large variants video processing ( #27330 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-22 08:39:23 -07:00
14e2f1231e
[Bugfix] Make get_mrope_input_positions instance methods ( #27342 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-22 08:38:34 -07:00
7c4767f1eb
[NIXL] use Host buffer to support TP_ratio > 1 for XPU ( #27140 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2025-10-22 15:28:13 +00:00
9771e0b432
[Bugfix] Add missing 'is_internal_router' attribute to FusedMoEWithLoRA ( #27351 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-22 08:19:12 -07:00
980de31ca0
[bugfix] remove unused parameters to reduce unnecessary vram usage ( #26789 )
...
Signed-off-by: Reinforce-II <fate@eastal.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-22 08:16:09 -07:00
1c160841ea
[Bug] Fix DeepSeek-V2.5-1210-FP8 issue ( #27267 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-22 11:00:10 -04:00
4ca13a8667
[NIXL] Terminate handshake listener thread in shutdown ( #26404 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-10-22 16:59:53 +02:00
675aa2ec64
[Model] Upstream Deepseek-OCR model ( #27247 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-22 07:59:15 -07:00
3ae082c373
[Chore] Separate out optional dependency checks from vllm.utils ( #27207 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-22 10:44:21 -04:00
49c00fe304
Mirroring changes in test-pipeline.yaml into test-amd.yaml ( #27242 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-22 09:59:45 -04:00
141d3b9fc5
[docs] Update v1 metrics design doc ( #27332 )
...
Signed-off-by: Simon Mo <simon.mo@hey.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: atalhens <sneh.lata@nutanix.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: atalhens <sneh.lata@nutanix.com >
2025-10-22 06:29:15 -07:00
abf3db40ef
[Core] Handle MoE LoRA edge cases ( #27335 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-22 13:14:33 +00:00
8e4ca4d14e
Bugfix - pass 'max_num_tokens_padded' into 'moe_lora_align_block_size' ( #27311 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-22 12:23:57 +00:00
1a0f4defb7
[Log] Add Warning for LLM(data_parallel_size=k) single-process DP Usage ( #27282 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-22 12:12:21 +00:00
843af7f7fc
[Bugfix][CPU] Disable dual stream execution for experts on CPU ( #27320 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-22 11:02:27 +00:00
1f633b8632
[Frontend][3/N] Improve all pooling task | Support binary embedding response ( #27066 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-22 18:38:57 +08:00
a4c29e6e82
fixed reasoning streaming with tool_choice="required" ( #24108 )
...
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr >
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com >
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-10-22 09:42:55 +00:00
8f18feb191
Remove last level references not removed in #26355 ( #27260 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-22 09:18:17 +00:00
ed540d6d4c
Update release pipeline for PyTorch 2.9.0 ( #27303 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-10-22 09:18:01 +00:00
f6027b2855
[1/N][Platform] Cleanup useless function ( #26982 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-22 09:04:57 +00:00
ab3e80042e
[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled ( #27146 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-22 00:22:39 -04:00
ceacedc1f9
[Benchmark] Add plot utility for parameter sweep ( #27168 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-21 20:30:03 -07:00
bfa59be8f1
[CI] Nixl integration tests DP-EP ( #27199 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-22 11:17:48 +08:00
265ecb05fb
[DOC] [ROCm] Add ROCm quickstart guide ( #26505 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-22 03:10:48 +00:00
09a7e6f617
[Deepseek v3.2] Remove extra logics in indexer ( #26465 )
...
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com >
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
Signed-off-by: Lain <siyuanf@nvidia.com >
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
2025-10-21 23:34:03 +00:00
6c2eef5a5d
[P/D] KVConnector for decode benchmarking ( #25986 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-10-21 16:30:47 -07:00
19748806f0
[Bugfix] skip cuda graph for drafter when running with eager ( #26821 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-21 15:39:09 -07:00
4a8a567e16
Updated xgrammar backend to not deny supported string formats ( #27253 )
...
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr >
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com >
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-21 22:25:23 +00:00
344a0017c0
[Performance] Dual stream execution of "shared_experts" and "selected_experts" inside FusedMoE ( #26440 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-10-21 21:38:29 +00:00
becb7de40b
Update PyTorch to 2.9.0+cu129 ( #24994 )
...
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-21 17:20:18 -04:00
250fb1b8ea
[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. ( #27144 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-21 18:27:03 +00:00
647214f3d5
[V0 Deprecation] Remove V0 executors ( #27142 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-21 11:09:37 -07:00
ddeec11ba9
[Bugfix][P/D] Reduce num_threads used by nixl ucx backend ( #27196 )
...
Signed-off-by: David Whyte-Gray <40244437+dagrayvid@users.noreply.github.com >
2025-10-21 13:41:52 -04:00
86ed77022d
[Feature] Batch Invariant for R1 TP 8 on Blackwell ( #27229 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-21 10:25:55 -07:00
aa1356ec53
[ROCm] Update Triton, Torch, and AITER branches for ROCm base Dockerfile ( #27206 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-10-21 12:01:23 -04:00
ecc3c0940a
Add @pavanimajety to .github/codeowners for Flashinfer, ModelOpt related code ( #27213 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-10-21 22:59:53 +08:00
ba09652de2
[ROCM] Enable CompressedTensorsWNA16 ( #27187 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
2025-10-21 10:43:23 -04:00
bd66b8529b
[CI] Install pre-release version of apache-tvm-ffi for flashinfer ( #27262 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-21 14:23:56 +00:00
6c728f7771
[Chore] Separate out NCCL utilities from vllm.utils ( #27197 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-21 06:18:23 -07:00
80e9452984
[Deepseek v3.2] Optimize top_k_per_row ( #26763 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
2025-10-21 08:30:07 +00:00
c3a2c6ac5f
[MM][Core] Decouple ViT backend from LM backend ( #27061 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-10-21 00:30:10 -07:00
72f431e709
[Nixl] Minor refactor to handshake related metadata ( #26410 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-21 09:07:47 +02:00
be4445072c
[Fix][Spec Decode] Fix llama4 draft loading with different quantization ( #27136 )
...
Signed-off-by: linzebing <linzebing1995@gmail.com >
2025-10-20 23:19:00 -07:00
f381cf2302
[Bugfix] Fix broken MTP weight loading for FP8 KV Scales ( #27227 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-20 22:51:44 -07:00
5ff5d94e77
[Bugfix] Fix gpt-oss w4a8 DP/EP on B200 ( #26729 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-21 01:51:14 -04:00
f95da13c3d
[ModelOpt] Load w13/w2_input_scale for all experts, nvfp4 ( #26135 )
...
Signed-off-by: Shu Wang <shuw@nvidia.com >
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-21 01:50:31 -04:00
aef368aa08
[BugFix] GPT-OSS Attention DP + MoE TP weight loading issue ( #24032 )
...
Signed-off-by: Po-Han Huang <pohanh@nvidia.com >
2025-10-21 04:03:47 +00:00
5f6cbf60d6
[Feature][Kernel]FusedMoE LoRA ( #21229 )
...
Signed-off-by: wuchen <cntryroa@gmail.com >
Signed-off-by: banjuede <lmklhc@163.com >
Signed-off-by: Chen Wu <cntryroa@gmail.com >
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: bk-201 <joy25810@foxmail.com >
Co-authored-by: wuchen <wuchen@zetyun.com >
Co-authored-by: Nathan Van Gheem <vangheem@gmail.com >
Co-authored-by: banjuede <lmklhc@163.com >
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: bk-201 <joy25810@foxmail.com >
2025-10-21 03:01:37 +00:00
3ada34f9cb
[Frontend] Enforce tokenize=False when applying chat template ( #27205 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-21 02:57:34 +00:00
0eb8f2b880
create is_in_the_same_node on cpu ( #26832 )
...
Co-authored-by: Lunwen He <lunwenh@meta.com >
2025-10-21 02:04:14 +00:00
163965d183
[cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 ( #27183 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Co-authored-by: Michael Yang <Michael.Yang@arm.com >
2025-10-21 02:02:58 +00:00
a03cf9bc70
[V0 Deprecation] Remove V0 metrics code ( #27215 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-21 02:02:10 +00:00
352c0c8a28
[Quantization] Automatically infer AWQ modules_to_not_convert field ( #26909 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-21 01:49:28 +00:00
bfe0b4bd2a
[ez] add uv lock to gitignore ( #27212 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-10-21 00:37:44 +00:00
58fbbcb2f5
[ROCm] enable some tests in entrypoints test groups on AMD ( #26725 )
...
Signed-off-by: Yida <yida.wu@amd.com >
2025-10-21 00:37:16 +00:00
87778d5f00
[Feature][Quantization] auto_round support for mixed bits quantization ( #23812 )
...
Signed-off-by: n1ck-guo <heng.guo@intel.com >
Signed-off-by: Heng Guo <heng.guo@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-20 22:23:30 +00:00
f9e7ad5400
[Bugfix][CI] Fix Distributed Tests (4 GPUs) async_sched+ray test ( #27195 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-20 16:34:54 +00:00
4d0f266113
[Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) ( #26268 )
...
Signed-off-by: Shivam <shivampr.dev@gmail.com >
2025-10-20 07:48:01 -07:00
e93ff6c8b9
Nemotron Nano V2 VL + EVS Video Support ( #27107 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Natan Bagrov <nbagrov@nvidia.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-20 22:19:11 +08:00
1c691f4a71
AArch64 CPU Docker pipeline ( #26931 )
2025-10-20 07:09:40 -04:00
9fce7bee74
[Kernel] Accelerate solve_tril with TMA ( #26746 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-20 05:39:02 +00:00
b63f2143f8
[LoRA] LoRA cuda graph specialization ( #25914 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-20 04:21:09 +00:00
f32bf7582e
[Model][VLM] Support Bee-8B Model ( #27012 )
...
Signed-off-by: uyzhang <yi.zhang.4096@gmail.com >
Signed-off-by: Yi Zhang <zhangyi970819@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-20 02:31:26 +00:00
8a81d776ce
Fix typo in ValueError message: use kv_role instead of kv_disagg_role ( #27166 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-10-19 19:47:19 +00:00
f6fdacd82c
[Bugfix] Fix error with penalties when speculative decoding and structural output are enabled ( #26586 )
...
Signed-off-by: southfreebird <yvorott@gmail.com >
2025-10-19 19:24:46 +00:00
d31f7844f8
[Misc] Move utils to avoid conflicts with stdlib, and move tests ( #27169 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-19 05:20:55 -07:00
7a6c8c3fa1
[Chore] Separate out vllm.utils.network_utils ( #27164 )
...
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com >
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com >
2025-10-19 03:06:32 -07:00
221bf72577
output type conversion fix ( #27159 )
2025-10-19 08:10:07 +00:00
b3aba04e5a
[Benchmark] Convenience script for multiple parameter combinations ( #27085 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-18 23:57:01 -07:00
8a297115e2
[Chore] Separate out hashing utilities from vllm.utils ( #27151 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-19 11:09:38 +08:00
191eed0bb9
[BugFix] Fix lazy imports involving outlines_core ( #27158 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-19 02:35:32 +00:00
fb860670da
[Minor] Remove unused env variable ( #27161 )
2025-10-18 18:48:35 -07:00
83e760c57d
[V1][Metrics][Plugin] Add plugin support for custom StatLoggerBase implementations ( #22456 )
...
Signed-off-by: tovam <tovam@pliops.com >
2025-10-18 15:12:46 -07:00
c2bba69065
[BugFix] Disable fp8 kv-cache by default for DeepSeek V3.2 ( #27121 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-18 22:05:23 +00:00
e133d6d218
[BugFix] fix graph partition signature ( #27139 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-18 17:34:36 -04:00
a1946c9f61
[Chore] Separate out profiling utilities from vllm.utils ( #27150 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-18 19:12:01 +00:00
9f020f4f31
[BugFix] Fix failing gemma-3-1b-it test: test_lm_eval_accuracy_v1_engine[google/gemma-3-1b-it] ( #27111 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-18 12:44:39 -06:00
3b45075206
[Minor] Add some clarifying comments to recent changes ( #27130 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-18 09:52:45 -07:00
168e578efc
Fix incorrect string formatting in barrier timeout exceptions ( #27149 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-10-18 09:51:57 -07:00
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils ( #26908 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: isotr0py <2037008807@qq.com >
2025-10-18 09:48:22 -07:00
5c2acb270a
[Models][QwenVL] Remove unnecessary .contiguous() calls ( #27106 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-18 07:05:05 -07:00
b26b70bec4
[Misc] Refactor get_kv_cache_spec into AttentionLayerBase ( #26587 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-18 13:51:21 +00:00
ab4be40fc5
[fix][cpu] fix prefill attention in CPU attention backend ( #27035 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-10-18 13:30:21 +00:00
245e4f2c01
[Feature] Batch Invariant: Support DeepGEMM and Blackwell ( #27127 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-18 09:28:05 -04:00
1d165d6d85
[Chore] Separate out vllm.utils.mem_utils ( #27143 )
...
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com >
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com >
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-18 10:06:59 +00:00
83004020fd
[Test] Add test for /health endpoint on engine failure ( #26074 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-18 09:59:05 +00:00
12e21701e7
[DOC][FEATURES][CPU]update cpu feature for v1 ( #27135 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-18 01:10:45 -07:00
30a33b92ee
[Misc] Rev DeepEP ( #27122 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-18 14:54:29 +08:00
7c572544e4
[GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot ( #25515 )
...
Signed-off-by: Hanchenli <lihanc2002@gmail.com >
Signed-off-by: Hanchenli <61769611+Hanchenli@users.noreply.github.com >
Signed-off-by: Wei Wei <wwei6@meta.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wei Wei <wwei6@meta.com >
Co-authored-by: Wei Wei <weiweinpu@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-17 21:55:54 -07:00
c312320764
[CI/Build] tests(v1): feed Triton attention the (num_blocks, 2, …) KV cache layout in backend-correctness tests ( #26663 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-17 21:11:26 -07:00
c981f0ea78
[Perf] Add H100 fused MoE config ( #25398 )
...
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
2025-10-18 02:21:27 +00:00
6367bde739
[BugFix][Core] Fix error when enable async-scheduling in multi-node env ( #25887 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com >
Signed-off-by: Lehua Ding <lehuading@qq.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-10-17 22:16:18 +00:00
f50cc221ea
[Test] Make test_failure more stable for batch invariance ( #27054 )
2025-10-17 16:59:08 -04:00
acedc74b1a
[V1][Spec Decode] Fix greedy temperature detection after sampler refactor ( #27077 )
...
Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com >
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com >
2025-10-17 13:27:47 -07:00
d29483b58a
[Minor] Remove unnecessary error message ( #27115 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-10-17 20:02:12 +00:00
950cf9e58e
[Bugfix] Use PIECEWISE cudagraphs on Blackwell if max_model_len > 131072 ( #27114 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-17 19:47:18 +00:00
3125d79950
[Chore] Remove unused PolyNorm layer ( #27110 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-17 19:03:43 +00:00
e33ee23ee3
[Bugfix] [AITER] [ROCm] Fix Quark MoE Quant Config and AITER Fused MoE quant type logic ( #27029 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-17 12:51:10 -06:00
b10c64c834
[ROCm][Bugfix][Model] Fix illegal memory access when running qwen3_moe models with rms_norm (Qwen3-235B-A22B, Qwen3-30B-A3B, etc.) ( #26192 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
Signed-off-by: rasmith <Randall.Smith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-17 14:17:18 -04:00
0925b28a8e
[ROCM] MoE fp4 CK kernel ( #26545 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-10-17 14:06:33 -04:00
99722d5f0e
[CI] Remove forbidden slash ( #27112 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-17 09:38:00 -07:00
4c91a28e30
[bugfix] Qwen3-VL fix video incorrect timestamp calculations while do_sample_frames=True ( #27104 )
...
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com >
2025-10-17 16:26:33 +00:00
b038d9c40c
[Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) ( #26367 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-17 08:24:42 -07:00
2ba60ec7fe
[CI] Nixl integration tests ( #27010 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-17 07:13:31 -07:00
bd7157a071
[torch.compile] Enable attention and allreduce fusion without custom ops enabled ( #24604 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-17 08:10:23 -06:00
be429d0cfd
Fix incorrect docstring for stop_profile() method ( #27101 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-10-17 06:30:23 -07:00
c253745eb8
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 ( #25586 )
...
Signed-off-by: Reima Karhila <reima.karhila@amd.com >
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com >
Co-authored-by: xaguilar <Xavier.AguilarFruto@amd.com >
2025-10-17 04:56:12 -07:00
daec4d2624
[Model]Improve Qwen3VLMoeForConditionalGeneration packed_modules_mapping ( #27096 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-17 04:47:00 -07:00
6c9fdbf725
[Docs] Replace rst style double-backtick with md single-backtick ( #27091 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-17 02:47:34 -07:00
483ea64611
[Docs] Replace all explicit anchors with real links ( #27087 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-17 02:22:06 -07:00
e20eba753b
[VLM][Refactor] Remove useless func get_input_positions in MRotaryEmbedding ( #27088 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-17 02:00:30 -07:00
bbc1b29665
Update troubleshooting.md and remind VLLM_TRACE_FUNCTION usage ( #27069 )
...
Signed-off-by: cong-meta <prowindy@hotmail.com >
2025-10-17 01:53:06 -07:00
acb1bfa601
[CI] fix docs build failed ( #27082 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-17 07:53:40 +00:00
75c7ad9918
[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel ( #26717 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
2025-10-17 07:30:35 +00:00
5550ff9c25
[CI/Build] Update compressed tensor test path to fix CPU CI ( #27068 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-16 22:34:56 -07:00
3aeb19a39e
[Model] Add support for LightOnOCR ( #26916 )
...
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com >
Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-17 05:05:24 +00:00
8c017b3490
[Model] Always use Transformers backend for PaliGemma and Gemma3-MM ( #26715 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-17 05:03:35 +00:00
9c2c2287a0
[CI/Build] Update Llama4 eval yaml ( #27070 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-17 04:59:47 +00:00
fec2b341ad
[Kernel] Lazy import FlashInfer ( #26977 )
2025-10-17 04:48:18 +00:00
87bc0c492f
[Bugfix] Fix ReplicatedLinearWithLoRA ( #27065 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-17 04:43:16 +00:00
fe3b9372ad
[Core] Change execute_model_with_error_logging() to be a ctx manager ( #27060 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-17 11:45:32 +08:00
bde9e2272a
[Bugfix][Qwen] fixes the weights dtype in qwen3_next: it is actually a bfloat16 ( #27030 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
2025-10-17 03:37:52 +00:00
08405609cc
disable graph partition in custom op ( #26952 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Boyuan Feng <fby.1994@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-17 11:08:47 +08:00
ab81379ea6
[Perf] Exploit out-of-band buffers in shm_broadcast ( #26961 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-16 20:08:03 -07:00
4ffd6e8942
[Docs] Reduce custom syntax used in docs ( #27009 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 20:05:34 -07:00
965c5f4914
vllm bench serve shows num of failed requests ( #26478 )
...
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com >
2025-10-16 19:55:09 -07:00
4d055ef465
Remove unused imports ( #26972 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-16 19:51:17 -07:00
17c540a993
[torch.compile] fix simple inductor graph partition test ( #27050 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-16 21:09:36 -04:00
4d4d6bad19
[Chore] Separate out vllm.utils.importlib ( #27022 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-17 00:48:59 +00:00
11ae016bd7
[torch.compile] Passing only necessary compilation config to inductor pass config ( #27041 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
2025-10-17 00:01:52 +00:00
41d3071918
[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel ( #26714 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-16 16:20:25 -07:00
fb5e10d3fb
Refactor Transformers backend to use mixins ( #26906 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 21:50:39 +00:00
b2f78cbad4
[small][batch invariance] Rename the env and internal flags to simplify usage ( #26855 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
2025-10-16 21:40:25 +00:00
23583ee28c
[Bug] Add Assertion for random-input-len / random-output-len ( #26834 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-16 21:36:39 +00:00
01c977e96d
[CI] Prune Quantization Tests and skip compilation ( #27038 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-16 17:26:35 -04:00
b3dda72c23
[Feature] Migrate DeepGEMM API from get_m_alignment_for_contiguous_layout to get_mk_alignment_for_contiguous_layout ( #26935 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-16 16:46:48 -04:00
fb0571b077
[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels ( #25997 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-16 12:53:11 -07:00
2ed8b6b3d0
[Bug] Fix batch invariant test has to is ( #27032 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-16 19:45:14 +00:00
013abde6ef
Adding Warmup to Benchmark Serving ( #26943 )
...
Signed-off-by: Kimbo Chen <chentenghung@gmail.com >
2025-10-16 12:44:32 -07:00
a5464dcf92
[Compressed Tensors] Always clone output for compile robustness ( #26849 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-16 19:29:59 +00:00
ac3ed5a815
Support block size of 256 used by Intel HPU ( #26883 )
...
Signed-off-by: mandy-li <mandy.j.li@intel.com >
2025-10-16 15:10:57 -04:00
e6ba2000ae
[gpt-oss][1/N] EZ: refactor serving_responses for modularity ( #26948 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
2025-10-16 18:44:06 +00:00
aa255ff55a
Support set in the CLI generation ( #27031 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 18:07:18 +00:00
7bb736d00e
Fix Qwen2.5 VL image grid docstring ( #27033 )
...
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com >
2025-10-16 09:57:36 -07:00
9f4e30904b
[Model] Fix Qwen3VL mm mapping ( #27027 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-16 09:45:59 -07:00
5afd3276df
[Feature] Add process_weights_after_loading to AttentionImpl ( #26870 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-10-16 08:02:30 -07:00
43721bc67f
[CI] Replace large models with tiny alternatives in tests ( #24057 )
...
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 15:51:27 +01:00
02d709a6f1
[docs] standardize Hugging Face env var to HF_TOKEN (deprecates HUGGING_FACE_HUB_TOKEN) ( #27020 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-10-16 15:31:02 +01:00
4a510ab487
[NIXL] Improve request_finished() debug logs ( #25665 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-10-16 15:55:17 +02:00
314fa8abbf
[Attention] Tune CUTLASS MLA num_splits ( #26846 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-16 06:36:09 -07:00
334535b6fb
[Benchmark] Show E2EL by default for pooling models ( #27014 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 12:47:09 +00:00
dcbb3f1871
[Bugfix] Correct LayerNorm epsilon parameter in modernbert.py ( #27008 )
...
Signed-off-by: bogdanm <152898065+bogdan01m@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-16 12:27:44 +00:00
00417f4e44
[MISC] fix import violations for re and triton modules ( #26654 )
...
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com >
Co-authored-by: Mengqing Cao <cmq0113@163.com >
2025-10-16 03:38:27 -07:00
ed344f4116
Cleanup code after Python 3.10 upgrade ( #26520 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-16 03:38:23 -07:00
e51928793e
[Model][Bugfix] fix ernie45 vl run failed from shared experts optimization ( #26885 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-16 03:37:35 -07:00
d2740fafbf
[Chore] Separate out vllm.utils.collections ( #26990 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 08:35:35 +00:00
17838e50ef
[Benchmark] Use truncation by default for pooling benchmarks ( #26992 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 16:02:39 +08:00
44c8555621
[CI/Build] Fix AMD import failures in CI ( #26841 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-16 07:28:20 +00:00
f7d318de2b
[Hardware][CPU][PowerPC]Disable torch.compile() in toptopk sampling ( #26987 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-15 22:36:59 -07:00
76f0d05bc6
[CI/Build] Update expected beam search output for Phi3V ( #26978 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 05:12:44 +00:00
7d8975de84
Deepseek-v3 Batch Invariant on 8xH100 ( #26609 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-15 22:06:02 -07:00
785d8b6410
[PERF] Qwen3-next MTP speedup (change bool mask indexing to index_select / index_copy to reduce d2h) ( #26437 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-10-16 12:18:31 +08:00
f6cdc9a02f
[Chore] Rename utils submodules ( #26920 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 03:58:13 +00:00
509cdc0370
[DOC][XPU]update feature parity with Intel GPU ( #26954 )
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-15 20:07:10 -07:00
9b6504c307
[BugFix] Work around graph partition x torch.compile cache issue ( #26956 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-10-15 20:06:11 -07:00
e19b16dde6
[bugfix] Fix SP + PP without specifying compile size ( #26955 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 20:05:33 -07:00
582f2c6be7
[BUG] Allow runai_streamer_sharded in config check ( #26958 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-10-15 20:05:14 -07:00
f8a0acbdbe
[CI] Enable Blackwell Llama4 MoE tests ( #26731 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-15 21:02:57 -06:00
1317034379
[ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops ( #24097 )
...
Signed-off-by: chenjun <junchen2@amd.com >
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-10-16 10:41:34 +08:00
0ecc553ee6
[Bugfix] reasoning_parser parameter handling in run_batch.py ( #26225 )
...
Signed-off-by: inc-jeong <inc.jeong@navercorp.com >
Signed-off-by: InChang Jeong <inc.jeong@navercorp.com >
Co-authored-by: USER <user@AL02367916.local >
2025-10-16 10:24:05 +08:00
f96bc3649c
[Qwen3-Next] Add tuned MoE config for Qwen3-Next FP8 on H100 tp2 ( #26887 )
...
Signed-off-by: Felix Zhu <felixzhu555@gmail.com >
2025-10-15 18:55:05 -07:00
938c43ea7f
[ci] Adjusting AMD test composition 2025-10-14 ( #26852 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-15 23:52:13 +00:00
0a9ef0cfce
Move query quantization to attention layer for Flashinfer & Triton. ( #26534 )
...
Signed-off-by: adabeyta <aabeyta@redhat.com >
Signed-off-by: Adrian Abeyta <aabeyta@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-15 19:01:38 -04:00
e5b438a247
[Bug] Temporarily Disable VLLM_ALLREDUCE_USE_SYMM_MEM by Default ( #26925 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-15 16:18:50 -04:00
0b99f5d302
support flashinfer_fp4 moe for 5090 gpu ( #26669 )
...
Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-15 15:06:47 -04:00
1f491aa0c8
Vectorize RMS norm variance using vectorize_read_with_alignment ( #26234 )
...
Signed-off-by: Benji Beck <benjibeck@meta.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-15 11:54:41 -07:00
de92d916fe
[NVIDIA] Add support for cudnn fp4 gemm via flashinfer ( #26107 )
...
Signed-off-by: kaixih <kaixih@nvidia.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-10-15 13:53:00 -04:00
a1063628a4
[Chore] Clean up CODEOWNERS ( #26923 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-10-15 10:52:54 -07:00
d796375258
[ModelOpt] Remove NVFP4 MoE K%16==0 constraint ( #26891 )
...
Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com >
2025-10-15 13:06:17 -04:00
14f8456344
[Feature]: Use pydantic validation in observability.py config ( #26637 )
...
Signed-off-by: Samuel Wu <cernunnos1710@gmail.com >
Signed-off-by: Sam/Samuel <57896620+cern1710@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 16:44:03 +00:00
4794c2bd92
Olmo 3 tool parser and tests ( #26143 )
...
Signed-off-by: Pradeep Dasigi <pradeepd@allenai.org >
2025-10-15 16:36:12 +00:00
d3cbaa08dc
Lower severity of log when model info cache misses due to exception ( #26917 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 09:01:09 -07:00
828523ad8e
[Chore] Separate out vllm.utils.async_utils ( #26913 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 15:33:00 +00:00
136a17fe6e
[Chore] Separate out vllm.utils.func ( #26904 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 13:03:58 +00:00
f57438338d
[BugFix] Patch inductor memory plan logic ( #26878 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 12:51:45 +00:00
5d598680e3
chore: remove unused marker ( #26890 )
...
Signed-off-by: Max Wittig <max.wittig@siemens.com >
2025-10-15 05:40:33 -07:00
8f4b313c37
[Misc] rename torch_dtype to dtype ( #26695 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-15 12:11:48 +00:00
f93e348010
[Misc] Remove isort and yapf ignores ( #26888 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 12:09:03 +00:00
f54f85129e
[Model][2/N] Improve all pooling task | Support multi-vector retrieval ( #25370 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-15 11:14:41 +00:00
d4d1a6024f
[Lora]Load tuned multi-lora kernel configs from json files ( #26319 )
...
Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com >
Signed-off-by: Haipeng Li <li2haipeng@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-15 09:45:14 +00:00
db1764e4e0
[Platform] allow platform to init dp group ( #22243 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-15 02:32:17 -07:00
7f83b4ee8e
[Easy] Get rid of unnecessary parentheses in kv_cache_manager ( #26842 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-15 09:17:43 +00:00
5c3bae1a6a
[Fix] Remove divisibility requirement between num_kv_heads and tp_size in bailing_moe ( #26876 )
...
Signed-off-by: vito.yy <vito.yy@antgroup.com >
2025-10-15 16:44:04 +08:00
5210dc3940
[Misc] Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops. ( #26853 )
...
Co-authored-by: Xudong Ma <mxd@meta.com >
2025-10-15 08:37:49 +00:00
650b51f9f9
[doc] add Context Parallel Deployment doc ( #26877 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-15 16:33:52 +08:00
6256697997
[Doc] ruff format remaining Python examples ( #26795 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 01:25:49 -07:00
71557a5f7c
[CI] Fix mypy for vllm/executor ( #26845 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-15 01:23:33 -07:00
f3c378ffa7
[CI/Build] Add Qwen2.5-VL-7B-Instruct ChartQA Accuracy Tests in CI ( #21810 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: zhewenli <zhewenli@meta.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com >
2025-10-15 08:09:56 +00:00
f5ed68ef63
[Deepseek-V3.2][Kernel] Integrate cuda indexer k cache gather ( #26456 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
2025-10-15 16:05:01 +08:00
efdef57b1f
[bugfix] Lazy import cv2 ( #26869 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 07:47:50 +00:00
b8a4572157
[Misc] Use helper function to generate dummy messages in OpenAI MM tests ( #26875 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 07:17:37 +00:00
302ef403a2
[DSA][MLA] Tiny refactor on DeepSeek to make it reusable for different backends ( #26656 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-15 00:16:44 -07:00
8865da157b
[Bugfix][Multi Modal] Fix incorrect Molmo token processing ( #26873 )
...
Signed-off-by: sanghol <sanghol@allenai.org >
2025-10-15 07:13:59 +00:00
f0862eae43
[Graph Partition] pass tests for decorator ( #26831 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-15 06:39:48 +00:00
8c851f6d04
[Bugfix] Fix qwen3-omni audio truncation issue ( #26815 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-15 05:38:36 +00:00
7cfa420f49
[BugFix] Patch inductor partitioning logic ( #26735 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 05:04:32 +00:00
a27b288e4a
[Feature] default --extra-body param to disable thinking in vllm bench serve ( #26784 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-10-15 04:23:44 +00:00
e471d7ca7e
[CI/Build][Bugfix] fix qutlass cmake error when set QUTLASS_SRC_DIR ( #26773 )
...
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-15 04:09:44 +00:00
c43ca8259e
[Docs] Move build.inc into arm.inc ( #26862 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-10-14 20:35:08 -07:00
85a65e7f51
[Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972 ) ( #25589 )
...
Signed-off-by: taohui <taohui3@gmail.com >
Signed-off-by: Tao Hui <taohui3@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-10-15 11:09:52 +08:00
a2986b3e33
[Bugfix] Fixes prefix-repetition benchmark script ( #26828 )
...
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com >
2025-10-15 02:54:43 +00:00
96b9aa5aa0
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): name change compilation level to compilation mode, deprecation compilation level ( #26355 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-15 02:51:16 +00:00
e66d787bce
Disable FlashInfer sampler by default ( #26859 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-15 02:35:18 +00:00
bfad142e25
[BUGFIX][NIXL] quick fix for 'assert self.connector_worker is not None' in get_kv_connector_stats ( #26851 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-15 02:33:25 +00:00
9354660036
[Bugfix]fix Qwen3 xml tool parser ( #26345 )
...
Signed-off-by: Zhikaiiii <1658973216@qq.com >
2025-10-15 09:50:30 +08:00
07ca70af8d
[Core][Easy] Use envs.__getattr__ for all Unify to environment variable access ( #26810 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-15 01:41:18 +00:00
2dcd12d357
[torch.compile] Fix tests for torch==2.9 inductor partition ( #26116 )
...
Signed-off-by: ProExpertProg <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2025-10-14 19:55:02 -04:00
579d2e5458
[WideEP][P/D] Add usage stats for DP+EP and KV Connector ( #26836 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-14 23:51:54 +00:00
0512c04aee
[frontend][gptoss] Add per turn stats into Harmony Context ( #25061 )
...
Signed-off-by: lacora <hyelacora@gmail.com >
Co-authored-by: Ye Hu <yehu@fb.com >
2025-10-14 16:48:13 -07:00
7e0ef4084a
[CI Failure] Fix torchao dep failure for Quantization Test ( #26824 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-14 16:41:43 -07:00
4aed506b65
[Core] Streamline some structured output related code ( #26737 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-14 23:27:44 +00:00
a86b4c58e8
remove attn output view kernel ( #26680 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Boyuan Feng <fby.1994@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 22:53:10 +00:00
ff4810ba73
[Minor] Group async_scheduling related fields in model runner init ( #26736 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-14 14:46:37 -07:00
9d6964926e
fix: response_format for completion ( #23212 )
...
Signed-off-by: Nan2018 <qinnanjoshua@gmail.com >
2025-10-14 21:23:22 +00:00
0e65818910
Added MoE configs for llama 4, H200 device with tp=4/8 tuning ( #26837 )
...
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com >
2025-10-14 14:21:03 -07:00
380f17527c
[Perf] Cache vllm.env.__getattr__ result to avoid recomputation ( #26146 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 17:03:21 -04:00
b92ab3deda
Notice for deprecation of AutoAWQ ( #26820 )
...
Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 13:39:59 -07:00
acaa2c0a4a
[Core] Reuse empty block lists whenever possible in KVCacheBlocks to mitigate GC costs ( #24964 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 12:58:43 -07:00
82af928c41
[Attention][Spec Decode] FlashMLA spec decode support ( #26541 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-14 19:38:20 +00:00
87efc681db
llama4_vision_rope: add HIP override to accept (q, k) and avoid (positions, q, k) mismatch ( #26790 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-14 11:54:12 -07:00
c3a722fcb2
[CI Failure] Fix tests with missing TinyLlama-1.1B-Chat-v1.0-FP8-e2e ( #26816 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-14 18:38:59 +00:00
aba48f7db1
[Kernel][MoE] Add MoE tunings for GLM 4.6-FP8 and GLM 4.5 Air on NVidia B200 ( #26818 )
2025-10-14 11:20:39 -07:00
04b5f9802d
[CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 ( #26722 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-14 10:52:05 -07:00
efc8f7d814
Update coveragerc and add codecov.yml for path fixes ( #26435 )
...
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com >
2025-10-14 09:45:06 -07:00
6d87a2838c
[Config] Remove Unused Environment Variable VLLM_DISABLE_PAD_FOR_CUDAGRAPH ( #26743 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-14 11:47:49 -04:00
e6cdbd6792
Revert "[issues template] Encourage the author implement their own ideas" ( #26814 )
2025-10-14 08:37:34 -07:00
df850c4912
[Feature][Responses API] Stream Function Call - harmony ( #24317 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-14 08:31:43 -07:00
720394de43
[KVConnector][Metrics] Aggregate scheduler-side KVConnectorStats ( #26046 )
...
Signed-off-by: Qier Li <kevin44036@gmail.com >
2025-10-14 14:38:07 +00:00
88a49745af
[issues template] Encourage the author implement their own ideas ( #26671 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-14 22:32:36 +08:00
ca683a2a72
use combo kernel to fuse qk-norm and qk-rope ( #26682 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-14 09:40:59 -04:00
e9f1b8c9e9
Adjusted the model order of the model registration file ( #26798 )
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
2025-10-14 13:26:11 +00:00
ea97940d6c
[DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention ( #24864 )
...
Signed-off-by: yuanyongjie.yyj <yuanyongjie.yyj@antgroup.com >
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com >
Signed-off-by: Jaya Yuan <yuanyongjie.yyj@antgroup.com >
2025-10-14 13:07:50 +00:00
fdd32750f0
[CI/Build] Cleanup LoRA test ( #26752 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-14 12:06:35 +00:00
c715ba3735
[Feature] Change vllm.py with pydantic validation ( #26726 )
...
Signed-off-by: Vladislav <vladislav.bronzov@gmail.com >
Signed-off-by: Vladislav Bronzov <58587565+VladOS95-cyber@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-14 12:00:54 +00:00
9c4cb68339
[Chore] Remove SupportsV0Only interface and update supported models docs ( #26783 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 04:55:10 -07:00
780eb03d9b
[CI] Fix test_tool_id_kimi_k2 ( #26787 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-14 10:27:07 +00:00
ef9676a1f1
[Doc] ruff format some Python examples ( #26767 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 03:21:53 -07:00
70b1b330e1
Don't allow typos to fix by default ( #26785 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-14 03:05:15 -07:00
d1d063a588
[Chore] Use max_transformers_version for Qwen-VL test ( #26792 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 03:03:46 -07:00
7e6edb1469
[NIXL][HeteroTP] Enable KV transfer from HND prefill to NHD decode ( #26556 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-14 09:46:05 +00:00
74704d4553
[Model] Use merge_by_field_config for MM models (O-P) ( #26776 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 09:42:45 +00:00
d2f816d6ff
[Bugfix] Standardize merging multimodal embeddings ( #26771 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 09:36:21 +00:00
577d498212
[Plugin] Make plugin group clear ( #26757 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-14 07:49:59 +00:00
fd85c9f426
[Bugfix][FE]: Always include usage with --enable-force-include-usage ( #20983 )
...
Signed-off-by: Max Wittig <max.wittig@siemens.com >
Signed-off-by: Antoine Auger <antoineauger@users.noreply.github.com >
Co-authored-by: Antoine Auger <antoineauger@users.noreply.github.com >
2025-10-14 09:17:39 +02:00
d32c611f45
[CI/Build] Use 127.0.0.1 instead of localhost in utils ( #26750 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-14 07:04:00 +00:00
01ad27faff
[Model][Bugfix]fix ernie45 load failed due to ernie45 eplb code ( #26684 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-14 06:55:23 +00:00
481545b397
scheduler.py: Update the name of the default scheduler. ( #26758 )
...
Signed-off-by: Ryan Li <ryanli@ryanli.org >
2025-10-14 06:52:21 +00:00
d3cc8427c0
[ci] Adding the test-amd.yaml for test definitions for the AMD backend. (alternative PR) ( #26718 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-13 23:10:23 -07:00
4821ac1b4d
[CI] [ROCm] Automate CC list for ROCm related issue ( #26753 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-14 13:57:26 +08:00
4497c8f821
Fix lora tests failure in TPU CI due to the removal of LoRA bias ( #26723 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2025-10-14 13:04:23 +08:00
2e36cdbe2b
[Docs] Add a start tag to build.inc.md ( #26747 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-10-13 21:51:55 -07:00
fe3edb4cf0
Add support for the /rerank endpoint in vllm bench serve ( #26602 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
2025-10-14 04:25:43 +00:00
29350922c6
[Feature][Quantization] auto_round format add support for regex ( #24024 )
...
Signed-off-by: n1ck-guo <heng.guo@intel.com >
Signed-off-by: Heng Guo <heng.guo@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 03:03:16 +00:00
8ae169286f
[torch.compile] Unwrap fused_marlin_moe custom op ( #26739 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-14 02:22:16 +00:00
8a0af6a561
[build][torch.compile] upgrade depyf version ( #26702 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-14 10:12:09 +08:00
cfded80793
[Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE ( #26742 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 01:46:44 +00:00
b59dd19b55
[compile] Enable sequence parallelism for full cuda graph without specifying compile sizes ( #26681 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-13 18:15:34 -07:00
3e051bda82
[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend ( #26732 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-13 18:12:52 -07:00
8317f72354
[Misc][DP] support customized aggregated logger for dp ( #24354 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-10-13 17:45:59 -07:00
d8bebb008a
Add tests for chunked prefill and prefix cache with causal pooling models ( #26526 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Co-authored-by: Ayush Singh <ayush1009208@gmail.com >
2025-10-14 07:45:04 +08:00
35bc22f23c
[ResponseAPI] Further polish message serialization and unit tests ( #26728 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-13 23:31:35 +00:00
fa96fb9c70
Pruning kernel Core Tests ( #26727 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
2025-10-13 23:08:18 +00:00
e3fdb627d9
[FrontEnd] UNREVERT CompilationConfig overhaul ( #20283 ): deprecate use_inductor in favor of backend, simplify custom_ops ( #26502 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
2025-10-13 22:47:16 +00:00
7200a21cd1
[Bug] Fix Assertion error DeepEP/csrc/kernels/intranode.cu:928: 'false and Unsupported type' ( #26532 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-13 18:26:37 -04:00
577c72a227
[CI Perf]Prune Tests in kernel/mamba ( #26538 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-13 18:22:31 -04:00
314285d4f2
[CI] Fix mypy for vllm/distributed ( #26593 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-13 16:02:24 -04:00
d2a7938582
[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). ( #26414 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-13 19:06:43 +00:00
89342ce4c0
[Quantization] [Performance] Enable Marlin GEMM kernels for the calibration-free RTN-based quantization ( #26051 )
...
Signed-off-by: Alex Kogan <alex.kogan@oracle.com >
Signed-off-by: Alex Kogan <82225080+sakogan@users.noreply.github.com >
2025-10-13 18:52:54 +00:00
f89f599395
[CI][Release][Arm64]: Build arm64 release for gpu arch 8.9 ( #26698 )
2025-10-13 18:42:12 +00:00
e251e457c5
[Log] Optimize Startup Log ( #26601 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-14 02:06:57 +08:00
afc47e4de7
[Model] Use merge_by_field_config for MM models (M-N) ( #26710 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 01:27:01 +08:00
e3b90c1ba2
[Bugfix][Speculative Decoding] Extend Eagle quantization config fix to llama_eagle.py ( #26590 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
2025-10-13 17:17:13 +00:00
134f70b3ed
[Bugfix][Rocm] fix qr error when different inp shape ( #25892 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-10-13 10:04:21 -07:00
a1b2d658ee
[CI/Build] upgrade compressed-tensors to 0.12.2 to address LGPLv3 ( #26501 )
...
Signed-off-by: Sangyeon Cho <josang1204@gmail.com >
2025-10-13 12:58:33 -04:00
5c7fe25491
[Misc] Separate prompt logging to debug ( #26713 )
...
Signed-off-by: Aleksei Tsvetkov <aitsvet@ya.ru >
2025-10-13 09:04:18 -07:00
53c9a7cee2
[P/D] [NixlConnector] kv load recovery integration ( #26171 )
...
Signed-off-by: Will Eaton <weaton@redhat.com >
2025-10-13 08:48:04 -07:00
0d21b9b51e
[UX] Speedup DeepGEMM warmup with heuristics ( #25619 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-13 07:59:27 -07:00
10214b6935
[FEATURE]: Use pydantic validation in multimodal.py config ( #26629 )
...
Signed-off-by: Anand Roy <86306690+andycandy@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-13 07:56:59 -07:00
4a61950f4d
[Hardware][CPU] Disable torch.compile for RISC-V to prevent APIError ( #26693 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
2025-10-13 07:56:01 -07:00
3263799056
[unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] ( #26373 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
2025-10-13 10:24:53 -04:00
8e67b2557a
[Bugfix] Fix out of bound index issue for Jina-embedding-v3 RoPE with cuda graph ( #26687 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-13 03:21:48 -07:00
4073c82c4e
[ResponseAPI] Simplify input/output message serialization ( #26620 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-13 09:59:15 +00:00
767c3ab869
[Model][0/N] Improve all pooling task | clean up ( #25817 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-13 16:44:50 +08:00
4f207c7174
Ignore large reformatting PRs in git blame ( #26690 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-13 01:20:47 -07:00
782505ed8e
[Model] Add reasoning_parser and tool_parser for Ernie45 thinking ( #25027 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-13 15:55:20 +08:00
98f30b8cba
[Model] Fix Skywork R1V mlp ( #26673 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-12 22:42:17 -07:00
3cd36660f7
docs: wrong command in structured_outputs README ( #26677 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-10-12 20:59:01 -07:00
46ad73955a
[FIX] Throwing an exception when the model does not support pool tasks ( #25840 ) ( #25855 )
...
Signed-off-by: zxw <1020938856@qq.com >
Co-authored-by: wang.yuqi <noooop@126.com >
2025-10-12 20:56:21 -07:00
41f3884438
[Bugfix][Core]Fix block table out-of-range issue in priority scheduling ( #26661 )
...
Signed-off-by: quanliu <18646313696@163.com >
2025-10-13 01:25:42 +00:00
60e419c1ee
[Misc] cache result of disable_inplace ( #26666 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-13 00:17:50 +00:00
7ef6052804
[CI/Build] Add tool to build vllm-tpu wheel ( #19165 )
...
Signed-off-by: mgoin <michael@neuralmagic.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-12 16:25:40 -06:00
4fca1a1bd2
[easy] fix pre commit error on trunk ( #26665 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-12 21:25:34 +00:00
a6049be73c
[Models][Qwen3VL] Speedup fast_pos_embed_interpolate ( #26647 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-13 01:20:07 +08:00
18ed7746ea
[Feature] Add support for naver/splade-v3 (BERT-based sparse embedding model) ( #26339 )
...
Signed-off-by: gjgjos <gjgjos@naver.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-12 17:00:52 +00:00
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y ( #26633 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-12 09:51:31 -07:00
9bb38130cb
[Bugfix] Fix GPU_ID issue in test script ( #26442 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-12 11:39:05 +00:00
b91d8db873
[Bugfix][DCP] Set default CUDAGraphMode to PIECEWISE for DCP ( #26574 )
...
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com >
2025-10-12 09:58:38 +00:00
045b396d09
[Bugfix][CI/Build] Fix failing Mteb CI ( #26638 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-12 02:42:42 -07:00
76852017ea
[MISC] Rename the torch profiler filename as instance_id+rank_id for merging the Profiler results of each Rank ( #25867 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-12 09:29:08 +00:00
82e64c7a20
[PERF] [Qwen3-next] Speed up gated RMSNorm ( #26207 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-12 08:27:50 +00:00
4ca204055e
Add @noooop to codeowner for pooling models ( #26652 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-12 14:04:44 +08:00
c5c8f5ea59
[EPLB] Support ernie4.5-moe ( #22100 )
...
Signed-off-by: Haisheng Chen <langzs335@outlook.com >
Signed-off-by: Haisheng Chen <60504847+HsChen-sys@users.noreply.github.com >
Signed-off-by: Haisheng Chen <hac048@ucsd.edu >
Co-authored-by: Haisheng Chen <langzs335@outlook.com >
2025-10-12 10:40:47 +08:00
01653a917b
[compile] Fix inductor partition config ( #26645 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-11 21:03:14 +00:00
0cd103e7cb
CP: make correct_attn_out robust to 4‑D views and fix Triton arg binding ( #26509 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-11 20:50:57 +00:00
5be7ca1b99
[Benchmark] Support Infinity API ( #26641 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-12 01:45:32 +08:00
f0a30a067b
[Bugfix] Fix qwen-moe packed_modules_mapping ( #26634 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-11 15:21:33 +00:00
9d6cff3ede
[Bugfix][Qwen3VL] fix deepstack in qwen3vl ( #26626 )
...
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com >
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com >
2025-10-11 05:58:33 -07:00
a25f2adee9
[compile] Add patched_fused_scaled_matmul_reduce_scatter ( #26604 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-11 05:44:43 -07:00
d0bed837ac
[Refactor]Reduce duplicate code in serving_chat ( #26627 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-11 12:04:49 +00:00
f7ee69868a
[CPU] fix the issue when the node is '-' cause json decode error. ( #26562 )
...
Signed-off-by: muzian666 <andylee_2001@163.com >
Co-authored-by: qingan.li <qingan.li@wizpresso.com >
2025-10-11 12:04:04 +00:00
d2a71530c1
Add EAGLE-3 Speculative Decoding Support for Qwen3 MoE ( #26485 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
2025-10-11 10:14:41 +00:00
086609de64
fix(nix): Allow local oneDNN path to fix vLLM CPU build failure ( #26401 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
2025-10-11 09:12:16 +00:00
727144bed1
[Refactor]: Use M-RoPE interface directly while defining model class instead of maintaining model specific M-RoPE implementation in mrope.py ( #24172 )
...
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com >
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: wwl2755 <wangwenlong2755@gmail.com >
2025-10-11 07:21:04 +00:00
55392bc879
[Bugfix][Multi Modal] Fix incorrect Molmo image processing ( #26563 )
...
Signed-off-by: sanghol <sanghol@allenai.org >
2025-10-10 22:28:23 -07:00
ddaff2938e
[MM] Move Qwen3Omni MRoPE impl to model file ( #26608 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-10 22:17:24 -07:00
27ed39a347
[XPU] Upgrade NIXL to remove CUDA dependency ( #26570 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-10-11 05:15:23 +00:00
8f8474fbe3
[CI/Build] Fix ppc64le CPU build and tests ( #22443 )
...
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com >
2025-10-11 13:04:42 +08:00
be067861c6
[Frontend] Improve the performance of is_reasoning_end ( #25735 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-11 10:43:39 +08:00
5bc26c438d
[BugFix] Make penalties and bad_words work with async scheduling ( #26467 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-10 23:27:04 +00:00
eef921f45e
AOT Compilation for torch.compile (Bundled) ( #24274 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2025-10-10 19:02:11 -04:00
e317414ce1
Cache the environment variable check for batch invariance ( #26510 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
2025-10-10 22:47:34 +00:00
949cb0170d
[BugFix] Fix async scheduling + request preemption ( #26385 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-10 20:29:57 +00:00
e94cfd51da
[BUG] Qwen3-next MTP. Fix attn metadata build bug ( #26564 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-10-10 14:59:03 -04:00
7c12763b24
Fix some typing issues found by mypy==1.18.2 ( #26596 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-10 18:21:25 +00:00
3b780a4bbb
Update CUDA architecture list in build pipeline for 12.9.1 wheels ( #26592 )
...
Signed-off-by: Will Eaton <wseaton@users.noreply.github.com >
2025-10-10 11:15:27 -07:00
30f78af147
Update pre-commit hook versions ( #26591 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-10 17:03:44 +00:00
19a9b169bf
Add Qwen3-Omni moe thinker ( #25550 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Xiong Wang <feizi.wx@alibaba-inc.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-10 17:00:56 +00:00
96ad65b7fe
[Transform] [Quantization] Add QuTLASS support to vLLM ( #24440 )
...
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es >
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
Signed-off-by: Andrei Panferov <andrei@panferov.org >
Co-authored-by: Andrei Panferov <andrei@panferov.org >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-10 09:43:40 -07:00
8d2b8c0ff2
[Model] Add FlexOlmo model implementation ( #24923 )
...
Signed-off-by: Shane A <shanea@allenai.org >
2025-10-10 09:43:15 -07:00
b2155ed317
[Model][Qwen3VL] Compute cu_seqlens on CPU to remove ( #26496 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-10 09:42:17 -07:00
910abdbd08
[Bugfix] fixed top_logprobs: -1 does not appear to work as intended ( #26470 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-11 00:41:17 +08:00
cddce79fda
[torch.compile] Make inductor partition rules respect splitting_ops #25691 ( #25845 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com >
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-10 16:35:28 +00:00
e519281920
[Metrics] Add test for multi-modal cache stats logging ( #26588 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-10-10 16:00:50 +00:00
7b03584de8
Silu v2 ( #25074 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: elvircrn <elvircrn@gmail.com >
Signed-off-by: Elvir Crnčević <elvircrn@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com >
2025-10-10 15:19:53 +00:00
ae9d0e7da5
[Bugfix] Make DP padding optional in coordinate_batch_across_dp ( #26375 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-10-10 10:53:33 -04:00
0e67102d93
Added test_top_k_per_row to test-pipeline.yaml. ( #26569 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
2025-10-10 10:48:33 -04:00
f4ba2061cf
[BugFix][torch.compile] Fix fused_scaled_matmul_reduce_scatter signature for PyTorch 2.8 ( #26038 )
...
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com >
Signed-off-by: <>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-10 07:42:13 -07:00
1e6848a65d
[CI] fix test_run_batch.py::test_completions - AssertionError ( #26578 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-10 22:16:28 +08:00
67661375fa
[BugFix] Fix noop elimination edge case ( #26394 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2025-10-10 13:33:04 +00:00
213b64452a
[Bugfix] Convert untraceable GroupShape to list for AMD impl ( #26535 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
2025-10-10 13:32:29 +00:00
784c231151
[NIXL] Ignore abort on already-finished request ( #25067 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-10-10 12:21:56 +02:00
606b00e80f
[bugfix][DCP] fix block_size of hash in DCP prefix caching ( #26296 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-10 03:02:49 -07:00
720d3cd0f0
[CI] fix ruff format ( #26579 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-10 03:02:12 -07:00
ab196edefb
Remove LoRA bias support ( #25807 )
...
Signed-off-by: Ashwin Phadke <ashwinphadke12@rediffmail.com >
Signed-off-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-10 09:50:33 +00:00
3ee202ea1e
[GPT-OSS] Add support for arrays at tool message content ( #25593 )
...
Signed-off-by: Luis Tomas Bolivar <ltomasbo@redhat.com >
2025-10-10 09:00:45 +00:00
ad430a67ca
[Metrics] Log multi-modal cache stats and fix reset ( #26285 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-10 01:45:55 -07:00
6f0f570c43
[deepseek] kernel block size for UniformTypeKVCacheSpecs ( #26559 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-10 16:40:41 +08:00
b545a0b207
fix test_simple_inductor_graph_partition ( #26522 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-10 06:39:19 +00:00
29255cfc3b
[Spec-Decode] Support piecewise cudagraphs for Eagle head ( #25109 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-10-10 01:20:31 -04:00
da4455609d
[Chore]: One pythonic tool parser test uses the wrong parser ( #26515 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-10-10 04:03:55 +00:00
aafb99a4d4
[Core] Small simplification in GPUModelRunner._update_states() ( #26508 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-10 10:53:58 +08:00
757fa4a4da
[DP][ray] Support different VLLM_RAY_DP_PACK_STRATEGY ( #23849 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-09 19:53:43 -07:00
c6187f55f7
Refactor MistralTokenizer ( #26358 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-10-09 22:48:58 +00:00
8983e0216f
[CI] Fix Pre-commit Issue Cannot determine type of "rank" and "world_size" ( #26448 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-09 15:16:48 -07:00
1ee35382cb
[Bug] Fix modular_kernel: ZeroDivisionError: integer division or modulo by zero ( #26528 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-09 15:13:27 -07:00
6e783bc54b
[Bugfix] Fix CUDA graph selection bug in FlashInfer at high concurrency ( #26499 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-09 17:12:34 -04:00
c9d33c60dc
[UX] Add FlashInfer as default CUDA dependency ( #26443 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-09 14:10:02 -07:00
2e54db4d2b
[Core] Remove unused prev_sampled_token_ids_invalid_indices input batch field ( #26514 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-09 20:22:14 +00:00
44f633dba1
[Flashinfer][gpt-oss] Support FP8-qkv Flashinfer TRTLLM Sinks Attention ( #25674 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
2025-10-09 16:13:39 -04:00
a462331e36
[Bugfix] Disable moe inplace for torch >= 2.9 ( #26497 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-09 18:07:38 +00:00
4069db3f2e
[Bugfix] Enable padded FP4 quantization ( #25947 )
...
Signed-off-by: Roi Koren <roik@nvidia.com >
2025-10-09 10:59:41 -07:00
0d37450eb7
[BUGFIX] Add cu_tokens_across_sp to DPMetadata ( #26457 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-10-09 17:13:56 +00:00
47e66c24e2
[Model] Apply shared experts overlap optimization to all models with shared experts ( #26145 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-09 11:31:04 -04:00
3b736e1c38
[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 ( #25049 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-10-09 08:06:29 -07:00
2c1c7dfb35
[Models][Qwen] Replace pad with cat for better performance ( #26486 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-09 14:51:26 +00:00
e246ad6f0c
Upgrade Pydantic to v2.12.0 and remove hack for Python 3.13 ( #26481 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-09 06:02:40 -07:00
5728da11ea
Revert #26113 "[Frontend] CompilationConfig overhaul ( #20283 ): deprecate use_inductor in favor of backend, simplify custom_ops" ( #26472 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-09 05:43:55 -07:00
92be3f3517
[Feature] Use pydantic validation in parallel.py config ( #26417 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-09 12:41:31 +00:00
d1ddf340c8
[V0 deprecation] Remove QKVCrossParallelLinear implementation ( #26475 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-09 10:52:27 +00:00
ec10fd0abc
[Bugfix] Move current_platform import to avoid python import cache. ( #16601 )
...
Signed-off-by: iwzbi <wzbi@zju.edu.cn >
2025-10-09 10:46:19 +00:00
0426e3c5e1
[Models][Qwen3VL] Optimise _validate_and_reshape_mm_tensor ( #26426 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-09 10:25:48 +00:00
4bdf7ac593
[Bugfix] Fix SHM cache initialization ( #26427 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-09 02:48:04 -07:00
dc7976dd9f
[Misc] Upgrade more code to Python 3.10 ( #26463 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-09 10:43:53 +01:00
e4791438ed
[Feature] Use pydantic validation in lora.py and load.py configs ( #26413 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
2025-10-09 02:38:33 -07:00
e6e898f95d
[doc] add Volcengine as a compute sponsor ( #26477 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-09 17:11:47 +08:00
ddcbc2f334
[Misc] Misc code simplifications ( #26450 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-09 02:10:06 -07:00
a83ff278d6
[torchao] Add support for ModuleFqnToConfig using regex ( #26001 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-10-09 08:32:32 +00:00
cf4cd6c24f
Add: Support for multiple hidden layers in Eagle3 ( #26164 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
2025-10-09 07:30:50 +00:00
b960441812
Enable RMSNorm substitution for Transformers backend ( #26353 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-09 07:28:51 +00:00
1317028aa8
[Model] Gemma3: Fix GGUF loading and quantization ( #26189 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-09 07:00:53 +00:00
5e49c3e777
Bump Flashinfer to v0.4.0 ( #26326 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
2025-10-08 23:58:44 -07:00
0d7c3cb51d
Update Dockerfile and install runai-model-streamer[gcs] package ( #26464 )
...
Signed-off-by: Peter Schuurman <psch@google.com >
2025-10-08 23:48:51 -07:00
1b2c440cd6
[Core] Relax the LoRA max rank ( #26461 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-08 23:47:14 -07:00
0f29dca988
[CI/Build] Fix model nightly tests ( #26466 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-08 23:44:16 -07:00
d24cf322e1
[Hybrid]: Decouple Kernel Block Size from KV Page Size ( #24486 )
...
Signed-off-by: lizhiyuan <uniartisan2017@gmail.com >
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com >
2025-10-08 23:43:39 -07:00
d17f0fbf30
[Core][KVConnector] Propagate all tokens on resumed preemptions ( #24926 )
...
Signed-off-by: Qier Li <kevin44036@gmail.com >
Co-authored-by: Qier Li <qier@fb.com >
2025-10-09 14:43:31 +08:00
43ab8cfaa5
[MM][Doc] Add documentation for configurable mm profiling ( #26200 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-10-08 23:21:20 -07:00
de253d63b7
[Hardware][AMD] Enable FlexAttention backend on ROCm ( #26439 )
...
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com >
2025-10-09 06:20:18 +00:00
8bd696fa53
[Bugfix] Incorrect another MM data format in vllm bench throughput ( #26462 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-10-09 05:58:46 +00:00
bb6d8c21f9
[Bugfix] Catch and log invalid token ids in detokenizer #2 ( #26445 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-08 21:20:25 -07:00
ebf6ef1a9b
[Minor] Change warning->warning_once in preprocess ( #26455 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-10-08 21:09:06 -07:00
0c52d6ef81
[Bugfix] Set the minimum python version for gpt-oss ( #26392 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-08 20:35:49 -07:00
467a4f98f1
[Misc] Redact ray runtime env before logging ( #26302 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-08 17:43:34 -07:00
e614ab7806
Separate MLAAttention class from Attention ( #25103 )
...
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-08 17:11:11 -07:00
2a03f93de9
[Attention] Register FLASHMLA_SPARSE ( #26441 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-08 22:28:52 +00:00
da364615fc
[Kernels] Modular kernel refactor ( #24812 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-08 17:51:52 -04:00
f08919b7d1
[Bugfix] Respect min_tokens in scheduler stop check ( #26317 )
...
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
2025-10-08 14:08:24 -07:00
93f2c0aa08
[Models] Improve iteration over layers ( #26425 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-08 20:48:33 +00:00
4ebc9108a7
[Kernel] Centralize platform kernel import in current_platform.import_kernels ( #26286 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-08 20:25:31 +00:00
e1ba235668
[BugFix] Fix failing test quantization/test_compressed_tensors.py::test_compressed_tensors_fp8_block_enabled ( #26436 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
2025-10-08 20:04:12 +00:00
b82f4307c9
[Bugfix][Flashinfer] fix VLLM_USE_TRTLLM_ATTENTION issue for models with diff hyperparameters ( #25924 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
2025-10-08 19:54:48 +00:00
76879cc160
[Attention] Implement universal BACKEND_MAP ( #25900 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-08 12:00:25 -07:00
b25d7b5657
[Feature] Change cache.py with pydantic validation ( #26390 )
...
Signed-off-by: Vinay Damodaran <vrdn@hey.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-08 11:12:59 -07:00
e09d1753ec
Remove Python 3.9 support ahead of PyTorch 2.9 in v0.11.1 ( #26416 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-08 10:40:42 -07:00
4ba8875749
[Bug] Fix Test in Batch Invariant ( #26128 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-08 10:13:47 -07:00
6273fe8d3d
[Benchmarks] Fix imports in FP8 tuning script ( #26407 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-08 16:31:59 +00:00
9fb3ae4e6f
[Bug] Fix DeepGEMM Attention Test ( #26423 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-08 12:23:41 -04:00
76afe4edf8
[Bugfix] Fix vllm bench ... on CPU-only head nodes ( #25283 )
...
Signed-off-by: Aydin Abiar <aydin@anyscale.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Aydin Abiar <aydin@anyscale.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-08 16:06:42 +00:00
c1b06fc182
[CI Failure] Fix pre-commit issue for install_nixl_from_source_ubuntu.py ( #26424 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-08 07:55:43 -07:00
241b4cfe66
[Refactor] Refactor FP8 & INT8 Quant Folder inside w8a8 ( #25293 )
...
Signed-off-by: nicole-lihui <nicole.li@daocloud.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: courage17340 <courage17340@163.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Jacob Kahn <jacobkahn1@gmail.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: zxw <1020938856@qq.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: chenlang <chen.lang5@zte.com.cn >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: Jonas Kuebler <kuebj@amazon.com >
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: AlonKejzman <alonkeizman@gmail.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: taohui <taohui3@gmail.com >
Signed-off-by: Tao Hui <taohui3@gmail.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com >
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io >
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com >
Signed-off-by: Iceber Gu <caiwei95@hotmail.com >
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
Signed-off-by: Icey <1790571317@qq.com >
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com >
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com >
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com >
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com >
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Signed-off-by: zixi-qi <qizixi@meta.com >
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Naman Lalit <nl2688@nyu.edu >
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: Junhong <liujunhong11@huawei.com >
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com >
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io >
Signed-off-by: Patrick Toulme <ptoulme@meta.com >
Signed-off-by: Patrick Toulme <pctoulme+1@gmail.com >
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com >
Signed-off-by: Jialin Ouyang <jialino@meta.com >
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com >
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com >
Signed-off-by: Juechen Liu <jueliu@meta.com >
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: isotr0py <2037008807@qq.com >
Signed-off-by: yingjun-mou <renzomou@gmail.com >
Signed-off-by: zhoukz <me@zhoukz.com >
Signed-off-by: Chenxi Yang <cxyang@fb.com >
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Signed-off-by: Lee Nau <lnau@nvidia.com >
Signed-off-by: adabeyta <aabeyta@redhat.com >
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com >
Signed-off-by: Lucia Fang <fanglu@meta.com >
Signed-off-by: a120092009 <zhaoty0121@gmail.com >
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com >
Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com >
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
Signed-off-by: Lehua Ding <lehuading@tencent.com >
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
Signed-off-by: anion <1005128408@qq.com >
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com >
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
Signed-off-by: Bill Nell <bnell@redhat.com >
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com >
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
Signed-off-by: David Ben-David <davidb@pliops.com >
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Salvatore Cena <cena@cenas.it >
Signed-off-by: padg9912 <phone.and.desktop@gmail.com >
Signed-off-by: nadathurv <work.vnadathur@gmail.com >
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com >
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
Signed-off-by: billishyahao <bill.he@amd.com >
Signed-off-by: Nathan Scott <nathans@redhat.com >
Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp >
Signed-off-by: Johnny <johnnynuca14@gmail.com >
Signed-off-by: johnnynunez <johnnynuca14@gmail.com >
Signed-off-by: Johnny <johnnync13@gmail.com >
Signed-off-by: Huamin Li <3ericli@gmail.com >
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com >
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
Signed-off-by: Peter Schuurman <psch@google.com >
Signed-off-by: Huy Do <huydhn@gmail.com >
Signed-off-by: leo-pony <nengjunma@outlook.com >
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Signed-off-by: zhewenli <zhewenli@meta.com >
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Signed-off-by: huijjj <huijong.jeong@squeezebits.com >
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
Signed-off-by: kyt <eluban4532@gmail.com >
Signed-off-by: Egor <e.a.krivov@gmail.com >
Signed-off-by: Yang <lymailforjob@gmail.com >
Signed-off-by: Paul Pak <paulpak58@gmail.com >
Signed-off-by: whx-sjtu <2952154980@qq.com >
Signed-off-by: Xiang Si <sixiang@google.com >
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com >
Signed-off-by: Jun Jiang <jasl9187@hotmail.com >
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com >
Co-authored-by: Nicole LiHui 🥜 <nicolelihui@outlook.com >
Co-authored-by: courage17340 <courage17340@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Jacob Kahn <jacobkahn1@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Nicole LiHui 🥜 <nicole.li@daocloud.io >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Fadi Arafeh <115173828+fadara01@users.noreply.github.com >
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: yyzxw <34639446+yyzxw@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: chenlang <chen.lang5@zte.com.cn >
Co-authored-by: chenlang <10346245@zte.com.cn >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: AlonKejzman <alonkeizman@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Tao Hui <taohui3@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com >
Co-authored-by: Shu Wang <shuw@nvidia.com >
Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Doug Lehr <douglehr@amd.com >
Co-authored-by: Eugene Khvedchenya <ekhvedchenya@gmail.com >
Co-authored-by: yitingdc <59356937+yitingdc@users.noreply.github.com >
Co-authored-by: Andrew Sansom <andrew@protopia.ai >
Co-authored-by: xaguilar-amd <xavier.aguilarfruto@amd.com >
Co-authored-by: Iceber Gu <caiwei95@hotmail.com >
Co-authored-by: Tao He <linzhu.ht@alibaba-inc.com >
Co-authored-by: Icey <1790571317@qq.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Xu Wenqing <121550081+Xu-Wenqing@users.noreply.github.com >
Co-authored-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com >
Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com >
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com >
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: 阿丹(adan) <47373076+LDLINGLINGLING@users.noreply.github.com >
Co-authored-by: liudan <adan@minicpm.com >
Co-authored-by: liudan <liudan@qq.com >
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Co-authored-by: Clouddude <kouss.hd@gmail.com >
Co-authored-by: Frank Wang <41319051+frankwang28@users.noreply.github.com >
Co-authored-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com >
Co-authored-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: Naman Lalit <nl2688@nyu.edu >
Co-authored-by: Chenheli Hua <huachenheli@outlook.com >
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com >
Co-authored-by: Junhong <liujunhong11@huawei.com >
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com >
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Xiaohan Zou <renovamenzxh@gmail.com >
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Peter Pan <peter.pan@daocloud.io >
Co-authored-by: Patrick C. Toulme <135739773+patrick-toulme@users.noreply.github.com >
Co-authored-by: Clayton Coleman <smarterclayton@gmail.com >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Jialin Ouyang <jialino@meta.com >
Co-authored-by: weiliang <weiliangl@nvidia.com >
Co-authored-by: Yuxuan Zhang <2448370773@qq.com >
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com >
Co-authored-by: Juechen Liu <grinchcoder@gmail.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Yingjun Mou <renzomou@gmail.com >
Co-authored-by: Zhou Jiahao <me@zhoukz.com >
Co-authored-by: Chenxi Yang <cxyang@cs.utexas.edu >
Co-authored-by: Chenxi Yang <cxyang@fb.com >
Co-authored-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Lee Nau <lee.nau@gmail.com >
Co-authored-by: Adrian Abeyta <aabeyta@redhat.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: acisseJZhong <40467976+acisseJZhong@users.noreply.github.com >
Co-authored-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com >
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucia Fang <fanglu@meta.com >
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com >
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com >
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com >
Co-authored-by: a120092009 <33205509+a120092009@users.noreply.github.com >
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com >
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com >
Co-authored-by: Lehua Ding <lehuading@tencent.com >
Co-authored-by: Reza Barazesh <3146276+rzabarazesh@users.noreply.github.com >
Co-authored-by: ihb2032 <40718643+ihb2032@users.noreply.github.com >
Co-authored-by: Asaf Joseph Gardin <39553475+Josephasafg@users.noreply.github.com >
Co-authored-by: Anion <123177548+Anionex@users.noreply.github.com >
Co-authored-by: Pavani Majety <pmajety@nvidia.com >
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com >
Co-authored-by: Or Ozeri <oro@il.ibm.com >
Co-authored-by: cjackal <44624812+cjackal@users.noreply.github.com >
Co-authored-by: David Ben-David <sdavidbd@gmail.com >
Co-authored-by: David Ben-David <davidb@pliops.com >
Co-authored-by: Andrew Xia <axia@mit.edu >
Co-authored-by: Andrew Xia <axia@fb.com >
Co-authored-by: Salvatore Cena <cena@cenas.it >
Co-authored-by: Param <psch@cs.unc.edu >
Co-authored-by: Zhewen Li <zhewenli@meta.com >
Co-authored-by: nadathurv <work.vnadathur@gmail.com >
Co-authored-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com >
Co-authored-by: Wenlong Wang <wangwenlong2755@gmail.com >
Co-authored-by: billishyahao <bill.he@amd.com >
Co-authored-by: Nathan Scott <natoscott@users.noreply.github.com >
Co-authored-by: Kenichi Maehashi <939877+kmaehashi@users.noreply.github.com >
Co-authored-by: Johnny <johnnync13@gmail.com >
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com >
Co-authored-by: Huamin Li <3ericli@gmail.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Hosang <156028780+hyoon1@users.noreply.github.com >
Co-authored-by: Jerry Zhang <jerryzh168@gmail.com >
Co-authored-by: pwschuurman <psch@google.com >
Co-authored-by: Huy Do <huydhn@gmail.com >
Co-authored-by: leo-pony <nengjunma@outlook.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: ElizaWszola <ewszola@redhat.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
Co-authored-by: Andrew Xia <axia@meta.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
Co-authored-by: ahao-anyscale <ahao@anyscale.com >
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Liu-congo <1502632128@qq.com >
Co-authored-by: HUIJONG JEONG <64083281+huijjj@users.noreply.github.com >
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com >
Co-authored-by: kyt <eluban4532@gmail.com >
Co-authored-by: Egor <e.a.krivov@gmail.com >
Co-authored-by: Yang Liu <127183760+KKSK-DON@users.noreply.github.com >
Co-authored-by: Paul Pak <52512091+paulpak58@users.noreply.github.com >
Co-authored-by: whx <56632993+whx-sjtu@users.noreply.github.com >
Co-authored-by: Xiang Si <sixiang@google.com >
Co-authored-by: Aleksandr Samarin <samarin_ad@mail.ru >
Co-authored-by: Jun Jiang <jasl9187@hotmail.com >
Co-authored-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Nikhil G <nrghosh@users.noreply.github.com >
2025-10-08 10:20:48 -04:00
9fc983c707
[NIXL][non-cuda] Add install script for nixl with non-cuda ucx ( #25959 )
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
2025-10-08 14:19:53 +00:00
2f99f2f506
Tidy vllm/config/__init__.py to only add classes and functions ( #26405 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-08 07:10:00 -07:00
338b1bf04f
[Benchmarks] Add support for Qwen 3 VL MoE tuning ( #26419 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-08 14:01:08 +00:00
e39dc46f8f
[CI] Pooling models mteb test disable enforce_eager ( #26408 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-08 12:15:36 +00:00
10c75b5439
[Docs] Have mergify leave a comment with the docs preview link ( #26412 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-08 12:04:00 +00:00
f9582fd8f4
[Model] Allow passing custom number of max tiles to Nano 2 VL ( #26403 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
2025-10-08 11:19:39 +00:00
f377333bd7
[Misc] add usedforsecurity=False in md5 hash call ( #26357 )
...
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com >
2025-10-08 10:18:32 +00:00
f8607863d8
[Feature] Enable E8M0 by Default on Hopper for DeepGEMM, 5% E2E throughput improvement ( #26197 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-08 15:33:56 +08:00
335b28f7d1
[TPU] Rename tpu_commons to tpu_inference ( #26279 )
...
Signed-off-by: Utkarsh Sharma <utksharma@google.com >
Co-authored-by: Utkarsh Sharma <utksharma@google.com >
Co-authored-by: Chengji Yao <chengjiyao@google.com >
2025-10-07 23:30:52 -07:00
5e65d6b2ad
fix[DP][v1]: Prevent hangs from mismatched worker configurations ( #26218 )
...
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com >
2025-10-07 22:55:08 -07:00
0d4f48fa10
[Bugfix] Incorrect MM data format in vllm bench throughput ( #26395 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-08 13:52:19 +08:00
127c8b782a
Add gather_indexer_k_quant_cache kernel ( #25931 )
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com >
Signed-off-by: Simon Mo <simon.mo@hey.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2025-10-08 04:58:57 +00:00
cd9890544b
fix(v1/kv_cache): resolve async KV transfer bug in cascade attention ( #23485 )
...
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com >
2025-10-08 04:46:33 +00:00
067da2d1df
[Core] Simplify setting new_token_ids in CachedRequestData ( #26388 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-08 03:32:37 +00:00
046118b938
Add SwigluOAI implementation for CPUFusedMOE ( #26347 )
...
Signed-off-by: Sharif Inamdar <sharif.inamdar@arm.com >
2025-10-07 20:17:49 -06:00
b32260ab85
[torchao] safetensors integration ( #25969 )
...
Signed-off-by: Angel Li <liangel@meta.com >
2025-10-07 20:12:35 -06:00
f80e7866c0
[Misc] Clean up cruft from previous FlashMLA sparse implementation ( #26125 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-08 10:09:34 +08:00
31a4b3e6c4
Revert #24446 and #26168 ( #26332 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-10-07 16:38:19 -06:00
caf8b1c084
[Bugfix] Fix MTP+FlashInfer crash when trtllm kernels are available but disabled ( #26361 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-07 22:12:26 +00:00
1b86bd8e18
Add more libraries to rlhf.md ( #26374 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-10-07 20:59:41 +00:00
59012df99b
[TPU] update TPU benchmark threshold ( #25713 )
...
Signed-off-by: Johnny Yang <johnnyyang@google.com >
2025-10-07 13:53:09 -07:00
3d1f67616d
[Spec Decode] Enable efficient speculative decoding with FlashInfer-MLA ( #25984 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-07 16:05:59 -04:00
6ebaf43ee4
[V1] Logit processors for rejection sampler ( #19482 )
...
Signed-off-by: southfreebird <yvorott@gmail.com >
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com >
Signed-off-by: Sergei Skvortsov <yvorott@gmail.com >
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-10-07 13:02:49 -07:00
0c824fc46f
[Frontend] CompilationConfig overhaul ( #20283 ): deprecate use_inductor in favor of backend, simplify custom_ops ( #26113 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
2025-10-07 12:53:43 -07:00
eb577e4655
[Bugfix] Add missing sink tensor into flash attn cascade attn implementation ( #26325 )
2025-10-07 18:56:39 +00:00
8f36850f73
[Bug] Fix Shape Validation for Fallback while Enabling E8M0 for DeepGEMM ( #26322 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-07 13:50:30 -04:00
29fd2662ba
[deepseek] add EP8 FusedMOE config for H200 and B200 ( #26331 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-07 10:38:54 -07:00
30a3e5af69
[CI] Add Qwen3 MoE NVFP4 to Blackwell lm-eval ( #26316 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-07 10:36:15 -07:00
a38c1bfe09
[ci] Rename test_mxfp4_moe.py to test_ocp_mx_moe.py ( #26364 )
...
Signed-off-by: Felix Marty <Felix.Marty@amd.com >
2025-10-07 09:52:24 -07:00
320feae6f5
[Model] Lfm2Moe ( #26344 )
...
Signed-off-by: Paul Pak <paulpak58@gmail.com >
2025-10-07 16:03:05 +00:00
1e4ecca1d0
[V0 Deprecation] Remove VLLM_USE_V1 from tests ( #26341 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-07 15:42:31 +00:00
c0a7b89d8e
[Misc] Move LRUCache into its own file ( #26342 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-07 15:08:40 +00:00
6f59beaf0b
[Model] Add support for ModernBertForTokenClassification ( #26340 )
...
Signed-off-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr >
Signed-off-by: antrec <antoine.recanati@gmail.com >
Co-authored-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-07 14:29:19 +00:00
41f1cf38f2
[Feature][OCP MX] Support mxfp6 and mixed mxfp6-mxfp4 ( #21166 )
2025-10-07 09:35:26 -04:00
08d26a1b7e
[Model] Use merge_by_field_config for MM models (Ovis family) ( #26308 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-07 12:54:22 +00:00
63773a6200
[Docs] add docs for cuda graph v1 ( #24374 )
...
Signed-off-by: fhl <2410591650@qq.com >
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-07 05:25:05 -07:00
883b42896a
Add TRL example notebook to RLHF docs ( #26346 )
...
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com >
2025-10-07 11:31:28 +00:00
e1098ced95
Add topk logits torch op for DS3.2. ( #25945 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-10-07 10:07:32 +00:00
d100d78eb3
Optimize KV cache distribution for asymmetric pipeline parallelism ( #25164 )
...
Signed-off-by: gholmes829 <g.holmes429@gmail.com >
2025-10-07 09:20:30 +00:00
7e4cd070b0
[V0 Deprecation] Remove VLLM_USE_V1 from docs and scripts ( #26336 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-07 16:46:44 +08:00
46b0779996
[BugFix] Update KV block hash type from BlockHash to ExternalBlockHash in kv_events_subscriber - #26264 ( #26265 )
...
Signed-off-by: atalhens <sneh.lata@nutanix.com >
2025-10-07 08:42:28 +00:00
de342585ff
[Model] Define merge_by_field_config MM interface (R-T) ( #26260 )
...
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-07 16:10:55 +08:00
185d8ed44f
[responsesAPI][bugfix] serialize harmony messages ( #26185 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-07 07:07:53 +00:00
d9836d4517
[Deprecation] Deprecate LLM.set_tokenizer ( #26333 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-07 06:50:57 +00:00
5f7e8a916a
[Model] Define merge_by_field_config MM interface (U-Z) ( #26261 )
...
Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-07 06:45:49 +00:00
4dbdf4a294
[BUG] Fix file parsing for load_format runai_streamer_sharded ( #26324 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-10-07 11:23:07 +08:00
c6873c4e6d
[UX] Support nested dicts in hf_overrides ( #25727 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-07 11:19:16 +08:00
2111b4643c
[Core] Simplify the Dp padding/should ubatch coordination logic ( #25768 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-10-07 01:57:49 +00:00
c50901f3b9
[Docs][DBO] Add initial doc that describes the DBO implementation ( #26024 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-10-07 00:47:28 +00:00
8229280a9c
[Misc] Define EP kernel arch list in Dockerfile ( #25635 )
...
Signed-off-by: Simon Mo <simon.mo@hey.com >
2025-10-07 00:05:33 +00:00
f77df94647
[Perf] Add decode full-graph support to FlashInfer-MLA backend ( #26313 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-06 23:03:49 +00:00
f231e5bc21
[ROCm] Split AITER unified attention into its own backend ( #25507 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-10-06 22:49:23 +00:00
2161efe978
[Bugfix] Allow skipping MoE in NVFP4 (fix for MTP) ( #25987 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-06 16:16:30 -04:00
f23b4c04fd
[BugFix] Pad input buffers in _dummy_run ( #26209 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-06 16:07:51 -04:00
93540958b8
[Docs] Fix broken table in moe_kernel_features doc ( #26314 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-06 15:58:05 -04:00
44b9af5bb2
[Benchmark] Enable MM Embedding benchmarks ( #26310 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-06 19:51:58 +00:00
7cd95dc8a3
[Bugfix] Fix gemma3 with transformers backend ( #23178 )
...
Signed-off-by: raushan <raushan@huggingface.co >
Signed-off-by: Raushan Turganbay <raushan@huggingface.co >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-06 18:42:32 +00:00
c02058c222
Add bias handling to CPUFusedMOE kernel ( #26289 )
...
Signed-off-by: Crefeda Rodrigues <crefeda.rodrigues@arm.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Crefeda Rodrigues <65665931+cfRod@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Sharif Inamdar <Sharif.Inamdar@arm.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-10-06 18:39:10 +00:00
b2ea5ba677
[Bugfix][Spec Decode] Fix wrong valid_mask for padded speculation when chunked prefill occurs ( #26231 )
...
Signed-off-by: seven-mile <i@7li.moe >
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-06 18:24:22 +00:00
824a3f403f
[Misc] auto_tune: kill specific vllm process ( #26304 )
...
Signed-off-by: Karan Goel <karangoel@google.com >
2025-10-06 18:02:51 +00:00
05f6846ede
Support llama3 eagle3 head with llama4 verifier ( #25961 )
...
Signed-off-by: rahul-tuli <rtuli@redhat.com >
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
2025-10-06 13:56:08 -04:00
20db99cc69
[CI Bugfix] Make sure TRTLLM attention is available in test_blackwell_moe ( #26188 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-06 13:50:11 -04:00
6431be808f
[Tests] conftest: Extending VllmRunner and HfRunner to accept token_ids as input ( #26295 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-06 17:19:34 +00:00
4727a8afa7
[Attention] Remove unused reorder_batch method ( #24463 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-06 13:13:39 -04:00
b8f603cebe
[Model] EVS support for nano_nemotron_vl ( #26269 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com >
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
2025-10-07 00:23:37 +08:00
fc679696f8
Fix DotsOCR tensor type ( #26281 )
...
Signed-off-by: what_in_the_nim <chatcharinsang@gmail.com >
2025-10-06 12:23:43 +00:00
ab5e7d93f4
[Bugfix] Fix mrope in Transformers Backend ( #26087 )
...
Signed-off-by: raushan <raushan@huggingface.co >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-06 11:40:50 +00:00
0340f45553
Support expert parallel load balancing in Transformers backend ( #26287 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-06 11:20:16 +00:00
19a00eb210
[Model] Use merge_by_field_config for MM models (Llava family) ( #26280 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-06 09:45:26 +00:00
391612e78b
[Frontend] Consolidate tokenizer init code ( #26276 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-06 09:34:52 +00:00
77c95f72f7
[Doc] add KAITO to integrations ( #25521 )
...
Signed-off-by: "Abhishek Sheth" <absheth@microsoft.com >
2025-10-06 17:30:03 +08:00
59f30d0448
[Docs] Edit HF Inference Endpoints documentation ( #26275 )
...
Signed-off-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com >
Signed-off-by: ariG23498 <aritra.born2fly@gmail.com >
2025-10-06 10:13:09 +01:00
43c146ca42
[Misc] Clean up unnecessary E501 ignore ( #26274 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-10-06 07:29:18 +00:00
7c2ec0fe87
[Benchmarking] Add disable_shuffle option for dataset loading ( #26258 )
...
Signed-off-by: Yasmin Moslem <48152713+ymoslem@users.noreply.github.com >
2025-10-06 07:05:44 +00:00
039b6bade3
Bump actions/stale from 10.0.0 to 10.1.0 ( #26272 )
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 07:01:21 +00:00
6c04638214
Fix per file ruff ignores related to line length ( #26262 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-06 05:12:40 +00:00
91ac7f764d
[CI][gpt-oss] Enable python tool tests in CI ( #24315 )
...
Signed-off-by: wuhang <wuhang6@huawei.com >
2025-10-06 04:20:06 +00:00
4be7d7c1c9
[MISC] Add heheda12345 to CODEOWNERS of vllm/config/cache.py ( #26270 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-06 10:58:59 +08:00
59b477645c
[Doc] Edited minor typo ( #26266 )
...
Signed-off-by: Orange Ng <ngquanhao@outlook.com >
2025-10-05 19:53:09 -07:00
778f554157
[V1] [Hybrid] Some additional clean-up in Mamba2 prefix caching ( #26222 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-10-06 10:40:30 +08:00
d3c84297c3
[CI] Add comment about the single cudagraph capture size that is used ( #26252 )
2025-10-06 02:35:37 +00:00
f509a20846
[DOC] Update production-stack.md ( #26177 )
...
Signed-off-by: Elieser Pereira <elieser.pereiraa@gmail.com >
2025-10-05 21:32:48 +00:00
60bc25e74c
[CI] Add Blackwell LM Eval Small Models test to nightly ( #26052 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-05 14:59:50 -06:00
b893d661b1
Fix per file ruff ignores related to simplification ( #26259 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-05 20:31:53 +00:00
6b6e98775f
[NVIDIA] flashinfer TRTLLM attention prefill token limit ( #25998 )
...
Signed-off-by: jasonlizhengjian <jason.li@centml.ai >
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com >
2025-10-05 14:24:37 -06:00
9c3c21c519
[CI] fix mamba kernel test ( #26250 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-05 18:26:59 +00:00
512b8affa4
Update ruff pre-commit hooks version ( #26255 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-05 09:50:50 -07:00
1c0c68202c
Fix per file ruff ignores related to typing ( #26254 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-05 16:37:55 +00:00
5f317530ec
fix(tests): Resolve late binding of loop variable in assert message lambda ( #26249 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
2025-10-05 09:18:22 -07:00
557b2e961d
Remove all cases of fmt: on/off ( #26253 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-05 09:18:14 -07:00
4e256cadc2
Remove all references to yapf as it's no longer used ( #26251 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-05 09:18:11 -07:00
d6953beb91
Convert formatting to use ruff instead of yapf + isort ( #26247 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-05 07:06:22 -07:00
17edd8a807
[Platform][Kernel] platform-specific kernel loading ( #25823 )
...
Signed-off-by: Hank <hcc.mayday@gmail.com >
2025-10-05 13:25:15 +02:00
3303cfb4ac
[Bugfix][Hardware][RISC-V] Limit supported dtypes to float32 to avoid scheduler segfault ( #26228 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
2025-10-05 10:36:54 +00:00
b7e8e4e6be
[Bugfix] Always apply MM processor even when no MM items are passed ( #26240 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-05 10:10:20 +00:00
432e1cbc23
[Bugfix]: Assertion error when using FlashInfer backend ( #25933 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-05 16:46:36 +08:00
201c971e96
[Perf][Easy] Early stop in request_block_hasher ( #26112 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-05 16:46:03 +08:00
e0986ea07b
Add documentation for granite 4 tool calling ( #26175 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
2025-10-05 07:35:42 +00:00
a964e5e6c3
[Bugfix] Allow --skip-tokenizer-init with echo and return_token_ids ( #26238 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-05 05:38:53 +00:00
78c1d5bfd2
[Easy] Add str repr for IterationStats ( #26232 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-05 05:00:21 +00:00
59a85c366e
[Model] Use merge_by_field_config for MM models (H-L) ( #26230 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-05 11:54:17 +08:00
119f00630b
[Renderer] Clean up renderer code ( #26216 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-04 17:05:29 +00:00
a42d2df75f
[Frontend] Cache chat template kwargs resolution ( #26227 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-04 15:32:30 +00:00
5c057e068f
[CPU] Refine batch reorder of CPU attention backend ( #26096 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-04 21:54:35 +08:00
ed3aeb25a4
[V1] [Hybrid] Remove code to override default CUDA graph configuration ( #26226 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-10-04 13:47:48 +00:00
86ee949128
Fix tensor device and dtype placement in Qwen2VL model ( #26219 )
...
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Yuanfeng Li <yuanfengli@meta.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-04 06:41:39 -07:00
4570535ec4
[Model] CLIP Embedding Support ( #26010 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-04 06:21:42 -07:00
2a6dc67eb5
[Bugfix] Fix _reqs_to_process leak on abort ( #26012 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-04 11:39:31 +00:00
f05fea1f5e
[Core] Enable decode of context length equal to max model length ( #26168 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
2025-10-04 09:59:26 +00:00
d0df145c2a
Add Olmo 3 reasoning parser ( #26054 )
...
Signed-off-by: Luca Soldaini <luca@soldaini.net >
2025-10-04 17:48:29 +08:00
1838cd4860
Revert "Add batch invariant kernel override for FlashInfer backend [2/n]" ( #26220 )
2025-10-04 02:45:08 -07:00
7d6b03381e
[CI Failure] fix_test_auto_prefix_cache_support ( #26053 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-04 02:44:49 -07:00
7c2e91c4e0
[Misc] Remove unused executor.apply_model ( #26215 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-04 01:45:53 -07:00
736fbf4c89
[Misc] Require merge_by_field_config argument ( #26214 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-04 01:40:14 -07:00
44ea85137a
[Model] Support nested structures for TensorSchema ( #26212 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-04 01:20:32 -07:00
d3d649efec
Support expert parallel in Transformers backend ( #26162 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-04 04:35:04 +00:00
ea507c3a93
[V1] [Hybrid] Mamba2 Automatic Prefix Caching ( #25752 )
...
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com >
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com >
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com >
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-10-04 06:34:22 +02:00
9705fba7b7
[cpu][perf] Accelerate unquantized-linear for AArch64 through oneDNN/ACL and weight prepack ( #25948 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2025-10-04 12:16:38 +08:00
2f7dbc9b42
Add batch invariant kernel override for FlashInfer backend [2/n] ( #25769 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-03 19:49:30 -07:00
ea25a76c05
[BugFix] Use async Mistral Tokenizer in Chat Completions ( #26134 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-04 09:42:08 +08:00
67bc0c003e
[Bugfix] Fix qwen3 vl dummy data generation with overrides ( #26193 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-10-04 01:40:20 +00:00
5a05f26603
Fix issue of using only the part of video frame [Nemotron Nano] ( #26186 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
2025-10-04 00:21:00 +00:00
7ef40bb983
[GPTOSS][DP/EP][Marlin] Enable GPTOSS DP/EP using Marlin kernels ( #25488 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-10-03 20:13:13 -04:00
767cbb011d
[CI] Fix Pre-commit Mypy Error ( #26181 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 16:08:03 -07:00
7cfa4b24bf
[BugFix] Fix de-functionalization pass for rotary_embedding ( #23953 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-03 15:44:18 -07:00
b71fcd4905
[Misc] Add penalties sampling parameters to serve tool ( #25974 )
...
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com >
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com >
2025-10-03 15:43:14 -07:00
75003f34e8
[CI] Push multiarch manifests as nightly builds ( #25764 )
...
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com >
2025-10-03 15:42:55 -07:00
78b8015a4d
[Bugfix] Relax tokenizer regex for mixtral to include 'tokenizer.model' ( #25964 )
...
Signed-off-by: Bowen Bao <bowenbao@amd.com >
2025-10-03 18:31:59 -04:00
831b124151
[responsesAPI] add better error messaging for long prompts ( #25724 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-10-03 14:33:13 -07:00
c1ffcb55da
[Refactor] Optimize FP8 MOE Backend Choice and Log ( #26044 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 15:23:42 -06:00
0879736aab
[Perf] Remove hardcoded num_warps=1 ( #26183 )
...
Signed-off-by: Corey Lowman <clowman1993@gmail.com >
2025-10-03 20:38:50 +00:00
a26917332f
[Quantization/NVFP4] Speed up TRTLLM NVFP4 MOE weight loading and fix K/V scale loading for MLA Attn ( #25968 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-10-03 19:35:06 +00:00
cd9e5b8340
Fix V1 engine serialization error with Ray distributed executor ( #26148 )
...
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com >
2025-10-03 18:39:45 +00:00
300a59c4c3
Avoid division by zero in cache DS MLA kernel ( #26174 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-03 17:35:17 +00:00
d76541a6c5
Stop mergify from keeping stale PRs alive ( #26169 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-03 16:42:34 +00:00
dd96465fd7
[BugFix][QWEN-VL]fix wrong apply_rotary_emb_torch selection introduced by #24642 ( #26123 )
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-03 08:52:26 -07:00
4f8f47e87e
Fix undefined symbol: cutlass_moe_mm_sm100 ( #26098 )
...
Signed-off-by: Jun Jiang <jasl9187@hotmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-03 15:48:32 +00:00
d78fda7cda
[Renderer] Move Processor out of LLMEngine ( #26165 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-03 15:08:22 +00:00
73a99cc2a5
[Model] Fixed stream generator for gpt-oss + spec-decoding ( #26027 )
...
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com >
2025-10-03 13:43:41 +00:00
adae0c1f43
[CI/Build] do not enforce precompilation on tpu ci tests ( #25992 )
...
Signed-off-by: Xiang Si <sixiang@google.com >
2025-10-03 13:38:42 +00:00
cbf9221992
[Model] Supplement to PR 24862: Pass param prefix to LLMHead ( #25805 )
...
Signed-off-by: whx-sjtu <2952154980@qq.com >
2025-10-03 21:34:53 +08:00
5f42fc53b6
[backends][short_conv] CUDA graph piecewise edits ( #24215 )
...
Signed-off-by: Paul Pak <paulpak58@gmail.com >
2025-10-03 12:59:48 +00:00
8ee846c27c
[Bugfix] Re-enable prefill of max model length ( #24446 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
2025-10-03 14:13:34 +02:00
812b7f54a8
[Renderer] Move Processor out of AsyncLLM ( #24138 )
...
Signed-off-by: Yang <lymailforjob@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-03 11:29:45 +00:00
5f2cacdb1e
Quick fix for IMA with the Prefix Prefill kernel during graph capture ( #25983 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-10-03 11:28:22 +00:00
aa5053e3fe
[Doc] Fixed shape description for fused_batched_moe.py ( #25668 )
...
Signed-off-by: Egor <e.a.krivov@gmail.com >
2025-10-03 04:00:23 -07:00
79aa244678
[Multi Modal] Configurable MM Profiling ( #25631 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-03 03:59:10 -07:00
2ed3f20dba
[openai] Fix missing tool usage check (system message) ( #24768 )
...
Signed-off-by: kyt <eluban4532@gmail.com >
2025-10-03 18:55:44 +08:00
48f309029a
[NIXL][Misc] Expose metrics from NIXL for logging to CLI ( #25388 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-03 10:47:59 +00:00
0e93ac0b3a
[CI] Fix distributed hybrid tests in CI ( #26155 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-10-03 09:14:18 +00:00
5446ad1d24
[test utils] correct wrong typing ( #26159 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
2025-10-03 02:11:49 -07:00
f9a8084e48
[Model] Use merge_by_field_config for MM models (InternVL family) ( #26153 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-03 01:59:06 -07:00
3e70e3d4d5
add(v1): RequestStatesStats to RequestOutput ( #24947 )
...
Signed-off-by: huijjj <huijong.jeong@squeezebits.com >
2025-10-03 08:56:25 +00:00
eb0fa43868
[Perf] Optimize reshape_and_cache CUDA Kernel ( #25955 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Co-authored-by: Liu-congo <1502632128@qq.com >
2025-10-03 01:33:46 -07:00
0ad9951c41
[Input] Remove unused prompt field ( #26097 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-03 00:23:21 -07:00
8c9117181d
[Misc] Remove typing.List ( #26150 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-03 07:00:33 +00:00
c4b48d3c0f
[BUG] Reorder model config creation ( #26124 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-10-03 14:59:36 +08:00
10d765482d
FusedMoE support for the Transformers backend ( #22650 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-02 23:12:15 -07:00
39b643dc1a
[Model] Use merge_by_field_config for MM models (G) ( #26117 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-02 22:38:29 -07:00
711f485643
[Bugfix] Fix import gemm_afp4wfp4 failure on AMD ( #26068 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-02 22:37:25 -07:00
9c5ee91b2a
[ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm ( #26104 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-10-02 22:34:53 -07:00
27edd2aeb4
[Build/CI] Revert back to Ubuntu 20.04, install python 3.12 with uv ( #26103 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
2025-10-02 22:21:01 -07:00
e5017cd6d6
[gpt-oss] disable tool server initialization if no tool in request ( #25790 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-10-03 05:08:35 +00:00
6a7796e871
[Bug]: Limit num_reqs in dummy_run when max_num_seqs is small ( #26144 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-03 04:00:20 +00:00
47b9339546
[DeepSeek] Improve performance of DS MLA cache kernel ( #26132 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-02 20:35:47 -07:00
5d5146eee3
[CI/Build] Conditionally register cutlass_fp4_group_mm to fix building on Hopper ( #26138 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-02 20:32:38 -07:00
2aaa423842
[Attention] Move Backend enum into registry ( #25893 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-02 20:32:24 -07:00
ad2d788016
[Bug][Benchmark] Fix duplicate req in oversampling ( #26140 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-03 02:55:24 +00:00
36ce76c632
[Log] Optimize DeepGEMM Missing Log ( #26106 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-02 20:02:26 -06:00
f1fc2107a3
[Bugfix] Disable cascade attention with FlashInfer ( #26130 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-02 16:30:37 -07:00
13cdc02173
Fix MTP with deepep_low_latency ( #25904 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-02 21:29:49 +00:00
502640c3f9
[Perf] Fix and reapply move apply w8a8 block fp8 linear to class ( #25696 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
2025-10-02 19:35:13 +00:00
3d5f1c8640
[Mamba][KVCacheManager] Simplify kv cache manage logic for mamba + MTP ( #25119 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-02 18:48:31 +00:00
1cab2f9cad
EAGLE 3: Fix preamble so that measured speedup over Eagle 1 becomes 32% instead of 5% on MTBench ( #25916 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
2025-10-02 11:29:35 -07:00
1e50f1be70
[Deepseek v3.2] Support indexer prefill chunking ( #25999 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-02 10:29:12 -07:00
ad87ba927a
[Small] Prevent bypassing media domain restriction via HTTP redirects ( #26035 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
2025-10-02 10:27:10 -07:00
decf7f794b
[BugFix] Fix FI accuracy issue when used for MLA prefill ( #26063 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-10-02 17:18:13 +00:00
d00d652998
[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command ( #25967 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-02 10:04:57 -07:00
3b279a84be
[CI] Add Blackwell DeepSeek FP8 FlashInfer MoE tests ( #26040 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-02 09:07:19 -07:00
5e4a8223c6
[Qwen][ROCm] Flash Attention Rotary Embeddings ( #24642 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-02 08:26:08 -07:00
e51de388a2
[Platform][CI] Added OOT platform interface e2e test that running on Ascend NPU ( #25470 )
...
Signed-off-by: leo-pony <nengjunma@outlook.com >
2025-10-02 23:19:22 +08:00
cc253b73d3
[Model] Use merge_by_field_config for MM models (D-F) ( #26076 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-02 08:17:35 -07:00
7d6fb905d9
[Model] Use merge_by_field_config for MM models (A-C) ( #26073 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-02 08:17:31 -07:00
418d111f8c
[FA/Chore] Bump vllm-flash-attention ( #25537 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-02 11:06:14 -04:00
be8921fbba
Change size of single CUDA graph for CI to 4 ( #26089 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-10-02 14:14:28 +00:00
d4e7a1152d
Update base image to 22.04 (jammy) ( #26065 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-10-02 05:48:04 -07:00
be22bb6f3d
Run:ai model streamer add GCS package support ( #24909 )
...
Signed-off-by: Peter Schuurman <psch@google.com >
2025-10-01 20:59:13 -07:00
169313b9f8
[Misc] Make handling of SamplingParams clearer in n>1 case ( #26032 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-01 19:31:39 -07:00
0b018d8baf
[ROCm][Bugfix] Add missing parameter to ROCm backend ( #26029 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-10-01 19:23:14 -07:00
c31246800c
Support RL online quantization with torchao ( #23014 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-10-01 16:39:29 -07:00
4134312b35
[BugFix] ChunkedLocalAttention is currently not CG compatible ( #26034 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-01 16:28:00 -07:00
da554f932e
[Bug] Fix Negative Cuda Memory Usage ( #25683 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-01 18:16:26 -04:00
aac622e0cd
[ROCm][Build] Add support for AMD Ryzen AI MAX / AI 300 Series ( #25908 )
...
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com >
2025-10-01 21:39:49 +00:00
1726e93ef1
[BugFix][DP/EP] Fix CUTLASS MLA hang under load ( #26026 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
2025-10-01 12:30:00 -07:00
ee04c0cd04
[CI] Tweaks to GPT-OSS Eval (Blackwell) for stability ( #26030 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-01 12:02:17 -07:00
c36f0aa300
Fix test_mamba_ssm_ssd.py due to missing _query_start_loc_to_chunk_indices_offsets ( #25995 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-01 18:18:36 +00:00
5234dc7451
[NVIDIA] Blackwell Family ( #24673 )
...
Signed-off-by: Johnny <johnnynuca14@gmail.com >
Signed-off-by: johnnynunez <johnnynuca14@gmail.com >
Signed-off-by: Johnny <johnnync13@gmail.com >
Signed-off-by: Salvatore Cena <cena@cenas.it >
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com >
Co-authored-by: Salvatore Cena <cena@cenas.it >
2025-10-01 10:50:54 -07:00
3b7c20a6b5
[Bugfix] Apply same sampling parameters for both n=1 and n>1 ( #26005 )
...
Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp >
2025-10-01 14:37:35 +00:00
f9e714813a
[Benchmark] Finish documented v0.11.0 deprecation of --endpoint-type ( #26007 )
...
Signed-off-by: Nathan Scott <nathans@redhat.com >
2025-10-01 12:41:57 +00:00
2518230d3e
[MISC] Fix misleading batch_size_capture_list when cuda_graph_sizes < 4 ( #25829 )
...
Signed-off-by: billishyahao <bill.he@amd.com >
Co-authored-by: Luka Govedic <ProExpertProg@users.noreply.github.com >
2025-10-01 08:39:45 -04:00
a332b84578
[CI] Only capture a single CUDA graph size in CI by default ( #25951 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-01 10:03:44 +01:00
1405f0c7ba
[Misc] Factor out common _apply_feature_select_strategy ( #26003 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-01 01:31:03 -07:00
84d57342b6
[BugFix][MM] Fix Nonetype error when video is cache in qwen2.5-omni-thinker ( #26004 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-10-01 08:03:25 +00:00
57b46d769e
[Doc] updating torch.compile doc link ( #25989 )
...
Signed-off-by: nadathurv <work.vnadathur@gmail.com >
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com >
Co-authored-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com >
2025-10-01 07:04:56 +00:00
f48b6a03ba
[Misc]allow disable pynccl ( #25421 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
2025-10-01 06:04:13 +00:00
2a69ab4899
Update to Transformers v4.56.2 ( #24638 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-30 22:07:07 -07:00
8d7da92fd7
[BugFix] Fix default kv-cache-dtype default for DeepseekV3.2 ( #25988 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-09-30 21:58:31 -07:00
e952eee698
[Bugfix] Fix __syncwarp on ROCM ( #25996 )
2025-09-30 21:15:11 -07:00
66bca9b8bd
[MM] Add text-only mode for Qwen3-VL ( #26000 )
2025-09-30 21:13:42 -07:00
99028fda44
Fix INT8 quantization error on Blackwell GPUs (SM100+) ( #25935 )
...
Signed-off-by: padg9912 <phone.and.desktop@gmail.com >
2025-09-30 19:19:53 -07:00
1244948885
[Log] Optimize Log for FP8MOE ( #25709 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-09-30 19:18:43 -07:00
a73f6491c8
Update launch_bounds_utils.h for correct compile on Multiple Cuda Arch - PTXAS out of range Warning ( #25843 )
...
Signed-off-by: Salvatore Cena <cena@cenas.it >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-30 19:18:19 -07:00
001e50c92c
[Model] MTP fallback to eager for DeepSeek v32 ( #25982 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-10-01 01:53:22 +00:00
96ebcaa3ad
[Misc] Make EP kernels install script support uv ( #25785 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-09-30 23:38:34 +00:00
5db1870bb9
[gpt-oss] use vLLM instead of openai types for streaming ( #25186 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-09-30 22:47:07 +00:00
2ce26b9b5d
[Docs] Remove API Reference from search index ( #25949 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-30 22:10:02 +00:00
a388252ac4
Add explicit pooling classes for the Transformers backend ( #25322 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-30 23:07:06 +01:00
9a9f48dff7
[V1] [P/D] Add Support for KV Load Failure Recovery ( #19330 )
...
Signed-off-by: David Ben-David <davidb@pliops.com >
Co-authored-by: David Ben-David <davidb@pliops.com >
2025-09-30 14:57:08 -07:00
67f3fb0844
[Bench] Add DeepSeekV32 to MoE benchmark ( #25962 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-09-30 14:13:48 -07:00
43b752c325
[Llama4] [multimodal] Fix misplaced dtype cast of cos_sin_cache in Llama4VisionRotaryEmbedding ( #25889 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
2025-09-30 20:35:15 +00:00
cfd302db9b
OffloadingConnector: Fix GPU block tracking bug ( #25856 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2025-09-30 19:53:04 +00:00
fb610ae684
[Docs] Add moe kernel features doc ( #25297 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-30 19:03:15 +00:00
2f652e6cdf
[Doc] Improve MM Pooling model documentation ( #25966 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-30 18:58:29 +00:00
e6a226efba
[Bug] Fix AttributeError: 'QKVParallelLinear' object has no attribute 'orig_dtype' ( #25958 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-09-30 11:13:03 -07:00
a2e6fa7e03
[bugfix][deepseek] fix flashmla kernel selection ( #25956 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-01 00:30:36 +08:00
9f1c4ecaf2
[Bugfix] Token type and position embeddings fail to be applied to inputs_embeds ( #25922 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-01 00:23:12 +08:00
ef283548f7
[Bugfix] Fix accuracy issue of TRTLLM FP8 MOE and improve logging ( #25895 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-09-30 10:51:31 -04:00
f4db5e6de1
[Bugfix][Model] Fix inference for Hunyuan dense models ( #25354 )
...
Signed-off-by: anion <1005128408@qq.com >
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com >
2025-09-30 14:38:07 +00:00
099aaee536
Add Hugging Face Inference Endpoints guide to Deployment docs ( #25886 )
...
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-30 14:35:06 +00:00
35fe398c7c
[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 ( #25858 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-09-30 07:30:44 -07:00
bb6d43047e
[Fix] Improve CPU backend compatibility for RISC-V ( #25816 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
2025-09-30 13:48:07 +00:00
bc546f76a1
[CI] Move applicable tests to CPU ( #24080 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-30 14:45:20 +01:00
80608ba5af
[NIXL] Add support for MLA caches with different latent dim ( #25902 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2025-09-30 12:18:29 +00:00
e184c9c510
[perf] Use CPU tensor to reduce GPU->CPU sync ( #25884 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com >
2025-09-30 19:51:16 +08:00
d7e34b4210
[Model] Move vision_feature_select_strategy into resolve_visual_encoder_outputs ( #25938 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-30 11:24:57 +00:00
ef6e0e7132
[Bugfix][Model]fix ernie45 moe gate&bias dtype to float32 ( #25936 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-09-30 19:11:21 +08:00
1ad3aca682
Updated TRL integration docs ( #25684 )
...
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-30 03:10:55 -07:00
8d0afa9b42
[Doc] Add Cambricon MLU support ( #25942 )
...
Signed-off-by: a120092009 <zhaoty0121@gmail.com >
2025-09-30 17:59:47 +08:00
fa7e254a7f
[New Model] DeepSeek-V3.2 (Rebased to Main) ( #25896 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com >
Signed-off-by: Lucia Fang <fanglu@meta.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Co-authored-by: Lucia Fang <fanglu@meta.com >
Co-authored-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com >
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com >
2025-09-30 17:14:41 +08:00
e23cacda35
[Bugfix]: Clean up chunked prefill logging when using whisper ( #25075 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
2025-09-30 08:17:49 +00:00
2e1b8bc2b6
[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect logical_not ( #25925 )
...
Signed-off-by: zhoukz <me@zhoukz.com >
2025-09-30 08:15:23 +00:00
e47433b3c1
[BugFix] Pass config_format via try_get_generation_config ( #25912 )
2025-09-30 05:09:50 +00:00
23194d83e8
[BugFix] Fix DP/EP hang ( #25906 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-09-30 04:18:59 +00:00
61aedb5ffe
Move VllmConfig from config/__init__.py to config/vllm.py ( #25271 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-29 19:49:49 -07:00
d3bd171123
[Benchmark] Support benchmark throughput for external launcher DP ( #25913 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-09-30 01:43:57 +00:00
89e4050af4
[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 ( #25909 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-30 09:15:19 +08:00
78a47f87ce
Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT Models ( #25717 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
2025-09-30 08:10:58 +08:00
6a113d9aed
[V0 Deprecation] Remove vllm.worker and update according imports ( #25901 )
2025-09-29 23:26:11 +00:00
2e4fe48c37
[NIXL] Increase default KV block eviction timeout on P ( #25897 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-09-29 21:35:14 +00:00
8eb0a1d906
[Doc] Polish example for torchrun dp ( #25899 )
2025-09-29 21:31:34 +00:00
fea3e476aa
[Kernel] Chunk-aligned mamba2 ( #24683 )
2025-09-29 23:18:25 +02:00
61a3431613
[Bugfix][ROCm] Fixing trying to import non-existent symbols from libnccl.so ( #25605 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-09-29 17:01:50 -04:00
9bedac9623
[Doc] Add documentation for vLLM continuous benchmarking and profiling ( #25819 )
...
Signed-off-by: Naman Lalit <nl2688@nyu.edu >
2025-09-29 20:49:49 +00:00
c42ff4f4fd
[BugFix][torch.compile] KV scale calculation issues with FP8 quantization ( #25513 )
...
Signed-off-by: adabeyta <aabeyta@redhat.com >
2025-09-29 15:52:04 -04:00
d5ab28511c
[Bugfix] Use correct key "ignore" for config.json non-quantized layers ( #25706 )
...
Signed-off-by: Lee Nau <lnau@nvidia.com >
2025-09-29 15:07:29 -04:00
e61eb5e09d
[Model] Remove MotifForCausalLM ( #25866 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-09-30 00:36:30 +08:00
0899ba5b42
[CI/Build] Include Transformers backend test in nightly transformers test ( #25885 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-29 09:33:39 -07:00
145ac73317
[Bugfix][Speculative Decoding] Fix Eagle3 quantization config issue ( #25883 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
2025-09-29 11:37:20 -04:00
d0d138bc55
[Nixl][P/D] Add cuda2cpu support (HD->DH transfer) ( #24690 )
...
Signed-off-by: Chenxi Yang <cxyang@fb.com >
Co-authored-by: Chenxi Yang <cxyang@fb.com >
2025-09-29 14:31:51 +00:00
43227236ec
[torch.compile] serialize cudagraph_mode as its enum name instead of value ( #25868 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-09-29 13:54:52 +00:00
8616300ae2
[Model][Bugfix] Fix issues in MiDashengLM implementation for quantized models ( #25854 )
...
Signed-off-by: zhoukz <me@zhoukz.com >
2025-09-29 10:59:04 +00:00
edbaadd91f
[Bugfix] Fix requirements paths in install instructions ( #25827 )
...
Signed-off-by: yingjun-mou <renzomou@gmail.com >
2025-09-29 03:49:35 -07:00
9360d34fa1
update to latest deepgemm for dsv3.2 ( #25871 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-09-29 17:51:43 +08:00
1b67b04656
[Misc] Remove more get_input_embeddings_v0 ( #25857 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-29 08:03:37 +00:00
bd51f78e39
[V0 Deprecation][Models] Remove all V0 condition for mm embeddings merge ( #25331 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: isotr0py <2037008807@qq.com >
2025-09-29 14:09:18 +08:00
65ecb4f134
[Bugfix] Fallback ViT attn backend to SDPA for blackwell ( #25851 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-09-29 06:03:51 +00:00
143844fa43
[XPU]Fix xpu spec decoding UTs, avoid using cuda graph ( #25847 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-09-29 05:15:10 +00:00
219cfbe7f6
Add Phi4FlashForCausalLM to _PREVIOUSLY_SUPPORTED_MODELS ( #25832 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-09-29 05:08:17 +00:00
9b44a7d926
[P/D] NIXL Updates ( #25844 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Chenheli Hua <huachenheli@outlook.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-09-29 04:46:30 +00:00
a3ae45a38c
[Misc] fix tests failure by using current_platform ( #25825 )
...
Signed-off-by: Juechen Liu <jueliu@meta.com >
2025-09-29 04:18:57 +00:00
0307428d65
Remove redundant cudagraph dispatcher warning ( #25841 )
2025-09-28 17:12:42 -04:00
471997adf6
[Bugfix] fix Qwen3VLMoe load when pp > 1 ( #25838 )
...
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com >
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com >
2025-09-28 17:56:12 +00:00
b1ded114b9
Update GLM-4.5 Doc transformers version ( #25830 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
2025-09-28 12:05:51 +00:00
f4e4088c99
Fix random dataset mismatched token length with config. ( #24937 )
...
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-09-28 08:23:44 +00:00
0efd540dbc
[VLM] Update Qwen3-VL max_num_video_tokens calculation for configurable video profiling ( #25557 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-09-28 04:21:01 +00:00
6144754014
[Bugfix] Fix Qwen3-VL regression from #24982 ( #25814 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-09-28 03:21:09 +00:00
69311446ba
[MM] Optimize memory profiling for scattered multimodal embeddings ( #25810 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-09-28 02:17:58 +00:00
da63274d9f
[Bugfix][NIXL] Fix Async Scheduler timeout issue ( #25808 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-09-27 15:17:35 -04:00
c216119d64
[Core] GC Debug callback ( #24829 )
...
Signed-off-by: Jialin Ouyang <jialino@meta.com >
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Jialin Ouyang <jialino@meta.com >
2025-09-27 17:53:31 +00:00
5546acb463
[Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location ( #25766 )
...
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com >
2025-09-27 13:36:28 -04:00
c0ec81836f
[torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable ( #25651 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-09-27 16:09:00 +00:00
b65e56babe
[Core] Refactor self.model() to call a helper for subclassing. ( #25084 )
...
Signed-off-by: Patrick Toulme <ptoulme@meta.com >
Signed-off-by: Patrick Toulme <pctoulme+1@gmail.com >
2025-09-27 08:40:59 -07:00
49996cd597
[env] default nixl side port conflicts with kv-event zmq port ( #25056 )
...
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io >
2025-09-27 15:02:40 +00:00
ecb37e276a
[docs] transcriptions API audio upload ( #25446 )
...
Signed-off-by: zxw <1020938856@qq.com >
2025-09-27 15:00:35 +00:00
a5354b3ed2
[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models ( #24982 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-09-27 14:22:28 +00:00
f9df8b4ad7
[Bugfix] Fix triton import precommit failure ( #25803 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-09-27 07:13:11 -07:00
ec152c8748
Fix GPTQ model loading in Transformers backend ( #25770 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-27 12:18:20 +00:00
7977e5027c
Add filtering for chat template kwargs ( #25794 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-27 10:46:49 +00:00
3f5d902d2a
Validate API tokens in constant time ( #25781 )
...
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com >
2025-09-27 18:09:26 +08:00
27d7638b94
[Bugfix] Merge MM embeddings by index instead of token IDs ( #16229 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-09-27 08:15:12 +00:00
176173989a
[Bugfix] Add missing image_size for phi4_multimodal ( #25796 )
2025-09-27 07:59:22 +00:00
23b8ee672d
[Misc] Update openai client example file for multimodal ( #25795 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-27 07:57:07 +00:00
3939152069
[Misc] Fix codeowners override for v1 sample and attention ( #25037 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-09-27 07:47:29 +00:00
cd87bfbf37
[CI/Build] Reorganize root-level V1 tests ( #25767 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-27 13:51:15 +08:00
b3613e3ace
[CI/Build] Add timing to Model Executor Test ( #25799 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-09-26 21:57:27 -07:00
d346ec695e
[CI/Build] Consolidate model loader tests and requirements ( #25765 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-26 21:45:20 -07:00
c242c98031
[Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL ( #25788 )
2025-09-26 20:44:52 -07:00
f1d53d150c
[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl ( #22872 )
...
Signed-off-by: Junhong <liujunhong11@huawei.com >
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com >
Co-authored-by: Junhong <liujunhong11@huawei.com >
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com >
2025-09-27 03:35:47 +00:00
92da847cf5
Add flashinfer-build.sh and register precompiled cu128 wheel in Dockerfile ( #25782 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-09-26 18:54:09 -07:00
3958b96bf5
Add option to restrict media domains ( #25783 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Chenheli Hua <huachenheli@outlook.com >
2025-09-27 01:23:52 +00:00
8bf8f45822
[Core] Don't count preempted tokens in prefix cache hit rate ( #25787 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-09-27 00:16:40 +00:00
6f5c0931c1
[Spec decode] automatically disable mm for text-only draft models ( #25667 )
...
Signed-off-by: Jonas Kuebler <kuebj@amazon.com >
2025-09-27 08:10:21 +08:00
4e33a7ea85
[Bugfix] Optimize CpuGpuBuffer initialization ( #25447 )
...
Signed-off-by: Naman Lalit <nl2688@nyu.edu >
2025-09-27 08:07:36 +08:00
dc48ba0c75
Kernel-override Determinism [1/n] ( #25603 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
2025-09-26 16:59:09 -07:00
4778b42660
Reduce the Cuda Graph memory footprint when running with DBO ( #25779 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-09-26 22:29:56 +00:00
c70ac4b8ff
[spec decode] Consolidate speculative decode method name for MTP ( #25232 )
...
Signed-off-by: zixi-qi <qizixi@meta.com >
2025-09-26 22:27:05 +00:00
cf89202855
[CI] Fix FlashInfer AOT in release docker image ( #25730 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-09-26 14:11:40 -07:00
f075693da7
[V1] address post issues related to #20059 (part 1) ( #23046 )
...
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-09-26 15:58:19 -04:00
f708bd4904
[CI] Add E2E Blackwell Quantized MoE Test ( #25723 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-09-26 12:23:00 -07:00
0002b7f0d1
[Docs] Add Toronto Meetup ( #25773 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-09-26 12:00:46 -07:00
11aafd9886
[Bugfix] Improve GLM4 MoE Reasoning Parser's is_reasoning_end Condition ( #25355 )
...
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-09-26 11:54:00 -07:00
b761df963c
[Doc]: improve CPU(x86) build-wheel-from-source section ( #25617 )
...
Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com >
2025-09-26 10:26:33 -07:00
33f6aaf972
Eagle3 that supports the Minicpm3 model ( #24243 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: liudan <adan@minicpm.com >
Co-authored-by: liudan <liudan@qq.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
2025-09-26 10:04:57 -07:00
56aafa8c0b
[Misc] fix unique_filepath ( #25732 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-09-26 16:56:15 +00:00
8d52f2b3a7
[ray][metrics] Replace ':' with '_' for OpenTelemetry compatibility in Ray ( #25439 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com >
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com >
2025-09-26 09:43:30 -07:00
984d18498a
[BugFix] Fix using dbo_decode_token_threshold always (and ignoring dbo_prefill_token_threshold) ( #25622 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-09-26 16:22:49 +00:00
d4d9899860
[Quantization] Add field to skip unquantized modules for GPTQ config ( #25455 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-26 15:47:41 +00:00
db1e42f627
[CI/Build] Fix some V1 tests not being run ( #25569 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-26 20:52:36 +08:00
bc9d7b5595
[CI/Build] Split up Distributed Tests ( #25572 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-26 14:49:33 +02:00
fe6b19c314
[Bugfix] Properly abort pooling request. ( #25734 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-09-26 05:47:34 -07:00
2827b3f4a3
[CI] Fix test_shared_storage_connector_hashes ( #25748 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-09-26 20:46:17 +08:00
2b6b1d7809
[Model] Mamba2 varlen refactor ( #21467 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com >
Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com >
2025-09-26 11:31:14 +00:00
633f943e30
[Doc] Update Batch-level DP docs ( #25757 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-26 02:37:40 -07:00
b03b1b97f6
Support LongCat-Flash-Chat tool call ( #24083 )
...
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com >
2025-09-26 09:25:39 +00:00
dfb9af2014
[Bugfix] Fix Shared Expert/Zero expert code in FusedMoE.process_chunk ( #25698 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-09-26 01:25:28 -07:00
19f76ee68e
[misc] refactor speculative config ( #25657 )
...
Signed-off-by: zxw <1020938856@qq.com >
2025-09-26 01:22:06 -07:00
dd70437a4f
Remove cuda hard-code in compute_causal_conv1d_metadata ( #25555 )
...
Signed-off-by: Icey <1790571317@qq.com >
2025-09-26 01:19:20 -07:00
99b3a504c5
[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and a stride bug in causal_conv_1d. ( #25743 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
2025-09-26 01:18:58 -07:00
6e30010d2f
fix: print output offline_inference/base/chat.py example ( #25744 )
...
Signed-off-by: Iceber Gu <caiwei95@hotmail.com >
2025-09-26 01:18:24 -07:00
52621c8f5c
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI300X ( #25703 )
...
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com >
2025-09-26 01:18:20 -07:00
d48f4d6daf
perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds is enabled ( #25739 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
2025-09-26 01:18:09 -07:00
e84e0735c7
fix: revert cast to cpu in MsgpackEncoder._encode_tensor to avoid hidden performance regressions ( #25738 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
2025-09-26 01:18:05 -07:00
3edf87d25f
[CI/Build] fix doc build warning: Failed to get 'name: description' pair ( #25733 )
...
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io >
2025-09-26 01:18:02 -07:00
392edee34a
EVS Support (Video tokens pruning) ( #22980 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-09-26 11:54:54 +08:00
983056e456
[Misc] Remove unnecessary memoryviews in shm_broadcast.py ( #25721 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-09-26 03:11:44 +00:00
13dd93c667
[Core] Force PIECEWISE CUDAGraph mode for encoder-decoder ( #25701 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-09-25 18:21:56 -07:00
53a30845be
Llamas 3.1 405B fp4 changes upstreaming from 355_wip ( #25135 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Doug Lehr <douglehr@amd.com >
2025-09-25 19:16:53 -06:00
8b77328ffe
[Misc] Don't log shm dequeue delay warning on worker side ( #25720 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-09-26 01:08:30 +00:00
9fe4c2bdb9
[Refactor] Remove DeepGEMM OP Register ( #25710 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-09-25 20:13:41 -04:00
081b5594a2
Fix routing_bias dtype ( #25711 )
...
Signed-off-by: Shu Wang. <shuw@nvidia.com >
2025-09-25 23:35:14 +00:00
57329a8c01
[Model] rename NemotronH_Nano_VL -> NemotronH_Nano_VL_V2 ( #25708 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-09-25 16:10:29 -07:00
8c435c9bce
[Core] Enable command line logging for LLMEngine ( #25610 )
...
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-09-25 15:31:17 -07:00
e71b8e210d
[Spec Decode] Add Batch Parallel Ngram. Up to 8x lower overhead. ( #24986 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-09-25 15:22:03 -07:00
89fa54e6f7
[Optimization] Use a cheaper cache key in get_model_architecture ( #25682 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-25 17:54:20 -04:00
3d54bdcb73
[Optimization] Streamline InputPreprocessor ( #25702 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-25 21:06:49 +00:00
6b0fcbbf43
[Misc] Simplify test_argsort_mm_positions ( #25690 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-25 18:23:01 +00:00
0fa673af4c
[V0 deprecation] Clean up LoRA ( #25686 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-09-25 18:12:33 +00:00
3468f17ebe
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names ( #25489 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com >
2025-09-25 17:37:50 +00:00
71b25b0d48
[V0 deprecation] Clean up V0 fallback in compilation config ( #25675 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-25 17:29:51 +00:00
0ea80c87d9
[Model] Define merge_by_field_config MM interface ( #25676 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-25 17:13:07 +00:00
b8d9e4a326
[Model] Add optional parameter to reasoning parser constructor ( #25554 )
...
Signed-off-by: taohui <taohui3@gmail.com >
Signed-off-by: Tao Hui <taohui3@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-26 01:12:50 +08:00
13cc7f5370
[BugFix] Fix DBO hang ( #25625 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-09-25 17:04:48 +00:00
916bd9204d
Revert "[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super()" ( #25681 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-09-25 09:45:06 -07:00
e04a1b6b21
[BUGFIX] Fix crash in Eagle Speculative Decoding models when exceedin… ( #24662 )
...
Signed-off-by: AlonKejzman <alonkeizman@gmail.com >
2025-09-25 15:40:14 +00:00
2e5df88c92
[Logging] Remove TORCH_NCCL_AVOID_RECORD_STREAMS to squash a warning ( #25532 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-09-25 15:16:06 +00:00
0754ac4c49
[Misc] Remove cruft file in repo ( #25678 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-09-25 08:05:12 -07:00
03858e6d1c
[Bugfix] Fix InternS1 video processing after Transformers v4.56 ( #25644 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-25 14:46:04 +00:00
532a6cfccb
[ux] Switch a warning to debug about a pytorch fallback ( #23750 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-09-25 14:38:16 +00:00
eb32335e35
[CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata ( #25652 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-09-25 13:29:11 +00:00
69a8c8e99a
[torch.compile] Make Query Quantization Fusable ( #24914 )
...
Signed-off-by: Jonas Kuebler <kuebj@amazon.com >
2025-09-25 09:25:12 -04:00
6c340da4df
[misc] log info messages by default for hanging / busy / idle ( #25627 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-09-25 21:14:57 +08:00
2f17117606
[mypy] Fix wrong type annotations related to tuple ( #25660 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-25 13:00:45 +00:00
1e9a77e037
[Hardware][RISC-V] Add riscv64 support for vLLM with scalar ( #22112 )
...
Signed-off-by: chenlang <chen.lang5@zte.com.cn >
Co-authored-by: chenlang <10346245@zte.com.cn >
2025-09-25 20:46:11 +08:00
d2af67441d
[XPU][Triton]add xpu config in triton_reshape_and_cache_flash ( #25643 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-09-25 12:38:11 +00:00
0bcc3a160d
[CI/Build] Fix flaky entrypoints test ( #25663 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-25 12:19:40 +00:00
70fbdb26e9
Add backward compatibility for guided_... API ( #25615 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-09-25 19:45:25 +08:00
7f570f1caa
[V0 deprecation] Remove unreachable model_config.supported_tasks ( #25642 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-09-25 11:26:31 +00:00
eaeca3cd7f
[Bugfix] Parse SpeculativeConfig Error ( #25142 )
...
Signed-off-by: zxw <1020938856@qq.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-25 11:09:39 +00:00
12c1287d64
[mypy] Further improve MM type annotations ( #25654 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-25 10:57:36 +00:00
17b4c6685c
[Bugfix] Fix Qwen3-VL max_num_video_tokens calculation for video profiling ( #25648 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-09-25 18:36:01 +08:00
3c2b2ccece
[Bugfix] Add triton.language.tensor placeholder ( #25649 )
...
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai >
2025-09-25 10:31:14 +00:00
7be9ffcd9f
[Misc] Fix Qwen3-VL video_grid_thw typing ( #25646 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-09-25 10:16:45 +00:00
393de22d2e
[fix] Update torch version in cpu-build.txt for AArch64/ppc64le and Darwin ( #25579 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-09-25 09:39:18 +00:00
1260180c67
Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… ( #25607 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-09-25 08:05:21 +00:00
af4ee63e0e
typo: remove duplicate is ( #25641 )
...
Signed-off-by: nicole-lihui <nicole.li@daocloud.io >
2025-09-25 00:46:22 -07:00
bc092ea873
Map CwmForCausalLM to llama and LlamaForCausalLM ( #25611 )
...
Signed-off-by: Jacob Kahn <jacobkahn1@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-09-25 07:37:03 +00:00
755ed7b05b
[Misc] Simplify PoolerOutput and move to v1/outputs ( #25629 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-09-25 06:47:03 +00:00
a676e668ee
[Bugfix] fix apply_temperature to avoid nan in probs ( #24734 )
...
Signed-off-by: courage17340 <courage17340@163.com >
2025-09-25 05:32:21 +00:00
c85be1f6dd
optimize: eliminate duplicate split_enc_dec_inputs calls ( #25573 )
...
Signed-off-by: nicole-lihui <nicole.li@daocloud.io >
2025-09-25 05:03:25 +00:00
845adb3ec6
[Model] Add LongCat-Flash ( #23991 )
...
Signed-off-by: yangxurui <yangxurui@meituan.com >
Co-authored-by: yangxurui <yangxurui@meituan.com >
2025-09-24 21:53:40 -07:00
90b139cfff
Enable Fbgemm NVFP4 on Dense models ( #25609 )
...
Signed-off-by: Saman Keon <samanamp@outlook.com >
2025-09-24 21:12:53 -07:00
4492e3a554
[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super() ( #25613 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-24 18:52:52 -07:00
05c19485a5
[Kernel] Support DCP for Triton backend ( #25132 )
...
Signed-off-by: Wei Wei <wwei6@meta.com >
2025-09-24 18:09:34 -07:00
52d0cb8458
[Model] Improve DotsOCRForCausalLM ( #25466 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-09-25 07:58:08 +08:00
5c1e496a75
[MISC] replace c10::optional with std::optional ( #25602 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
2025-09-24 16:56:21 -07:00
e7f27ea648
Improve --help for enhanced user experience ( #24903 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-24 23:08:18 +00:00
1f29141258
[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor ( #25517 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-09-24 18:52:36 -04:00
6160ba4151
feat: BF16 FlashInfer Fused Cutlass MOE for Hopper and Blackwell Expert Parallel ( #25503 )
...
Signed-off-by: Duncan Moss <djm.moss@gmail.com >
2025-09-24 18:50:04 -04:00
fea8006062
[Logging] Improve log for when DeepEP HT disables CUDA Graphs ( #25531 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-09-24 22:43:06 +00:00
e6750d0b18
[V0 Deprecation] Remove unused classes in attention ( #25541 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-24 13:24:40 -07:00
8c853050e7
[Docs] Enable fail_on_warning for the docs build in CI ( #25580 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-09-24 19:30:33 +00:00
f84a472a03
Suppress benign cuBLAS warning when capturing cudagraphs with DBO ( #25596 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-09-24 19:02:08 +00:00