vllm-dev

Author	SHA1	Message	Date
nvjullin	7ea22e42d5	[Misc] Add override for allreduce fusion thresholds (#23639 ) Signed-off-by: Julien Lin <jullin@nvidia.com>	2025-08-26 15:53:04 +00:00
Yuekai Zhang	9d4183dd2e	[model] support qwen2audio embedding input (#23625 ) Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-26 23:48:08 +08:00
Yuekai Zhang	513298f1b4	[Bugfix] fix bf16 multimodal model hash (#23623 ) Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-08-26 23:47:50 +08:00
Harry Mellor	379f828fba	[Docs] Reduce requirements for docs build (#23651 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 15:43:28 +00:00
Hongxia Yang	1fdc732419	[ROCm] Starting to add AMD code reviewers for ROCm components (#23496 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2025-08-26 07:32:37 -07:00
TianyuLi0	f58675bfb3	[CPU] add cpu fused moe pytorch native implementation (#23146 ) Signed-off-by: Tianyu Li <tianyu.li@arm.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-08-26 14:09:17 +00:00
Didier Durand	7c04779afa	[Doc]: fix various spelling issues in multiple files (#23636 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-26 14:05:29 +00:00
nvjullin	f66673a39d	[Kernel] Added flashinfer fp8 per-tensor gemms (#22895 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-26 06:54:04 -07:00
En Ouyang	b78bed1bc5	[Hardware][Mac] Fix the installation fail for Apple Silicon (CPU) (#23565 ) Signed-off-by: oye93 <en.ouyang93@outlook.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-08-26 13:04:25 +00:00
Harry Mellor	164b2273c8	[Docs] Fix broken links to `docs/api/summary.md` (#23637 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 13:00:18 +00:00
Chen Zhang	2b4fc9bd9b	Support FlashAttention Backend for Hybrid SSM Models (#23299 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-26 12:41:52 +00:00
Guillaume Calmettes	ebd5a77bb5	feat: add usage to TranscriptionResponse (text and json response_format) (#23576 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-08-26 05:26:26 -07:00
Matúš Námešný	384dd1b0a8	[Bugfix] Add missing enable_log_outputs parameter to init_app_state function (#23634 ) Signed-off-by: Matúš Námešný <matus.namesny@ameria.com>	2025-08-26 12:13:15 +00:00
Jee Jee Li	fdeb3dac13	[Model] fix DeepSeek e_score_correction_bias dtype to fp32 (#23640 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-26 20:09:47 +08:00
Michael Goin	d52358c1e0	[Perf] Remove duplicated NVFP4 blockscales to save memory (#23379 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-26 19:16:33 +08:00
Huy Do	6ace2f72b0	Fix writing benchmark results with tuple keys (#23633 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-08-26 19:16:09 +08:00
Harry Mellor	b00e69f8ca	Fix nits from #20059 (#23548 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 03:27:20 -07:00
Cyrus Leung	50fede6634	[V1] Enable V1 for compute capability < 8.0 + FP32 (#23614 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-26 03:00:18 -07:00
Roger Wang	b5d34af328	[Bugfix] Fix scheduling when repeated images in one request (#23544 ) Signed-off-by: Roger Wang <hey@rogerw.me> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.me> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>	2025-08-26 09:46:28 +00:00
Jee Jee Li	9b5f64238f	[Bugfix] Fix Qwen25VL packed_modules_mapping (#23604 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-26 01:09:14 -07:00
Raghavan	ff77764f86	Fix CLI parameter documentation inconsistency in pooling_models.md (#23630 )	2025-08-26 01:05:37 -07:00
Harry Mellor	bfc1edc9f5	[Docs] Fix titles for multi-file examples that are rendered in the docs (#23573 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 00:16:44 -07:00
Jiangyun Zhu	3ecbb14b81	[Benchmarks] add benchmark for embedding models (#23000 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-08-25 23:57:08 -07:00
Cyrus Leung	7d67a9d9f9	[mypy] Fix incorrect type hint for EAGLE3 support (#23617 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-25 23:50:17 -07:00
Bin Jia	959783fb99	[fix] fix seed-oss-parser (#23560 ) Signed-off-by: jiabin.00 <jiabin.00@bytedance.com>	2025-08-25 23:16:36 -07:00
Cyrus Leung	ce0e9dbd43	[CI/Build] Fix typo in #23561 (#23616 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-25 23:13:03 -07:00
Zijing Liu	b395b3b0a3	[Disagg][Perf] Use CUDA event sync instead of blocking `tolist` to avoid unintentional copy ops blocking across different CUDA streams, improving disagg TTIT/TTFT (#22760 ) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Zijing Liu <liuzijing2014@users.noreply.github.com>	2025-08-25 21:06:00 -07:00
Copilot	6fad29b11b	Remove graph_pool as member of VllmBackend and argument to CUDAGraphWrapper (#23385 ) Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-08-25 19:34:15 -07:00
Cyrus Leung	6fd45e7b8a	[CI/Build] Use vLLM client's user agent to fetch images (#23561 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-25 19:34:12 -07:00
Wentao Ye	56dcf4e7e9	[Bug] Fix DeepGEMM Env Control (#23591 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-25 18:41:21 -07:00
weiliang	ae067888d6	Update Flashinfer to 0.2.14.post1 (#23537 ) Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Signed-off-by: siyuanf <siyuanf@nvidia.com> Signed-off-by: Weiliang Liu <weiliangl@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-25 18:30:44 -07:00
Michael Goin	906e461ed6	[CI Fix] Pin deepep and pplx tags in tools/ep_kernels/, gate multigpu tests (#23568 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-25 18:29:00 -07:00
Simon Mo	2a97ffc33d	[Misc] Add release note draft to PR template (#23598 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-08-25 16:44:51 -07:00
Woosuk Kwon	efc88cf64a	[Misc] Simplify FlashInfer attention metadata (#23585 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-25 15:42:29 -07:00
Terrence Zhao	7b6a837275	[Docs] Update Documentation of Cohere Command-A Models (#23584 ) Signed-off-by: Terrencezzj <terrence@cohere.ai> Signed-off-by: Abatom <abzhonghua@gmail.com> Co-authored-by: Zhonghua Deng <abzhonghua@gmail.com>	2025-08-25 21:53:52 +00:00
Pate Motter	c34c82b7fe	[TPU][Bugfix] Fixes prompt_token_ids error in tpu tests. (#23574 ) Signed-off-by: Pate Motter <patemotter@google.com>	2025-08-25 14:29:16 -07:00
Chaojun Zhang	8a044754bd	[XPU] Delay BF16 check to worker init for spawn compatibility (#22979 ) Signed-off-by: chzhang <chaojun.zhang@intel.com>	2025-08-25 13:09:26 -07:00
Zhonghua Deng	9188ae7cb5	[Bugfix][V1][P/D]Fix the issue where repeated requests for the same input produce abnormal outputs for P2pNcclConnector (#23403 ) Signed-off-by: Abatom <abzhonghua@gmail.com>	2025-08-25 12:57:08 -07:00
Xin Yang	8a3cd90af5	[Kernel] Add fused grouped_topk kernel for MoE (#23274 ) Signed-off-by: Xin Yang <xyangx@amazon.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-08-25 11:47:52 -07:00
22quinn	2a167b2eeb	[test][RL] Add sleep level 2 test and fix reload with sleep mode (#23521 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-26 00:25:52 +08:00
Woosuk Kwon	0ff902f3b4	[Refactor] Refactor persistent buffers with CpuGpuBuffer (#23515 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-25 08:44:48 -07:00
Isotr0py	a9082a4d14	[Bugfix] Fix Qwen3 MoE GPTQ inference (#23490 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-25 06:40:20 -07:00
Driss Guessous	e0329ed4b4	Updates to Flex + VLLm integration (#21416 ) Signed-off-by: drisspg <drisspguessous@gmail.com>	2025-08-25 09:32:42 -04:00
Cyrus Leung	6879cd80ae	[Refactor] Pass `tokenizer` explicitly instead of binding to prompt update (#23542 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-25 06:31:57 -07:00
Cyrus Leung	e269be2ba2	[Doc] Add caution for API server scale-out (#23550 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-25 06:14:15 -07:00
Ayush Satyam	5c4b6e66fe	[Attention] Unify mamba and attention backend selection (#23171 ) Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com>	2025-08-25 09:09:36 +00:00
youkaichao	d0a4a3f645	[misc] add shanghai meetup (#23535 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-08-25 17:00:03 +08:00
Cyrus Leung	ebafb0936d	[Bugfix] Allow dynamic number of patches for llava_onevision (#23525 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-25 08:34:54 +00:00
Breno Baldas Skuk	0cb7b065c3	Feature/benchmark/random mm data/images (#23119 ) Signed-off-by: breno.skuk <breno.skuk@hcompany.ai>	2025-08-25 01:28:35 -07:00
ZiTian Zhao	2da02dd0d8	[Fix] DeepSeek V3.1 tool parser error message (#23492 ) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>	2025-08-25 00:56:39 -07:00

9056 Commits