vllm-dev

Author	SHA1	Message	Date
Cyrus Leung	fe8d7b6f03	[Model] Interface to enable batch-level DP support (#23733 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-27 06:41:22 -07:00
Harry Mellor	16dc4052b0	Fix pre-commit on main (#23747 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-27 06:39:48 -07:00
rebel-hongseok	8dd2baa597	Add vLLM Korea Meetup in the README.md and meetups.md (#23746 ) Signed-off-by: rebel-hongseok <hongseok@rebellions.ai> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-27 06:25:49 -07:00
Cyrus Leung	5eeef1b908	[Model] Explicit `default_pooling_type` interface (#23736 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-27 13:24:09 +00:00
Thomas Parnell	704432af3c	[V1] [Hybrid] Disable prefix caching by default for hybrid or mamba-based models (#23716 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-27 12:51:54 +00:00
Nick Hill	a403d0fa41	[Misc] Remove unnecessary `_send_reconfig_message()` in `core_client.py` (#23127 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-27 05:50:47 -07:00
cndoit18	8c13820f0b	[Bugfix] Fix task field initialization when PYTHONOPTIMIZE is enabled (#23718 ) Signed-off-by: cndoit18 <cndoit18@outlook.com>	2025-08-27 12:42:20 +00:00
tc-mb	9d30de4469	[model] Support MiniCPM-V 4.5 (#23586 ) Signed-off-by: tc-mb <caitianchi@modelbest.cn> Signed-off-by: Xin Yang <xyangx@amazon.com> Signed-off-by: Abatom <abzhonghua@gmail.com> Signed-off-by: chzhang <chaojun.zhang@intel.com> Signed-off-by: Pate Motter <patemotter@google.com> Signed-off-by: Terrencezzj <terrence@cohere.ai> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Signed-off-by: siyuanf <siyuanf@nvidia.com> Signed-off-by: Weiliang Liu <weiliangl@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Zijing Liu <liuzijing2014@users.noreply.github.com> Signed-off-by: jiabin.00 <jiabin.00@bytedance.com> Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: tc-mb <157115220+tc-mb@users.noreply.github.com> Signed-off-by: Roger Wang <hey@rogerw.me> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Huy Do <huydhn@gmail.com> Signed-off-by: Matúš Námešný <matus.namesny@ameria.com> Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: oye93 <en.ouyang93@outlook.com> Signed-off-by: Julien Lin <jullin@nvidia.com> Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Tianyu Li <tianyu.li@arm.com> Signed-off-by: Hongxia Yang <hongxia.yang@amd.com> Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com> Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Zerohertz <ohg3417@gmail.com> Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com> Signed-off-by: Federico <65908512+coval3nte@users.noreply.github.com> Signed-off-by: Zixuan Zhang <zixuanzhang@bytedance.com> Signed-off-by: wuhang <wuhang6@huawei.com> Signed-off-by: czhu-cohere <conway.zhu@cohere.com> Signed-off-by: Wei Wei <wwei6@meta.com> Signed-off-by: Yiheng Xu <charlesyihengxu@gmail.com> Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Signed-off-by: wangyafeng <wangyafeng@baidu.com> Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Zhonghua Deng <abzhonghua@gmail.com> Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com> Co-authored-by: Pate Motter <p@temotter.com> Co-authored-by: Terrence Zhao <32208165+Terrencezzj@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: weiliang <weiliangl@nvidia.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com> Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Raghavan <oneraghavan@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.me> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: Huy Do <huydhn@gmail.com> Co-authored-by: Matúš Námešný <matus@namesny.com> Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: En Ouyang <en.ouyang93@outlook.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com> Co-authored-by: nvjullin <jullin@nvidia.com> Co-authored-by: Didier Durand <2927957+didier-durand@users.noreply.github.com> Co-authored-by: TianyuLi0 <116711075+TianyuLi0@users.noreply.github.com> Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com> Co-authored-by: Yuekai Zhang <zhangyuekai@foxmail.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Hyogeun Oh (오효근) <ohg3417@gmail.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Huzaifa Sidhpurwala <huzaifas@redhat.com> Co-authored-by: Federico <65908512+coval3nte@users.noreply.github.com> Co-authored-by: zixuanzhang226 <zixuanzhang@bytedance.com> Co-authored-by: wuhang <wuhang6@huawei.com> Co-authored-by: yzds <41983536+youzhedian@users.noreply.github.com> Co-authored-by: hongchao <hongchao@msh.team> Co-authored-by: czhu-cohere <conway.zhu@cohere.com> Co-authored-by: Wei <weiweinpu@gmail.com> Co-authored-by: Yiheng Xu <charlesyihengxu@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Chenheli Hua <huachenheli@outlook.com> Co-authored-by: CSWYF3634076 <58356743+CSWYF3634076@users.noreply.github.com>	2025-08-27 05:38:00 -07:00
Michael Yao	1f7a9c95e4	[Docs] Fix a 1-2-3 list and style issues in tpu.md (#23729 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-08-27 05:37:52 -07:00
Fanli Lin	8f0d7eaea8	[XPU] Fix OOM issue for data parallel with Ray backend (#22500 ) Signed-off-by: Fanli Lin <fanli.lin@intel.com> Signed-off-by: Fanli Lin <fanli0116@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-08-27 19:57:38 +08:00
Jee Jee Li	e03940762b	[CI/Build] Reduce LoRA layer test cases (#23721 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-27 10:59:35 +00:00
Woosuk Kwon	11eddf02f0	[FlashInfer] Cache hyper params in metadata builder (#23732 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-27 03:45:04 -07:00
Woosuk Kwon	04ff1e43fb	[Misc] Move CpuGpuBuffer to vllm/v1/utils.py (#23728 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-27 03:25:00 -07:00
Woosuk Kwon	6578e87365	Optimize input preparation for FlashInfer [2/N] (#23174 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-27 02:52:45 -07:00
Michael Yao	5bd9f84158	[Docs] Fix an admonition important (#23726 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-08-27 02:50:09 -07:00
Cyrus Leung	91e382c935	[CI/Build] Remove redundant register in model init tests (#23715 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-27 08:11:15 +00:00
Kunshang Ji	6446677839	[XPU]fix cuda event used in XPU model runner (#23708 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-08-27 07:27:14 +00:00
Cyrus Leung	69244e67e6	[Core] Use key-only cache for `BaseMultiModalProcessor` (#23018 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-27 14:19:13 +08:00
rongfu.leng	8dbf6ed7be	[Bugfix] fix when config.yaml config value is list parse error (#23528 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-08-27 05:54:39 +00:00
Jee Jee Li	9de25c294b	[CI/Build] Remove redundant LoRA model tests (#23706 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-27 05:51:50 +00:00
Kunshang Ji	fce10dbed5	[XPU] Add xpu torch.compile support (#22609 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-08-27 05:33:27 +00:00
Dipika Sikka	d272415e57	[Quantization] Expand compressed-tensors MoE matching logic to support NFP4 + FP8 MoEs (#22674 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com>	2025-08-27 05:00:21 +00:00
Chen Zhang	142ac08030	[Frontend] Optimize beam search performance by limiting concurrency (#23599 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-27 04:59:14 +00:00
Chen Zhang	3210264421	[Frontend] Add --log-error-stack to print stack trace for error response (#22960 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-27 04:58:59 +00:00
CSWYF3634076	644d57d531	[Model] Add Ernie4.5 VL Model Support (#22514 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-08-26 21:02:55 -07:00
Chenheli Hua	c905684cfe	[Core] Asynchronous h2d in merge_multimodal_embeddings via pinned memory. (#23686 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-08-26 20:05:34 -07:00
Yiheng Xu	786835807b	[Bugfix]: Qwen3 Coder Tool Parser (#23099 ) Signed-off-by: Yiheng Xu <charlesyihengxu@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-08-26 19:58:32 -07:00
Wei	fecbb7c782	[Bugfix][gpt-oss] passing the cache config in gpt-oss (#23613 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-08-27 02:54:23 +00:00
Harry Mellor	6dab89b8ec	[Docs] Fix math rendering in docs (#23676 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 18:47:08 -07:00
Michael Goin	de02b07db4	[Bugfix] Lazy import gpt_oss_triton_kernels_moe for mxfp4 (#23678 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-27 09:34:57 +08:00
Chen Zhang	eb1995167e	[gpt-oss] Enable unit test for response API harmony integration (#23533 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-26 18:23:26 -07:00
czhu-cohere	2c2b140ae8	[quantization] use channel scales for w4a8 + misc fixes (#23570 ) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-08-26 18:23:23 -07:00
yzds	c7c80af084	fix pynccl reduce_scatter (#23648 ) Co-authored-by: hongchao <hongchao@msh.team>	2025-08-26 18:21:11 -07:00
wuhang	6891205b16	[Feature][Responses API] Support MCP tool in background mode (#23494 ) Signed-off-by: wuhang <wuhang6@huawei.com>	2025-08-27 01:06:58 +00:00
zixuanzhang226	b1625dbe9c	feat: add triton fused moe config for GLM-4.5-Air-FP8 on B200 (#23695 ) Signed-off-by: Zixuan Zhang <zixuanzhang@bytedance.com>	2025-08-26 18:06:10 -07:00
Federico	585e0bde36	[Bugfix] UnboundLocalError when GptOss reasoning specified (#23054 ) Signed-off-by: Federico <65908512+coval3nte@users.noreply.github.com>	2025-08-27 00:29:52 +00:00
Wentao Ye	714872f1a9	[Compile] Fix Cmake Warning (#23689 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-26 23:48:32 +00:00
Thomas Parnell	5f1af97f86	[V1] [Hybrid] Enable Full CUDA graph by default for hybrid models in V1 (#22594 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-26 23:28:55 +00:00
Zhonghua Deng	c3b0fd1ee6	[V1][P/D]P2pNcclConnector supports flashinfer (#23536 ) Signed-off-by: Abatom <abzhonghua@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-08-26 22:56:16 +00:00
Harry Mellor	6421b66bf4	[Docs] Move quant supported hardware table to README (#23663 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 22:26:46 +00:00
Huzaifa Sidhpurwala	2f13319f47	Enhance the pre-notification policy (#23532 ) Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com>	2025-08-26 20:41:36 +00:00
Chen Zhang	d696f86e7b	[doc] Hybrid KV Cache Manager design doc (#22688 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 20:19:05 +00:00
Isotr0py	9816b81f5f	[Model] Enable video support for InternVL3.5 models (#23658 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-26 19:46:52 +00:00
Jiangyun Zhu	c37c0af990	[Misc] Fix comments in `tests/kernels/quantization` (#23675 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-08-26 19:31:20 +00:00
Cyrus Leung	9715f7bb0f	[Bugfix] Fix incorrect original shape in hashing (#23672 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-08-26 19:01:25 +00:00
Russell Bryant	98aa16ff41	[v1] Add cross-attention KV cache support for encoder-decoder models (#23664 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-08-26 18:49:06 +00:00
Thomas Parnell	227e231b55	[Docs] [V1] [Hybrid] Update docs to remove FlashInfer constraint for hybrid models (#23665 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-26 18:33:16 +00:00
Hyogeun Oh (오효근)	730d0ac8b9	[Docs] Fix warnings in `mkdocs build` (#23649 ) Signed-off-by: Zerohertz <ohg3417@gmail.com> Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 18:19:23 +00:00
Li, Jiang	9b0187003e	[Bugfix] Fix cuda event usage with CPU model runner (#23643 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-26 17:10:42 +00:00
vllmellm	44ac25eae2	[CI] [Doc]: Add GH Action for auto labeling issues with `rocm` tag (#20988 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-08-26 16:20:13 +00:00

9056 Commits