vllm-dev

Author	SHA1	Message	Date
Wentao Ye	d3d2aad5a2	[Log] Use Debug Once for DeepGEMM E8M0 When not Enabled (#23858 )	2025-08-28 22:18:10 +00:00
Yong Hoon Shin	cb293f6a79	[V1] Enable prefill optimization for Gemma3n (#22628 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-08-28 14:54:30 -07:00
Woosuk Kwon	7ffbf27239	[BugFix][FlashInfer] Fix potential race condition for paged_kv_indptr_cpu (#23737 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-28 14:22:46 -07:00
Simon Mo	27e88cee74	chore: build release image by default (#23852 ) Signed-off-by: Codex <codex@openai.com>	2025-08-28 13:17:15 -07:00
elvischenv	16a45b3a28	[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671 ) Signed-off-by: jindih <jindih@nvidia.com> Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: jindih <jindih@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedic <lgovedic@redhat.com>	2025-08-28 19:36:50 +00:00
Jingkai He	57d4ede520	[bugfix] [spec-decoding] fix data race in sample_recovered_tokens_kernel (vLLM v1) (#23829 ) Signed-off-by: He-Jingkai <he-jingkai@outlook.com>	2025-08-28 19:05:20 +00:00
Divakar Verma	04d1dd7f4a	[ROCm][Aiter] Add triton fp8 bmm kernel for mla (#23264 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com> Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com>	2025-08-28 18:18:08 +00:00
Benji Beck	f32a5bc505	Migrate Llama4ImagePatchInputs to TensorSchema (#22021 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-08-28 17:29:37 +00:00
Jean Schmidt	8805ad9fa9	Add scale_config.yml file for Meta autoscalers for GH Actions (#23840 ) Signed-off-by: Jean Schmidt <contato@jschmidt.me>	2025-08-28 09:31:20 -07:00
Jean Schmidt	0583578f42	[ci] breaks down V1 Test into 3 groups of approx 30 minutes runtime (#23757 ) Signed-off-by: Jean Schmidt <contato@jschmidt.me>	2025-08-28 08:59:19 -07:00
Angela Yi	db74d60490	[Bugfix] Add fake mode around passes (#23349 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2025-08-28 11:25:56 -04:00
Po-Han Huang (NVIDIA)	95089607fa	[Model][gpt-oss] Support DP+EP for GPT-OSS with FlashInfer trtllm-gen MoE (#23819 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>	2025-08-28 06:56:20 -07:00
Thomas Parnell	1f096f9b95	[CI] Fix linting error on main (#23835 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-28 06:52:01 -07:00
YUQI.CHENG	66548f6603	[Bugfix] Fix benchmark_moe.py for blockwise fp8. (#23823 ) Signed-off-by: crischeng <420985011@qq.com> Co-authored-by: cris <grace@guisenbindeMacBook-Pro.local>	2025-08-28 21:44:09 +08:00
Didier Durand	d3da2eea54	[Doc]: fix typos in Python scripts (#23828 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-28 05:37:38 -07:00
Jiangyun Zhu	bfab219648	[Model] [gpt-oss] fix gpt-oss pp support (#23815 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-08-28 05:36:55 -07:00
Woosuk Kwon	a3432f18fd	[BugFix][Spec Decode] Use float64 for uniform_probs (#23803 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-28 12:26:45 +00:00
Li, Jiang	67cee40da0	[CI/Build][Bugfix] Fix Qwen VL tests on CPU (#23818 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-28 11:57:05 +00:00
Didier Durand	d99c3a4f7b	[Doc]: fix typos in .md files (including those of #23751 ) (#23825 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-28 04:38:19 -07:00
JartX	3462c1c522	[FIXBUG] Add return_success parameter to moe_wna16_weight_loader function (#22797 ) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-28 09:03:22 +00:00
Isotr0py	c5d004aaaf	[Model] Add PP support and VLM backbone compatability for GPT-OSS (#23680 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-28 16:03:28 +08:00
wang.yuqi	11a7fafaa8	[New Model]: Support GteNewModelForSequenceClassification (#23524 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-08-28 15:36:42 +08:00
yzds	186aced5ff	[Kernel] cuda kernels for upcoming decode context parallel feature (#23791 ) Co-authored-by: hongchao <hongchao@msh.team>	2025-08-28 15:29:11 +08:00
rongfu.leng	daa1273b14	[Bugfix] when set offline model running error (#23711 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-08-28 07:27:45 +00:00
Jiangyun Zhu	c07a73317d	[CI] enable idefics3 and fuyu-8b test in multimodal test (#23790 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-08-28 14:51:24 +08:00
Kyle Sayers	22feac8e95	[Transform] [Quantization] Add transforms to compressed tensors (#22486 )	2025-08-28 02:43:48 -04:00
Jinheng	c8851a4723	Add deprecation warning for lora_extra_vocab_size (#23635 ) Signed-off-by: Jinheng Li <ahengljh@gmail.com>	2025-08-27 22:34:29 -07:00
Alex	f48a9af892	[CI] make all multi-gpu weight loading tests run nightly (#23792 ) Signed-off-by: Alex Yun <alexyun04@gmail.com>	2025-08-27 21:27:36 -07:00
Jan Kessler	a11adafdca	Gracefully handle edge cases in harmony utils (#23155 ) Signed-off-by: Jan Kessler <jakessle@uni-mainz.de> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-08-27 20:14:00 -07:00
Michael Goin	a781e84ec2	[Perf] Tune configs for triton block fp8 gemm H100/H200 (#23748 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-28 11:12:53 +08:00
Shrey Gupta	1b7b161a09	[Feature] models: pass layer prefix to replace_linear_class for per-layer quantization routing. Addresses #23239 (#23556 ) Signed-off-by: Shrey Gupta <shreyg1303@gmail.com>	2025-08-27 20:12:44 -07:00
Benji Beck	a69693e38f	Migrate Qwen inputs to TensorSchema (#23473 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-08-28 10:43:26 +08:00
Hanchenli	5da4f5d857	[Bugfix] Fix for V1 priority scheduling crashes at preemption (#23713 ) Signed-off-by: Hanchenli <lihanc2002@gmail.com>	2025-08-28 00:44:52 +00:00
Wentao Ye	321938e9ac	[Feature] Add `VLLM_DISABLE_PAD_FOR_CUDAGRAPH` to Avoid Hang Issue (#23595 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-27 21:52:24 +00:00
Michael Goin	f9ca2b40a0	[Bugfix] Fix Marlin NVFP4 for modelopt (#23659 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-27 17:48:16 -04:00
Yongye Zhu	082cc07ef8	DP/EP Support for gpt-oss with deepep-ht comm kernel on SM100 (#23608 )	2025-08-27 17:33:21 -04:00
Asaf Joseph Gardin	853c371fc3	[V1][Mamba] - Enable V1 by default for Mamba Models (#23650 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-08-27 20:53:30 +00:00
Roger Wang	8bf6266a17	[Multimodal] Generate mm_hash based on request metadata when caching is turned off (#23690 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-08-27 20:24:31 +00:00
Harry Mellor	0585a9e73c	Disable `torch.compile` for dynamic rope models in Transformers backend (#23738 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-27 19:03:05 +00:00
Eli Uriegas	3c0ef769ba	ci: Add arm64 docker build to release pipeline (#23210 ) Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Signed-off-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>	2025-08-27 10:41:48 -07:00
Hyogeun Oh (오효근)	4e4d017b6f	[Docs] Fix warnings in `mkdocs build` (continued) (#23743 ) Signed-off-by: Zerohertz <ohg3417@gmail.com> Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com>	2025-08-27 17:17:29 +00:00
Thomas Parnell	dd58932280	[V1] [Hybrid] Enable compile and piecewise CUDA graph for MiniMax-Text models (#22589 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-27 10:05:16 -07:00
Cyrus Leung	52883ed084	[Model] Merge `SupportsMultiModalWithRawInput` with `SupportsMultiModal` (#23749 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-27 10:01:50 -07:00
Luka Govedič	4f35be10a9	[BugFix] Fix topk_softmax assert (#19764 ) Signed-off-by: Luka Govedic <lgovedic@redhat.com>	2025-08-27 09:47:28 -07:00
Harry Mellor	2b61d2e22f	[Docs] Remove in-tree Gaudi install instructions (#23628 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-27 09:22:21 -07:00
Nick Hill	3ce8285d6d	[LogitsProcs] Deduplicate built-in LP implementation logic (#23362 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-27 23:11:33 +08:00
Didier Durand	83f555f637	[Doc]: upgrade version of crate-ci tool for improved typo detection (#23755 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-27 07:59:34 -07:00
Isotr0py	841490434a	[Model] Enable native HF format InternVL support (#23742 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-27 14:45:17 +00:00
Wentao Ye	3af47c3cc6	[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-08-27 14:09:08 +00:00
Harry Mellor	513c1fe255	Only run `get_attr_docs` if generating help text (#23723 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-27 13:55:12 +00:00

9056 Commits