|
2e36cdbe2b
|
[Docs] Add a start tag to build.inc.md (#26747)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-10-13 21:51:55 -07:00 |
|
|
fe3edb4cf0
|
Add support for the /rerank endpoint in vllm bench serve (#26602)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-10-14 04:25:43 +00:00 |
|
|
29350922c6
|
[Feature][Quantization] auto_round format add support for regex (#24024)
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-14 03:03:16 +00:00 |
|
|
8ae169286f
|
[torch.compile] Unwrap fused_marlin_moe custom op (#26739)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-14 02:22:16 +00:00 |
|
|
8a0af6a561
|
[build][torch.compile] upgrade depyf version (#26702)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-10-14 10:12:09 +08:00 |
|
|
cfded80793
|
[Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE (#26742)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-14 01:46:44 +00:00 |
|
|
b59dd19b55
|
[compile] Enable sequence parallelism for full cuda graph without specifying compile sizes (#26681)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-10-13 18:15:34 -07:00 |
|
|
3e051bda82
|
[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend (#26732)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-13 18:12:52 -07:00 |
|
|
8317f72354
|
[Misc][DP] support customized aggregated logger for dp (#24354)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-10-13 17:45:59 -07:00 |
|
|
d8bebb008a
|
Add tests for chunked prefill and prefix cache with causal pooling models (#26526)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Ayush Singh <ayush1009208@gmail.com>
|
2025-10-14 07:45:04 +08:00 |
|
|
35bc22f23c
|
[ResponseAPI] Further polish message serialization and unit tests (#26728)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-13 23:31:35 +00:00 |
|
|
fa96fb9c70
|
Pruning kernel Core Tests (#26727)
Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
|
2025-10-13 23:08:18 +00:00 |
|
|
e3fdb627d9
|
[FrontEnd] UNREVERT CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26502)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
|
2025-10-13 22:47:16 +00:00 |
|
|
7200a21cd1
|
[Bug] Fix Assertion error DeepEP/csrc/kernels/intranode.cu:928: 'false and Unsupported type' (#26532)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-13 18:26:37 -04:00 |
|
|
577c72a227
|
[CI Perf]Prune Tests in kernel/mamba (#26538)
Signed-off-by: Fardin Hoque <kfhfar@amazon.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-13 18:22:31 -04:00 |
|
|
314285d4f2
|
[CI] Fix mypy for vllm/distributed (#26593)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-13 16:02:24 -04:00 |
|
|
d2a7938582
|
[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). (#26414)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-13 19:06:43 +00:00 |
|
|
89342ce4c0
|
[Quantization] [Performance] Enable Marlin GEMM kernels for the calibration-free RTN-based quantization (#26051)
Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
Signed-off-by: Alex Kogan <82225080+sakogan@users.noreply.github.com>
|
2025-10-13 18:52:54 +00:00 |
|
|
f89f599395
|
[CI][Release][Arm64]: Build arm64 release for gpu arch 8.9 (#26698)
|
2025-10-13 18:42:12 +00:00 |
|
|
e251e457c5
|
[Log] Optimize Startup Log (#26601)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-14 02:06:57 +08:00 |
|
|
afc47e4de7
|
[Model] Use merge_by_field_config for MM models (M-N) (#26710)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-14 01:27:01 +08:00 |
|
|
e3b90c1ba2
|
[Bugfix][Speculative Decoding] Extend Eagle quantization config fix to llama_eagle.py (#26590)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
|
2025-10-13 17:17:13 +00:00 |
|
|
134f70b3ed
|
[Bugfix][Rocm] fix qr error when different inp shape (#25892)
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-10-13 10:04:21 -07:00 |
|
|
a1b2d658ee
|
[CI/Build] upgrade compressed-tensors to 0.12.2 to address LGPLv3 (#26501)
Signed-off-by: Sangyeon Cho <josang1204@gmail.com>
|
2025-10-13 12:58:33 -04:00 |
|
|
5c7fe25491
|
[Misc] Separate prompt logging to debug (#26713)
Signed-off-by: Aleksei Tsvetkov <aitsvet@ya.ru>
|
2025-10-13 09:04:18 -07:00 |
|
|
53c9a7cee2
|
[P/D] [NixlConnector] kv load recovery integration (#26171)
Signed-off-by: Will Eaton <weaton@redhat.com>
|
2025-10-13 08:48:04 -07:00 |
|
|
0d21b9b51e
|
[UX] Speedup DeepGEMM warmup with heuristics (#25619)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-13 07:59:27 -07:00 |
|
|
10214b6935
|
[FEATURE]: Use pydantic validation in multimodal.py config (#26629)
Signed-off-by: Anand Roy <86306690+andycandy@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-13 07:56:59 -07:00 |
|
|
4a61950f4d
|
[Hardware][CPU] Disable torch.compile for RISC-V to prevent APIError (#26693)
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
|
2025-10-13 07:56:01 -07:00 |
|
|
3263799056
|
[unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] (#26373)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
|
2025-10-13 10:24:53 -04:00 |
|
|
8e67b2557a
|
[Bugfix] Fix out of bound index issue for Jina-embedding-v3 RoPE with cuda graph (#26687)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-13 03:21:48 -07:00 |
|
|
4073c82c4e
|
[ResponseAPI] Simplify input/output message serialization (#26620)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-10-13 09:59:15 +00:00 |
|
|
767c3ab869
|
[Model][0/N] Improve all pooling task | clean up (#25817)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-10-13 16:44:50 +08:00 |
|
|
4f207c7174
|
Ignore large reformatting PRs in git blame (#26690)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-13 01:20:47 -07:00 |
|
|
782505ed8e
|
[Model] Add reasoning_parser and tool_parser for Ernie45 thinking (#25027)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
|
2025-10-13 15:55:20 +08:00 |
|
|
98f30b8cba
|
[Model] Fix Skywork R1V mlp (#26673)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-12 22:42:17 -07:00 |
|
|
3cd36660f7
|
docs: wrong command in structured_outputs README (#26677)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-10-12 20:59:01 -07:00 |
|
|
46ad73955a
|
[FIX] Throwing an exception when the model does not support pool tasks (#25840) (#25855)
Signed-off-by: zxw <1020938856@qq.com>
Co-authored-by: wang.yuqi <noooop@126.com>
|
2025-10-12 20:56:21 -07:00 |
|
|
41f3884438
|
[Bugfix][Core]Fix block table out-of-range issue in priority scheduling (#26661)
Signed-off-by: quanliu <18646313696@163.com>
|
2025-10-13 01:25:42 +00:00 |
|
|
60e419c1ee
|
[Misc] cache result of disable_inplace (#26666)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-10-13 00:17:50 +00:00 |
|
|
7ef6052804
|
[CI/Build] Add tool to build vllm-tpu wheel (#19165)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-12 16:25:40 -06:00 |
|
|
4fca1a1bd2
|
[easy] fix pre commit error on trunk (#26665)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-10-12 21:25:34 +00:00 |
|
|
a6049be73c
|
[Models][Qwen3VL] Speedup fast_pos_embed_interpolate (#26647)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-10-13 01:20:07 +08:00 |
|
|
18ed7746ea
|
[Feature] Add support for naver/splade-v3 (BERT-based sparse embedding model) (#26339)
Signed-off-by: gjgjos <gjgjos@naver.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-12 17:00:52 +00:00 |
|
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
|
9bb38130cb
|
[Bugfix] Fix GPU_ID issue in test script (#26442)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-10-12 11:39:05 +00:00 |
|
|
b91d8db873
|
[Bugfix][DCP] Set default CUDAGraphMode to PIECEWISE for DCP (#26574)
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>
|
2025-10-12 09:58:38 +00:00 |
|
|
045b396d09
|
[Bugfix][CI/Build] Fix failing Mteb CI (#26638)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-12 02:42:42 -07:00 |
|
|
76852017ea
|
[MISC] Rename the torch profiler filename as instance_id+rank_id for merging the Profiler results of each Rank (#25867)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-10-12 09:29:08 +00:00 |
|
|
82e64c7a20
|
[PERF] [Qwen3-next] Speed up gated RMSNorm (#26207)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-12 08:27:50 +00:00 |
|