frozenleaves/vllm - vllm - Gitea: Git for Me

mirror of https://github.com/vllm-project/vllm.git synced 2025-10-20 14:53:52 +08:00

Author	SHA1	Message	Date
Michael Yao	2e36cdbe2b	[Docs] Add a start tag to build.inc.md (#26747 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-10-13 21:51:55 -07:00
Maximilien de Bayser	fe3edb4cf0	Add support for the /rerank endpoint in vllm bench serve (#26602 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-10-14 04:25:43 +00:00
Heng Guo	29350922c6	[Feature][Quantization] auto_round format add support for regex (#24024 ) Signed-off-by: n1ck-guo <heng.guo@intel.com> Signed-off-by: Heng Guo <heng.guo@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-14 03:03:16 +00:00
Varun Sundar Rabindranath	8ae169286f	[torch.compile] Unwrap fused_marlin_moe custom op (#26739 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-14 02:22:16 +00:00
youkaichao	8a0af6a561	[build][torch.compile] upgrade depyf version (#26702 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-10-14 10:12:09 +08:00
Jialin Ouyang	cfded80793	[Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE (#26742 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-10-14 01:46:44 +00:00
Angela Yi	b59dd19b55	[compile] Enable sequence parallelism for full cuda graph without specifying compile sizes (#26681 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2025-10-13 18:15:34 -07:00
Michael Goin	3e051bda82	[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend (#26732 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-13 18:12:52 -07:00
Lucia Fang	8317f72354	[Misc][DP] support customized aggregated logger for dp (#24354 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-10-13 17:45:59 -07:00
Maximilien de Bayser	d8bebb008a	Add tests for chunked prefill and prefix cache with causal pooling models (#26526 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Ayush Singh <ayush1009208@gmail.com>	2025-10-14 07:45:04 +08:00
Jialin Ouyang	35bc22f23c	[ResponseAPI] Further polish message serialization and unit tests (#26728 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-10-13 23:31:35 +00:00
Fardin Hoque	fa96fb9c70	Pruning kernel Core Tests (#26727 ) Signed-off-by: Fardin Hoque <kfhfar@amazon.com>	2025-10-13 23:08:18 +00:00
Morrison Turnansky	e3fdb627d9	[FrontEnd] UNREVERT CompilationConfig overhaul (#20283 ): deprecate use_inductor in favor of backend, simplify custom_ops (#26502 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>	2025-10-13 22:47:16 +00:00
Wentao Ye	7200a21cd1	[Bug] Fix Assertion error DeepEP/csrc/kernels/intranode.cu:928: 'false and Unsupported type' (#26532 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-13 18:26:37 -04:00
Fardin Hoque	577c72a227	[CI Perf]Prune Tests in kernel/mamba (#26538 ) Signed-off-by: Fardin Hoque <kfhfar@amazon.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-13 18:22:31 -04:00
Wentao Ye	314285d4f2	[CI] Fix mypy for `vllm/distributed` (#26593 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-13 16:02:24 -04:00
wang.yuqi	d2a7938582	[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). (#26414 ) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-10-13 19:06:43 +00:00
Alex Kogan	89342ce4c0	[Quantization] [Performance] Enable Marlin GEMM kernels for the calibration-free RTN-based quantization (#26051 ) Signed-off-by: Alex Kogan <alex.kogan@oracle.com> Signed-off-by: Alex Kogan <82225080+sakogan@users.noreply.github.com>	2025-10-13 18:52:54 +00:00
Yibo Cai	f89f599395	[CI][Release][Arm64]: Build arm64 release for gpu arch 8.9 (#26698 )	2025-10-13 18:42:12 +00:00
Wentao Ye	e251e457c5	[Log] Optimize Startup Log (#26601 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-14 02:06:57 +08:00
Cyrus Leung	afc47e4de7	[Model] Use merge_by_field_config for MM models (M-N) (#26710 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-14 01:27:01 +08:00
Rahul Tuli	e3b90c1ba2	[Bugfix][Speculative Decoding] Extend Eagle quantization config fix to llama_eagle.py (#26590 ) Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-10-13 17:17:13 +00:00
haoyangli-amd	134f70b3ed	[Bugfix][Rocm] fix qr error when different inp shape (#25892 ) Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-10-13 10:04:21 -07:00
Sangyeon Cho	a1b2d658ee	[CI/Build] upgrade compressed-tensors to 0.12.2 to address LGPLv3 (#26501 ) Signed-off-by: Sangyeon Cho <josang1204@gmail.com>	2025-10-13 12:58:33 -04:00
Aleksei Tsvetkov	5c7fe25491	[Misc] Separate prompt logging to debug (#26713 ) Signed-off-by: Aleksei Tsvetkov <aitsvet@ya.ru>	2025-10-13 09:04:18 -07:00
Will Eaton	53c9a7cee2	[P/D] [NixlConnector] kv load recovery integration (#26171 ) Signed-off-by: Will Eaton <weaton@redhat.com>	2025-10-13 08:48:04 -07:00
Michael Goin	0d21b9b51e	[UX] Speedup DeepGEMM warmup with heuristics (#25619 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-13 07:59:27 -07:00
Anand Roy	10214b6935	[FEATURE]: Use pydantic validation in `multimodal.py` config (#26629 ) Signed-off-by: Anand Roy <86306690+andycandy@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-13 07:56:59 -07:00
ihb2032	4a61950f4d	[Hardware][CPU] Disable torch.compile for RISC-V to prevent APIError (#26693 ) Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn> Signed-off-by: ihb2032 <1355790728@qq.com> Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>	2025-10-13 07:56:01 -07:00
Bram Wasti	3263799056	[unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] (#26373 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Signed-off-by: Bram Wasti <bwasti@fb.com>	2025-10-13 10:24:53 -04:00
Isotr0py	8e67b2557a	[Bugfix] Fix out of bound index issue for Jina-embedding-v3 RoPE with cuda graph (#26687 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-13 03:21:48 -07:00
Jialin Ouyang	4073c82c4e	[ResponseAPI] Simplify input/output message serialization (#26620 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-10-13 09:59:15 +00:00
wang.yuqi	767c3ab869	[Model][0/N] Improve all pooling task | clean up (#25817 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-10-13 16:44:50 +08:00
Harry Mellor	4f207c7174	Ignore large reformatting PRs in `git blame` (#26690 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-13 01:20:47 -07:00
CSWYF3634076	782505ed8e	[Model] Add reasoning_parser and tool_parser for Ernie45 thinking (#25027 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-10-13 15:55:20 +08:00
Jee Jee Li	98f30b8cba	[Model] Fix Skywork R1V mlp (#26673 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-12 22:42:17 -07:00
yihong	3cd36660f7	docs: wrong command in structured_outputs README (#26677 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-10-12 20:59:01 -07:00
yyzxw	46ad73955a	[FIX] Throwing an exception when the model does not support pool tasks (#25840 ) (#25855 ) Signed-off-by: zxw <1020938856@qq.com> Co-authored-by: wang.yuqi <noooop@126.com>	2025-10-12 20:56:21 -07:00
quanliu	41f3884438	[Bugfix][Core]Fix block table out-of-range issue in priority scheduling (#26661 ) Signed-off-by: quanliu <18646313696@163.com>	2025-10-13 01:25:42 +00:00
bnellnm	60e419c1ee	[Misc] cache result of disable_inplace (#26666 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-13 00:17:50 +00:00
Michael Goin	7ef6052804	[CI/Build] Add tool to build vllm-tpu wheel (#19165 ) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-12 16:25:40 -06:00
Huamin Li	4fca1a1bd2	[easy] fix pre commit error on trunk (#26665 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-10-12 21:25:34 +00:00
Lukas Geiger	a6049be73c	[Models][Qwen3VL] Speedup `fast_pos_embed_interpolate` (#26647 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-13 01:20:07 +08:00
gjgjos	18ed7746ea	[Feature] Add support for naver/splade-v3 (BERT-based sparse embedding model) (#26339 ) Signed-off-by: gjgjos <gjgjos@naver.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-10-12 17:00:52 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Chendi.Xue	9bb38130cb	[Bugfix] Fix GPU_ID issue in test script (#26442 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-10-12 11:39:05 +00:00
Jaya Yuan	b91d8db873	[Bugfix][DCP] Set default CUDAGraphMode to PIECEWISE for DCP (#26574 ) Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>	2025-10-12 09:58:38 +00:00
Isotr0py	045b396d09	[Bugfix][CI/Build] Fix failing Mteb CI (#26638 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-12 02:42:42 -07:00
wang.yuqi	76852017ea	[MISC] Rename the torch profiler filename as instance_id+rank_id for merging the Profiler results of each Rank (#25867 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-10-12 09:29:08 +00:00
Vadim Gimpelson	82e64c7a20	[PERF] [Qwen3-next] Speed up gated RMSNorm (#26207 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-12 08:27:50 +00:00

10602 Commits