frozenleaves/vllm

mirror of https://github.com/vllm-project/vllm.git synced 2025-10-20 14:53:52 +08:00

Author	SHA1	Message	Date
jiahanc	41d3071918	[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-16 16:20:25 -07:00
Harry Mellor	fb5e10d3fb	Refactor Transformers backend to use mixins (#26906 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-16 21:50:39 +00:00
Bram Wasti	b2f78cbad4	[small][batch invariance] Rename the env and internal flags to simplify usage (#26855 ) Signed-off-by: Bram Wasti <bwasti@meta.com>	2025-10-16 21:40:25 +00:00
Wentao Ye	23583ee28c	[Bug] Add Assertion for `random-input-len` / `random-output-len` (#26834 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-16 21:36:39 +00:00
Michael Goin	01c977e96d	[CI] Prune Quantization Tests and skip compilation (#27038 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-16 17:26:35 -04:00
Wentao Ye	b3dda72c23	[Feature] Migrate DeepGEMM API from `get_m_alignment_for_contiguous_layout` to `get_mk_alignment_for_contiguous_layout` (#26935 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-16 16:46:48 -04:00
Varun Sundar Rabindranath	fb0571b077	[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels (#25997 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-16 12:53:11 -07:00
Wentao Ye	2ed8b6b3d0	[Bug] Fix batch invariant test `has` to `is` (#27032 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-16 19:45:14 +00:00
kimbochen	013abde6ef	Adding Warmup to Benchmark Serving (#26943 ) Signed-off-by: Kimbo Chen <chentenghung@gmail.com>	2025-10-16 12:44:32 -07:00
Kyle Sayers	a5464dcf92	[Compressed Tensors] Always clone output for compile robustness (#26849 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-16 19:29:59 +00:00
Mandy Li	ac3ed5a815	Support block size of 256 used by Intel HPU (#26883 ) Signed-off-by: mandy-li <mandy.j.li@intel.com>	2025-10-16 15:10:57 -04:00
Andrew Xia	e6ba2000ae	[gpt-oss][1/N] EZ: refactor serving_responses for modularity (#26948 ) Signed-off-by: Andrew Xia <axia@meta.com>	2025-10-16 18:44:06 +00:00
Harry Mellor	aa255ff55a	Support `set` in the CLI generation (#27031 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-16 18:07:18 +00:00
ZiTian Zhao	7bb736d00e	Fix Qwen2.5 VL image grid docstring (#27033 ) Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>	2025-10-16 09:57:36 -07:00
Jee Jee Li	9f4e30904b	[Model] Fix Qwen3VL mm mapping (#27027 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-16 09:45:59 -07:00
rongfu.leng	5afd3276df	[Feature] Add process_weights_after_loading to AttentionImpl (#26870 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-10-16 08:02:30 -07:00
Tahsin Tunan	43721bc67f	[CI] Replace large models with tiny alternatives in tests (#24057 ) Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-16 15:51:27 +01:00
Kay Yan	02d709a6f1	[docs] standardize Hugging Face env var to `HF_TOKEN` (deprecates `HUGGING_FACE_HUB_TOKEN`) (#27020 ) Signed-off-by: Kay Yan <kay.yan@daocloud.io>	2025-10-16 15:31:02 +01:00
Mark McLoughlin	4a510ab487	[NIXL] Improve request_finished() debug logs (#25665 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-10-16 15:55:17 +02:00
Matthew Bonanni	314fa8abbf	[Attention] Tune CUTLASS MLA num_splits (#26846 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-16 06:36:09 -07:00
Cyrus Leung	334535b6fb	[Benchmark] Show E2EL by default for pooling models (#27014 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 12:47:09 +00:00
bogdanm	dcbb3f1871	[Bugfix] Correct LayerNorm epsilon parameter in modernbert.py (#27008 ) Signed-off-by: bogdanm <152898065+bogdan01m@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-16 12:27:44 +00:00
Sungjae Lee	00417f4e44	[MISC] fix import violations for re and triton modules (#26654 ) Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-10-16 03:38:27 -07:00
Lukas Geiger	ed344f4116	Cleanup code after Python 3.10 upgrade (#26520 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-16 03:38:23 -07:00
CSWYF3634076	e51928793e	[Model][Bugfix] fix ernie45 vl run failed from shared experts optimization (#26885 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-10-16 03:37:35 -07:00
Cyrus Leung	d2740fafbf	[Chore] Separate out `vllm.utils.collections` (#26990 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 08:35:35 +00:00
Cyrus Leung	17838e50ef	[Benchmark] Use truncation by default for pooling benchmarks (#26992 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 16:02:39 +08:00
Zhewen Li	44c8555621	[CI/Build] Fix AMD import failures in CI (#26841 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-10-16 07:28:20 +00:00
Akash kaothalkar	f7d318de2b	[Hardware][CPU][PowerPC]Disable torch.compile() in toptopk sampling (#26987 ) Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>	2025-10-15 22:36:59 -07:00
Cyrus Leung	76f0d05bc6	[CI/Build] Update expected beam search output for Phi3V (#26978 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 05:12:44 +00:00
Bram Wasti	7d8975de84	Deepseek-v3 Batch Invariant on 8xH100 (#26609 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-15 22:06:02 -07:00
Vadim Gimpelson	785d8b6410	[PERF] Qwen3-next MTP speedup (change bool mask indexing to index_select / index_copy to reduce d2h) (#26437 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-10-16 12:18:31 +08:00
Cyrus Leung	f6cdc9a02f	[Chore] Rename `utils` submodules (#26920 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 03:58:13 +00:00
Chendi.Xue	509cdc0370	[DOC][XPU]update feature parity with Intel GPU (#26954 ) Signed-off-by: Chendi Xue <Chendi.Xue@intel.com> Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-10-15 20:07:10 -07:00
Richard Zou	9b6504c307	[BugFix] Work around graph partition x torch.compile cache issue (#26956 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-10-15 20:06:11 -07:00
Angela Yi	e19b16dde6	[bugfix] Fix SP + PP without specifying compile size (#26955 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2025-10-15 20:05:33 -07:00
ahao-anyscale	582f2c6be7	[BUG] Allow runai_streamer_sharded in config check (#26958 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2025-10-15 20:05:14 -07:00
Michael Goin	f8a0acbdbe	[CI] Enable Blackwell Llama4 MoE tests (#26731 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-15 21:02:57 -06:00
kliuae	1317034379	[ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops (#24097 ) Signed-off-by: chenjun <junchen2@amd.com> Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com> Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-10-16 10:41:34 +08:00
InChang Jeong	0ecc553ee6	[Bugfix] reasoning_parser parameter handling in run_batch.py (#26225 ) Signed-off-by: inc-jeong <inc.jeong@navercorp.com> Signed-off-by: InChang Jeong <inc.jeong@navercorp.com> Co-authored-by: USER <user@AL02367916.local>	2025-10-16 10:24:05 +08:00
felixzhu555	f96bc3649c	[Qwen3-Next] Add tuned MoE config for Qwen3-Next FP8 on H100 tp2 (#26887 ) Signed-off-by: Felix Zhu <felixzhu555@gmail.com>	2025-10-15 18:55:05 -07:00
Alexei-V-Ivanov-AMD	938c43ea7f	[ci] Adjusting AMD test composition 2025-10-14 (#26852 ) Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>	2025-10-15 23:52:13 +00:00
Adrian Abeyta	0a9ef0cfce	Move query quantization to attention layer for Flashinfer & Triton. (#26534 ) Signed-off-by: adabeyta <aabeyta@redhat.com> Signed-off-by: Adrian Abeyta <aabeyta@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 19:01:38 -04:00
Wentao Ye	e5b438a247	[Bug] Temporally Disable `VLLM_ALLREDUCE_USE_SYMM_MEM` by Default (#26925 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-15 16:18:50 -04:00
XiaobingZhang	0b99f5d302	support flashinfer_fp4 moe for 5090 gpu (#26669 ) Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-15 15:06:47 -04:00
Benji Beck	1f491aa0c8	Vectorize RMS norm variance using vectorize_read_with_alignment (#26234 ) Signed-off-by: Benji Beck <benjibeck@meta.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-15 11:54:41 -07:00
Kaixi Hou	de92d916fe	[NVIDIA] Add support for cudnn fp4 gemm via flashinfer (#26107 ) Signed-off-by: kaixih <kaixih@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-10-15 13:53:00 -04:00
Woosuk Kwon	a1063628a4	[Chore] Clean up CODEOWNERS (#26923 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-10-15 10:52:54 -07:00
XiaobingZhang	d796375258	[ModelOpt] Remove NVFP4 MoE K%16==0 constraint (#26891 ) Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com>	2025-10-15 13:06:17 -04:00
Sam/Samuel	14f8456344	[Feature]: Use pydantic validation in observability.py config (#26637 ) Signed-off-by: Samuel Wu <cernunnos1710@gmail.com> Signed-off-by: Sam/Samuel <57896620+cern1710@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-15 16:44:03 +00:00

10532 Commits