b6553be1bc
[Misc] Slight improvement of the BNB ( #19418 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-10 13:51:49 +00:00
64a9af5afa
Simplify ep kernels installation ( #19412 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-06-10 20:06:08 +08:00
e4248849ec
[BugFix][CPU] Fix CPU CI by ignoring collection of test_pixtral ( #19411 )
...
Signed-off-by: jiang.li <jiang1.li@intel.com >
2025-06-10 12:02:40 +00:00
467bef18a3
[BugFix][FlashInfer] Fix attention backend interface mismatch with unexpected keyword use_irope ( #19134 )
...
Signed-off-by: Yunqiu Guo <guorachel@meta.com >
2025-06-10 16:48:51 +08:00
5f1ac1e1d1
Revert "[v1] Add fp32 support to v1 engine through flex attn" ( #19404 )
2025-06-10 01:30:20 -07:00
9368cc90b2
Automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA node. ( #17930 )
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
Co-authored-by: Li, Jiang <bigpyj64@gmail.com >
2025-06-10 06:22:05 +00:00
32b3946bb4
Add clear documentation around the impact of debugging flag ( #19369 )
...
Signed-off-by: Anna Pendleton <pendleton@google.com >
2025-06-10 06:16:09 +00:00
6b1391ca7e
[Misc] refactor neuron_multimodal and profiling ( #19397 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-06-10 06:12:42 +00:00
a3f66e75d1
Add security warning to bug report template ( #19365 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-06-10 06:06:36 +00:00
319cb1e351
[Core] Batch multi modal input using pinned memory ( #19169 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-06-10 13:44:59 +08:00
1efef71645
[Bugfix] Fix modelscope token passed in ( #19389 )
...
Signed-off-by: wangli <wangli858794774@gmail.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-06-10 13:39:37 +08:00
646d62f636
[Core] Use tuple for kv cache group block ids ( #19175 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-06-10 07:01:17 +02:00
6cd4ae8acd
[Frontend] Add tqdm_leave_pbar to control progress bar visibility ( #19357 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-06-10 04:55:09 +00:00
c016047ed7
Fix docs/mkdocs/hooks/remove_announcement.py ( #19382 )
2025-06-09 21:36:54 -07:00
9af6d22e4c
Use xla flag to improve the quantized model performance ( #19303 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2025-06-10 01:28:45 +00:00
4589b94032
[Bugfix] Fix benchmark_moe.py ( #19016 )
...
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn >
2025-06-09 18:04:36 -07:00
cc867be19c
[V1] Reuse V0's memory_profiling util for gpu worker memory profiling ( #19312 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-06-10 08:40:01 +08:00
3a7cd627a8
[Misc] Fix a config typo in disable_hybrid_kv_cache_manager configuration ( #19383 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-06-09 16:41:51 -07:00
8058c91108
[HOT-FIX] Add kv_sharing_target_layer_name argument to cutlass_mla backend ( #19374 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-06-09 19:00:07 -04:00
7d44c469fe
[TPU]Fix KV cache sharing tests ( #19371 )
2025-06-09 18:38:15 -04:00
31f58be96a
[Frontend] Make TIMEOUT_KEEP_ALIVE configurable through env var ( #18472 )
...
Signed-off-by: liusiqian <liusiqian@tal.com >
2025-06-09 21:41:21 +00:00
ebb2f383b8
[Quantization] Bump compressed-tensors version ( #19295 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2025-06-09 14:33:15 -07:00
c1c7dbbeeb
[Bugfix][Core] Prevent token lengths exceeding max_model_len in V0 ( #19348 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-06-09 23:01:29 +08:00
5cf2daea9a
[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. ( #19298 )
...
Signed-off-by: Varun <vsundarr@redhat.com >
Co-authored-by: Varun <vsundarr@redhat.com >
2025-06-09 10:50:39 -04:00
b8089195b4
[v1] Add fp32 support to v1 engine through flex attn ( #19319 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-06-09 22:10:44 +08:00
770e5dcdb8
[full_graph] Fix query_start_loc padding ( #19321 )
...
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai >
2025-06-09 21:32:56 +08:00
c57c9415b1
[Docs] Fix a bullet list in usage/security.md ( #19358 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-06-09 13:28:51 +00:00
01810f9236
[CI] Introduce rules for llama auto-label ( #19323 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-06-09 20:05:42 +08:00
59abbd84f9
[Fix] Allow kernel compilation for CUDA capability 8.7 ( #19328 )
...
Signed-off-by: Conroy Cheers <conroy@corncheese.org >
2025-06-09 02:57:23 -07:00
95a6568b5c
[CI/Build] Fix LoRA test ( #19350 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-06-09 09:52:10 +00:00
0eca5eacd0
[Doc] Fix description in the Automatic Prefix Caching design doc ( #19333 )
...
Signed-off-by: cr7258 <chengzw258@163.com >
2025-06-09 17:30:02 +08:00
12e5829221
[doc] improve ci doc ( #19307 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-06-09 07:26:12 +00:00
3a4d417707
[Misc] Cleanup compilation tests ( #19343 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-06-09 15:05:44 +08:00
8335667c22
[Frontend] Remove unreachable code from llm.py ( #19288 )
...
Signed-off-by: KsuParkhamchuk <k.parkhamchuk@gmail.com >
2025-06-09 10:22:10 +08:00
e1c4380d4c
[Misc] Add documentation update reminder to PR template ( #19289 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-06-09 10:20:53 +08:00
e31ae3de36
[Deprecation] Remove inputs arg fallback in Engine classes ( #18799 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-06-09 10:19:56 +08:00
2ffb9b6e07
[Bugfix] model_max_length should consider max_model_len in tokenizer_config ( #19201 )
2025-06-08 07:17:53 -07:00
cda10fa3e2
[Multi Modal] Add an env var for message queue max chunk bytes ( #19242 )
...
Signed-off-by: yZhen <yZhen@fb.com >
Co-authored-by: yZhen <yZhen@fb.com >
2025-06-08 21:39:12 +08:00
c123bc33f9
[Quantization] Add compressed-tensors NVFP4 support ( #18312 )
2025-06-08 09:05:55 -04:00
b9a1791e2c
[Hardware][POWER] Add IBM POWER11 Support to CPU Extension Detection ( #19082 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-06-08 09:17:14 +00:00
989dcee981
Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B ( #19315 )
...
Signed-off-by: Xu Wenqing <xuwq1993@qq.com >
2025-06-08 16:07:02 +08:00
3d64d366e0
[Misc] Change tests/compile to use VLLM_V1 by default ( #19302 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-06-08 16:06:48 +08:00
eaa2e51088
[Bugfix] Re-enable use_cudagraph in vLLM v1 ( #19299 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-06-08 08:56:12 +08:00
d77f7fb871
[Bugfix]: Fix TypeError: 'float' object cannot be interpreted as an integer ( #19283 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-06-08 08:16:31 +08:00
2d8476e465
[BugFix][V1] Fix memory profiling bug ( #18974 )
...
Signed-off-by: luka <luka@neuralmagic.com >
2025-06-07 10:34:51 -07:00
88be823d57
[AMD] Update compatible packaging version ( #19309 )
...
Signed-off-by: pramkuma <Pramendra.Kumar@amd.com >
2025-06-07 20:55:09 +08:00
4e4f63ad45
[Nit][Benchmark]Fix example in benchmark_serving_structured_output.py ( #19311 )
...
Signed-off-by: Lifan Shen <lifans@meta.com >
2025-06-07 18:25:38 +08:00
d2f0e7e615
[CI/Build] Improve Llama GGUF test robustness ( #19287 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-06-07 17:23:28 +08:00
122cdca5f6
[Misc] refactor context extension ( #19246 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-06-07 05:13:21 +00:00
cf02f9b283
Add FlexAttention to V1 ( #16078 )
...
Signed-off-by: drisspg <drisspguessous@gmail.com >
2025-06-06 21:58:55 -07:00
c4296b1a27
[CI][PowerPC] Use a more appropriate way to select testcase in tests/models/language/pooling/test_embedding.py ( #19253 )
...
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com >
2025-06-07 11:52:52 +08:00
66c508b137
[TPU][Test] Add script to run benchmark on TPU for buildkite ( #19039 )
...
Signed-off-by: Qiliang Cui <derrhein@gmail.com >
2025-06-06 20:10:24 -07:00
84166fee97
[Kernel] Integrate CUTLASS MoE kernel with PPLX ( #18762 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-06-06 18:26:11 -07:00
6e0cd10f72
[Easy][Test] Simplify test_function_tool_use with multiple parametrizes ( #19269 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-06-07 09:19:09 +08:00
e010688f50
[Build][ROCm] Update Dockerfile.rocm ( #19296 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-06-06 19:35:16 -04:00
441b65d8c7
[Misc][Tools][Benchmark] Fix and improve auto tune script ( #19163 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-06-06 23:31:19 +00:00
46ecc57973
[BugFix] Fix tpu_model_runner block_id concatenation ( #19228 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-06-06 16:28:17 -07:00
b6a3a9f76d
[Core] Fix abrupt request abort ( #18485 )
...
Signed-off-by: nicklucche <nlucches@redhat.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-06-06 16:27:59 -07:00
ca27f0f9c1
[Bugfix][Core] Update cancellation logic in generate() to handle Generator exits ( #19225 )
...
Co-authored-by: Adolfo Victoria <adovi@meta.com >
2025-06-06 20:17:54 +00:00
aad30bd306
[BugFix] Fix MultiConnector test after HMA changes ( #19291 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-06-06 20:16:24 +00:00
94ecee6282
Fixed ppc build when it runs on non-RHEL based linux distros ( #18422 )
...
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com >
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com >
Signed-off-by: npanpaliya <nishidha.panpaliya@partner.ibm.com >
Co-authored-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com >
2025-06-06 11:54:26 -07:00
8267f9916f
improve logits bias ( #19041 )
2025-06-06 19:59:25 +08:00
7353492a47
[Core] Raise when non-multi-instance DP clients target a DP rank ( #19227 )
...
Signed-off-by: Jon Swenson <jmswen@gmail.com >
2025-06-06 19:03:01 +08:00
7661e92ef8
[Model] Optimize nemotron_h implementation ( #19249 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-06-06 10:05:14 +00:00
f168b85725
Unit Test for run_dp_sharded_vision_model ( #19103 )
...
Signed-off-by: Siqi Yan <siqi@meta.com >
Co-authored-by: Siqi Yan <siqi@meta.com >
2025-06-06 16:24:02 +08:00
da511d54d8
Fix CompilationConfig repr ( #19091 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-06-06 16:23:35 +08:00
65c69444b1
[Docs] Improve V1 KVConnector interface documentation ( #19172 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-06-06 16:22:45 +08:00
94870359cd
[Quantization] Bump compressed-tensors version; update NVFP4A16 test model ( #19224 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
2025-06-06 01:21:54 -07:00
0d49483ea9
[TPU] fix kv cache dtype in model runner ( #19244 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-06-06 16:20:16 +08:00
90b78ec5f9
[v1][P/D] Fix an edge case in kv cache schedule ( #19182 )
...
Co-authored-by: jinghui <jinghui@fb.com >
2025-06-05 23:32:55 -07:00
91a2ef98ea
[Chore] update CODEOWNERS ( #19247 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-06-06 06:09:43 +00:00
3da2313d78
Support allowed_token_ids in ChatCompletionRequest ( #19143 )
...
Signed-off-by: Xu Song <xusong.vip@gmail.com >
2025-06-06 05:06:48 +00:00
b61dc5f972
[TPU] update torch_xla pin ( #19231 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-06-06 04:27:38 +00:00
f8a1a2d108
[v1] Hybrid Memory Allocator ( #17996 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-06-05 20:47:09 -07:00
3465b87ef8
[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B ( #19033 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
2025-06-05 19:10:08 -07:00
c8134bea15
Fix AOPerModuleConfig name changes ( #18869 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-06-05 18:51:32 -07:00
cb6d572e85
[Model] NemotronH support ( #18863 )
...
Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com >
Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com >
2025-06-05 21:29:28 +00:00
87360308b7
[V1] Use FlashInfer by default on Blackwell GPUs ( #19118 )
2025-06-05 15:40:39 -04:00
aa49f14832
[Quantization] Skip Fp4 Test for compressed-tensors ( #19217 )
2025-06-05 18:21:53 +00:00
9ef9173cfa
[P/D][NixlConnector] Enable FlashInfer backend ( #19090 )
2025-06-05 17:10:15 +00:00
85e2b7bb13
[MISC][Bugfix] Use less CPU when message queue has been empty for some time ( #16226 )
...
Signed-off-by: Povilas Kanapickas <povilas@radix.lt >
2025-06-05 16:53:08 +00:00
61059bee40
[Hardware][NVIDIA] FP4 MoE kernel optimization ( #19110 )
...
Signed-off-by: Chiyue Wei <chiyuew@nvidia.com >
Co-authored-by: Chiyue Wei <chiyuew@nvidia.com >
2025-06-05 09:48:26 -07:00
ec89524f50
Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 ( #19205 )
2025-06-05 16:38:54 +00:00
f20f9f063b
[mistral_common] Add v11 tokenizer ( #19193 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
2025-06-05 08:27:41 -07:00
9bc8bb07cf
[Bugfix] properly catch PIL-related errors for vision models when incorrect data urls are provided ( #19202 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-06-05 12:59:28 +00:00
1aeb925f34
[Frontend] improve vllm run-batch --help display ( #19187 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-06-05 11:16:25 +00:00
188a4590d8
[Misc] Do not override NCCL_CUMEM_ENABLE if set explicitly ( #19105 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-06-05 11:14:32 +00:00
18093084be
[Misc] Remove unnecessary fallback to prefill-decode attention ( #19138 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-06-05 16:08:26 +08:00
da40380214
[Build] Annotate wheel and container path for release workflow ( #19162 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-04 23:24:56 -07:00
8fc57501d3
[Bugfix]: Fix the incompatibility issue with stream when Thinking is disabled ( #19135 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-06-05 06:24:24 +00:00
af7fc84fd2
[BugFix][Minor] Fix full cuda graph bug when max_num_seqs < 512 ( #19171 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-06-05 13:41:25 +08:00
0678b52251
Handle non-serializable objects when dumping benchmark results ( #19114 )
2025-06-04 22:40:04 -07:00
25b918eee6
[Torch Nightly]add missing dependency ( #18770 )
...
Signed-off-by: Yang Wang <elainewy@meta.com >
2025-06-04 21:56:12 -07:00
a408820f2f
[Bugfix] Fix port handling in make_zmq_path ( #19117 )
2025-06-04 21:00:59 -06:00
c56ed8bb0e
[Bugfix][Nixl] Fix full prefix cache hit bug ( #18632 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-06-05 02:07:32 +00:00
78dcf56cb3
[doc] small fix ( #19167 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-06-05 09:13:50 +08:00
b2fac67130
[P/D] Heterogeneous TP ( #18833 )
...
Signed-off-by: nicklucche <nlucches@redhat.com >
2025-06-04 23:25:34 +00:00
23027e2daf
[Misc] refactor: simplify EngineCoreClient.make_async_mp_client in AsyncLLM ( #18817 )
...
Signed-off-by: googs1025 <googs1025@gmail.com >
2025-06-04 15:37:25 -07:00
c3fd4d669a
[Kernel] Integrate batched/masked deepgemm kernel ( #19111 )
...
Signed-off-by: Varun <vsundarr@redhat.com >
Co-authored-by: Varun <vsundarr@redhat.com >
2025-06-04 21:59:18 +00:00
ef3f98b59f
[Bugfix] fix v1 cpu worker fails on macOS ( #19121 )
2025-06-04 20:17:38 +00:00
7ee2590478
[TPU] Update dynamo dump file name in compilation test ( #19108 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-06-04 16:13:43 -04:00
53a5a0ce30
[Perf] Tunings for SM100 FP8 CUTLASS kernel ( #18778 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-06-04 10:46:28 -07:00
d459fae0a2
[Bugfix][EP+DP] Fix internode check ( #19112 )
...
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
2025-06-04 23:39:23 +08:00
c8dcc15921
Allow AsyncLLMEngine.generate to target a specific DP rank ( #19102 )
...
Signed-off-by: Jon Swenson <jmswen@gmail.com >
2025-06-04 08:26:47 -07:00
8f4ffbd373
[Doc] Update V1 Guide for embedding models ( #19141 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-06-04 22:57:55 +08:00
5f2cd251d2
Sm100 blockwise fp8 swap ab ( #18564 )
2025-06-04 07:48:45 -07:00
02658c2dfe
Add DeepSeek-R1-0528 function call chat template ( #18874 )
...
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com >
2025-06-04 13:24:18 +00:00
01dc9a76db
[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 ( #18678 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-06-04 04:49:20 -07:00
35cf32df30
Improve the output precision of embedding models ( #19092 )
2025-06-04 11:48:57 +00:00
8711bc5e68
[Misc] Add packages for benchmark as extra dependency ( #19089 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-06-04 04:18:48 -07:00
2669a0d7b5
Fix ValueError: Missing value for tag key(s): model_name,engine. ( #19113 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2025-06-04 17:10:45 +08:00
8e972d9c44
[TPU] Skip hanging tests ( #19115 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-06-04 01:43:00 -07:00
3336c8cfbe
Fix #19130 ( #19132 )
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
2025-06-04 01:42:06 -07:00
b124e1085b
[Bugfix] Fix FA3 full cuda graph correctness ( #19106 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-06-03 23:10:15 -07:00
41aa578428
[NVIDIA] Add Cutlass MLA backend ( #17625 )
2025-06-03 21:40:26 -07:00
8d646c2e53
[Cleanup][v1]: remove guided-decoding-backend for example ( #19059 )
...
Signed-off-by: calvin chen <120380290@qq.com >
2025-06-04 04:23:26 +00:00
5d6d1adf15
[KERNEL] Sampler. CUDA kernel for applying repetition penalty ( #18437 )
2025-06-03 21:13:01 -07:00
1409ef9134
[Core] Cast multimodal input in hf processor ( #18862 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-06-03 20:24:56 -07:00
4555143ea7
[CPU] V1 support for the CPU backend ( #16441 )
2025-06-03 18:43:01 -07:00
52dceb172d
[Docs] Add developer doc about CI failures ( #18782 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-06-04 01:09:13 +00:00
abd7df2fca
[Misc] Fix path and python alias errors in disagg_prefill examples ( #18919 )
2025-06-03 17:15:18 -07:00
b712be98c7
feat: add data parallel rank to KVEventBatch ( #18925 )
2025-06-03 17:14:20 -07:00
a8da78eac9
[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers ( #19029 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-06-04 00:14:06 +00:00
5d96533e22
[Bugfix][P/D] Fix Prefix Cache Bug ( #18411 )
...
Signed-off-by: nicklucche <nlucches@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-06-03 23:53:16 +00:00
4de790fcad
[Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled ( #19075 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-06-03 23:27:24 +00:00
b5fd9506c1
[Bugfix] get_num_blocks_to_allocate with null_block ( #19031 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-06-03 15:30:55 -07:00
135cf55cd1
[V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix ( #18971 )
2025-06-03 15:26:33 -07:00
6cac54f4d1
[v1] Re-init input batch for multiple kv cache groups ( #18654 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-06-03 21:41:36 +00:00
6865fe0074
Fix interaction between Optional and Annotated in CLI typing ( #19093 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Yikun Jiang <yikun@apache.org >
2025-06-03 21:07:19 +00:00
e31446b6c8
[Perf] Tune scaled_fp8_quant by increasing vectorization ( #18844 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-06-03 13:48:25 -07:00
bdf13965ab
[V1] Support cross-layer KV sharing ( #18212 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-06-03 20:33:07 +00:00
fa98d77773
[Kernel] DeepEP dispatch-combine kernel integration ( #18434 )
...
Signed-off-by: Varun <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-06-03 12:30:02 -07:00
01eee40536
[doc] update docker version ( #19074 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-06-03 19:08:21 +00:00
19bdaf32b1
[Doc] Readme standardization ( #18695 )
...
Co-authored-by: Soren Dreano <soren@numind.ai >
2025-06-03 11:50:55 -07:00
02f0c7b220
[Misc] Add SPDX-FileCopyrightText ( #19100 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-06-03 11:20:17 -07:00
d054da1992
[Misc] fix: add missing best_of param validation ( #18555 )
...
Signed-off-by: googs1025 <googs1025@gmail.com >
2025-06-03 11:02:07 -07:00
4b7817c119
[Misc] Add missing _Backend enums ( #19081 )
...
Signed-off-by: nicklucche <nlucches@redhat.com >
2025-06-03 16:15:16 +00:00
d00dd65cd4
[Doc] Improve the Pull Request template with key components ( #19086 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-06-03 23:44:34 +08:00
d81edded69
[Bugfix] disable processor cache ( #19068 )
...
Signed-off-by: raushan <raushan@huggingface.co >
2025-06-03 15:06:04 +00:00
476844d44c
Fix underscores in dict keys passed via CLI ( #19030 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-06-03 14:39:24 +00:00
4e68ae5e59
[CI/Build] Remove V0 LoRA test ( #19066 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-06-03 14:30:18 +00:00
4e88723f32
[doc] clarify windows support ( #19088 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-06-03 21:42:17 +08:00
118ff92111
[Doc] Update V1 user guide for embedding and enc-dec models ( #19060 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-06-03 02:29:41 -07:00
ec2dcd80bc
[Misc] Update WeightsMapper for qwen2-vl/qwen2.5-vl ( #19054 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-06-03 09:08:20 +00:00
42243fbda0
[Doc] Add InternVL LoRA support ( #19055 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-06-03 09:08:03 +00:00
6d18ed2a2e
Update docker docs with ARM CUDA cross-compile ( #19037 )
...
Signed-off-by: mgoin <michael@neuralmagic.com >
2025-06-03 08:21:53 +00:00
f32fcd9444
[v1][KVCacheManager] Rename BlockHashType to BlockHash ( #19015 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-06-03 08:01:48 +00:00
d32aa2e670
[Bugfix] Use cmake 3.26.1 instead of 3.26 to avoid build failure ( #19019 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-06-03 00:16:17 -07:00
cc977286e7
Reduce logs in CLI scripts and plugin loader ( #18970 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-06-03 06:00:45 +00:00
17430e3653
[bugfix] small fix logic issue ( #18999 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-06-03 05:35:12 +00:00
1282bd812e
Add tarsier model support ( #18985 )
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
2025-06-03 13:13:13 +08:00
bdce64f236
[V1] Support DP with Ray ( #18779 )
2025-06-02 21:15:13 -07:00
9e6f61e8c3
[ROCm][Build] Clean up the ROCm build ( #19040 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-06-02 20:47:47 -07:00
8655f47f37
[CPU][CI] Re-enable the CPU CI tests ( #19046 )
...
Signed-off-by: jiang.li <jiang1.li@intel.com >
2025-06-02 20:46:47 -07:00
4ce42f9204
Adding "LoRA Test %N" to AMD production tests ( #18929 )
...
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu >
2025-06-02 20:46:44 -07:00
8a57872b2a
[Bugfix][EP+DP] Use pplx-kernel internode instead of intranode ( #19034 )
...
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-06-03 11:36:51 +08:00
5bc1ad6cee
[Doc] Remove duplicate TOCs during MkDocs migration ( #19021 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-06-02 19:49:48 -07:00
9112b443a0
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD ( #18011 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com >
Co-authored-by: Chengji Yao <chengjiyao@google.com >
2025-06-03 00:06:20 +00:00
c57d577e8d
add an absolute path for run.sh ( #18258 )
...
Signed-off-by: calvin chen <120380290@qq.com >
2025-06-02 19:38:23 +00:00
ca2f6b9c30
[Bugfix][Model] Attempt to fix eagle in V0. ( #18978 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-06-02 08:15:53 -07:00
20133cfee2
[Frontend] enable custom logging for the uvicorn server (OpenAI API server) ( #18403 )
...
Signed-off-by: François Paupier <francois.paupier@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-06-02 15:04:23 +00:00
ebb1ec9318
[Model] enable data parallel for Llama4 vision encoder ( #18368 )
...
Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com >
Co-authored-by: yZhen <yZhen@fb.com >
Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com >
2025-06-02 19:22:54 +08:00
5b168b6d7a
[doc] add pytest tips ( #19010 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-06-02 11:07:26 +00:00
9760fd8f6a
[Core] Support inplace model weights loading ( #18745 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-06-02 17:38:50 +08:00
b9f61e1387
[Bugfix][Nixl] Fix DP Metadata Handshake ( #19008 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-06-02 03:30:41 +00:00
d6fd3a33b8
[Misc] reuse num_tokens_across_dp of get_dp_padding to avoid unnecessary dp all reduce in set_forward_context ( #18935 )
...
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
2025-06-01 19:41:18 +00:00
432ec9926e
[doc] wrong output ( #19000 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-06-01 11:26:14 +00:00
2b102d51ad
[BugFix] Fix incorrect metrics shutdown error log message ( #18992 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-06-01 11:42:23 +08:00
aa54a7bf7b
[BugFix] fix data parallel construct ipv6 url address ( #18991 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-06-01 11:42:10 +08:00
2ad6194a02
Let max_num_batched_tokens use human_readable_int for large numbers ( #18968 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-06-01 11:41:29 +08:00
c594cbf565
[doc] small fix - mkdocs ( #18996 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-31 20:23:43 -07:00
a35ca765a5
[LoRA] Support dynamically initialize packed_modules_mapping for VLM with arbitrary components ( #18987 )
...
Signed-off-by: isotr0py <2037008807@qq.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-06-01 11:06:57 +08:00
6aa8f9a4e7
[Core] Rework dtype resolution ( #18751 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-06-01 11:04:23 +08:00
1bc86a3da1
[Bugfix] Fix EAGLE3 broken logits ( #18909 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
2025-05-31 19:58:07 -07:00
bbfa0c61d1
[Misc][Benchmark] Add support for CustomDataset ( #18511 )
2025-05-31 19:07:38 +00:00
20079c6e36
[Misc] add return token strs for tokenize ( #18941 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-31 18:00:11 +00:00
9a1b9b99d7
[BugFix] Fix multi-node offline data-parallel ( #18981 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com >
2025-05-31 08:34:52 -07:00
8bf507d766
[P/D] NixlConnector use cache device index for memory registration ( #18969 )
...
Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com >
2025-05-31 11:19:18 -04:00
306d60401d
[ROCm][Kernel] Add gfx950 support for skinny gemms ( #18010 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-05-31 07:40:05 -07:00
f2c3f66d59
[Bugfix] Fix for issue 17396 ( #18773 )
...
Signed-off-by: Fred Reiss <frreiss@us.ibm.com >
2025-05-31 11:58:17 +00:00
0f5e0d567e
[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 ( #18825 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-05-31 03:39:31 -07:00
c55d804672
[BugFix] Pydantic part 2 ( #18911 )
...
Signed-off-by: luka <luka@neuralmagic.com >
2025-05-31 03:39:28 -07:00
749f5bdd38
[doc] fix the list rendering issue - security.md ( #18982 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-31 10:39:21 +00:00
2a50ef5760
[Neuron] Add Multi-Modal model support for Neuron ( #18921 )
...
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com >
Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com >
Co-authored-by: FeliciaLuo <luof@amazon.com >
Co-authored-by: Elaine Zhao <elaineyz@amazon.com >
2025-05-31 10:39:11 +00:00
b8b904795d
fix security issue of logging llm output ( #18980 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
2025-05-31 10:38:56 +00:00
ba5111f237
[Bugfix]: Fix the incompatibility issue with Structured Outputs when Thinking is disabled ( #18879 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-05-31 09:20:54 +00:00
1e123529d7
[Misc] Fix estimated max model len msg ( #18966 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-05-31 16:43:44 +08:00
dff80b0e42
[Frontend] Add rerank support to run_batch endpoint ( #16278 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io >
2025-05-31 07:40:01 +00:00
7782464a17
create util function for batched arange ( #18937 )
2025-05-31 13:50:38 +08:00
0f71e24034
[Docs] Correct multiprocessing design doc ( #18964 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-31 01:30:15 +00:00
1dab4d5718
Tool parser regex timeout handling ( #18960 )
...
Signed-off-by: Will Eaton <weaton@redhat.com >
2025-05-30 21:02:54 +00:00
7f21e8052b
[Misc] add group_size is -1 in awq quantization ( #18910 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-05-30 17:34:22 +00:00
5a8641638a
[VLM] Add PP support and fix GPTQ inference for Ovis models ( #18958 )
...
Signed-off-by: isotr0py <2037008807@qq.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-30 17:11:44 +00:00
f49239cb45
Benchmark script for fp8 vs bf16 gemm ( #17126 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-30 10:56:11 -06:00
2dbe8c0774
[Perf] API-server scaleout with many-to-many server-engine comms ( #17546 )
2025-05-30 08:17:00 -07:00
84ec470fca
Improve "failed to get the hash of the compiled graph" error ( #18956 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-05-30 15:00:54 +00:00
b29ca5c4d5
[Docs] Update SECURITY.md with link to our security guide ( #18961 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-30 07:37:27 -07:00
ec6833c5e9
[doc] show the count for fork and watch ( #18950 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-30 06:45:59 -07:00
e1fadf1197
[Feature] minicpm eagle support ( #18943 )
...
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com >
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com >
2025-05-30 06:45:56 -07:00
43ff405b90
[CI/Build] remove regex from build dependencies ( #18945 )
...
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-30 04:02:50 -07:00
fba02e3bd1
[Bugfix][TPU] Fix tpu model runner testcase failure ( #18810 )
...
Signed-off-by: Carol Zheng <cazheng@google.com >
2025-05-30 18:04:03 +08:00
4577fc9abb
[Misc]Fix typo ( #18947 )
2025-05-30 02:21:35 -07:00
5f1d0c8118
[Bugfix][Failing Test] Fix test_vllm_port.py ( #18618 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2025-05-30 17:13:47 +08:00
c3bb9f2331
[Model] Use in-place adds in SigLIP ( #18922 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-30 17:12:59 +08:00
8f8900cee9
[doc] add mkdocs doc ( #18930 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-30 07:58:44 +00:00
6acb7a6285
[Misc]Fix benchmarks/README.md for speculative decoding ( #18897 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2025-05-30 07:58:04 +00:00
4f4a6b844a
[Deprecation] Remove mean pooling default for Qwen2EmbeddingModel ( #18913 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-30 06:53:37 +00:00
4d0a1541be
[Bugfix] Remove NVFP4 scales assertions to fix load_format=dummy ( #18861 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-30 13:37:36 +08:00
77b6e74fe2
[ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. ( #18938 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-05-29 22:33:17 -07:00
5acf828d99
[docs] fix: fix markdown syntax ( #18927 )
2025-05-30 05:20:48 +00:00
3987e2ae96
[Model] Use AutoWeightsLoader for mamba2 ( #18918 )
...
Signed-off-by: iLeGend <824040212@qq.com >
2025-05-30 04:50:10 +00:00
77164dad5e
[Bugfix] Consistent ascii handling in tool parsers ( #18883 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-05-30 04:44:43 +00:00
3de3eadf5b
improve the robustness of parsing vlms config in AutoRound ( #18894 )
...
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com >
2025-05-29 19:24:47 -07:00
3132290a14
[TPU][CI/CD] Clean up docker for TPU tests. ( #18926 )
...
Signed-off-by: Carol Zheng <cazheng@google.com >
2025-05-30 10:24:19 +08:00
1aa2f81b43
[Misc] Update type annotation for rotary embedding base ( #18914 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-30 10:17:01 +08:00
d54af615d5
[Bugfix] Fix PP default fallback behavior for V1 ( #18915 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-30 10:13:17 +08:00
a1cc9f33a3
[TPU] remove transpose ops in moe kernel ( #18923 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-05-29 23:00:11 +00:00
a521ef06e5
Use standalone_compile by default in torch >= 2.8.0 ( #18846 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-05-30 06:41:58 +08:00
64eaf5fe05
[P/D] NixlConnector DP fixes ( #18903 )
...
Signed-off-by: Will Eaton <weaton@redhat.com >
2025-05-29 18:08:40 +00:00
d1d61f3351
[BugFix] Make DP work with connector-delayed new requests ( #18559 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Will Eaton <weaton@redhat.com >
2025-05-29 18:04:18 +00:00
32ce3cf7c9
[V1] Allocate kv_cache with stride order for V1 ( #18775 )
...
Signed-off-by: nicklucche <nlucches@redhat.com >
2025-05-29 17:54:16 +00:00
d58f9c7f7a
[Misc] Remove duplicate init for self.vllm_config ( #18896 )
...
Signed-off-by: googs1025 <googs1025@gmail.com >
2025-05-29 17:26:07 +00:00
c29034037d
[Deprecation] Disallow pos-args other than model when initializing LLM ( #18802 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-29 09:36:58 -07:00
1b7cfd5a36
[ROCm][V0][Attention] Revert to the previous FA triton kernel ( #18226 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-29 12:13:18 -04:00
da4b69d0b4
[Attention][V1] Toggle for v1 attention backend ( #18275 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-29 10:48:24 -04:00
c9479b2920
[Bugfix] Fix the failing gte embedding test ( #18720 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-29 07:39:25 -07:00
6f2909405e
[Doc] Fix codeblocks formatting in LoRA adapters documentation ( #18907 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-05-29 07:38:55 -07:00
b169d5f7b6
[Misc][Tools][Benchmark] Add benchmark_serving supports for llama.cpp. ( #18692 )
...
Signed-off-by: Duyi-Wang <duyi.wang@intel.com >
2025-05-29 20:02:08 +08:00
f8977c233f
Fix an error in dummy weight loading for quantization models ( #18855 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-05-29 03:07:20 -07:00
f274581f44
[BugFix] Update pydantic to fix error on python 3.10 ( #18852 )
...
Signed-off-by: luka <luka@neuralmagic.com >
2025-05-29 03:05:46 -07:00
0b1447f890
[Bugfix] Ensure tensors are contiguous during serialisation ( #18860 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-29 03:05:20 -07:00
24d0ef8970
[Misc] Replace TODO in serving transcription ( #18895 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-05-29 02:58:14 -07:00
7fcfd954ff
[Bugfix] Fix misleading information in the documentation ( #18845 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-29 02:54:14 -07:00
e740d07f07
[doc] add CLI doc ( #18871 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-29 09:51:36 +00:00
a652e71dd0
[Doc] Remove redundant spaces from compatibility_matrix.md ( #18891 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-05-29 02:51:20 -07:00
34d6c447c4
[LoRA] Add LoRA support for InternVL ( #18842 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-29 08:46:24 +00:00
972eddf7c9
[Neuron] Add multi-LoRA support for Neuron. ( #18284 )
...
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
2025-05-29 16:41:22 +08:00
fd7bb88d72
Fixes a dead link in nightly benchmark readme ( #18856 )
...
Signed-off-by: Brent Salisbury <bsalisbu@redhat.com >
2025-05-29 04:41:39 +00:00
3c49dbdd03
Skip device and quant Pydantic validation to make plugin device work ( #18843 )
...
Signed-off-by: Yikun Jiang <yikunkero@gmail.com >
2025-05-28 20:12:30 -07:00
1661a9c28f
[Doc][Neuron] Update documentation for Neuron ( #18868 )
...
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
2025-05-28 19:44:01 -07:00
8e882ffdc0
[Bugfix][TPU] fix moe custom kernel import ( #18853 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-05-28 19:34:19 -07:00
26b4fa45be
Add ability to use CUDAGraphs with use_inductor=False ( #17345 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-05-29 10:16:52 +08:00
515b413ebf
Prevent the cross-encoder logic from being applied to classification tasks ( #18838 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-28 19:16:17 -07:00
269d901734
[Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix ( #18100 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-29 07:21:46 +08:00
7951d78738
[Core] Enable CUDA graphs for DP + All2All kernels ( #18724 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-05-28 22:55:30 +00:00
6dbe5b5c93
Remove checks for None for fields which should never be None ( #17985 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-28 21:32:19 +00:00
643622ba46
[Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend ( #15655 )
...
Signed-off-by: Akshat Tripathi <akshat@krai.ai >
Signed-off-by: Chengji Yao <chengjiyao@google.com >
Signed-off-by: xihajun <junfan@krai.ai >
Signed-off-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk >
Signed-off-by: Jorge de Freitas <jorge@krai.ai >
Co-authored-by: Chengji Yao <chengjiyao@google.com >
Co-authored-by: xihajun <junfan@krai.ai >
Co-authored-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk >
Co-authored-by: Jorge de Freitas <jorge@krai.ai >
2025-05-28 19:59:09 +00:00
a09c7ca9f2
[Chore][Spec Decode] Update check NoneType instead of assigning variables ( #18836 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-28 18:57:19 +00:00
0e98964e94
[V1][Metrics] Remove metrics that were deprecated in 0.8 ( #18837 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-28 18:54:12 +00:00
c68b5c63eb
[Misc] fix olmoe model layer can't load in tp gt 1 ( #18828 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-05-28 17:36:21 +00:00
fced756923
[Chore] update ty configuration ( #18839 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-28 08:59:11 -07:00
321331b8ae
[Core] Add Lora Support to Beam Search ( #18346 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-05-28 08:58:24 -07:00
6e4cea1cc5
decrement server_load on listen for disconnect ( #18784 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com >
2025-05-28 22:15:12 +08:00
435fa95444
[Frontend] add run batch to CLI ( #18804 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-28 07:08:57 -07:00
4c2b38ce9e
Enable Pydantic mypy checks and convert configs to Pydantic dataclasses ( #17599 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-28 12:46:04 +00:00
d781930f90
[Platform][Dist] Make torch distributed process group extendable ( #18763 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-05-28 10:52:34 +00:00
ce75efeecb
[BugFix] FA2 MLA Accuracy Issue ( #18807 )
...
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com >
2025-05-28 08:59:39 +00:00
aa42561e40
Fix PiecewiseCompileInterpreter ( #17338 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-05-28 08:40:53 +00:00
de65fc8e1e
[CI] improve embed testing ( #18747 )
2025-05-28 00:16:35 -07:00
0c492b7824
[Deprecation] Remove fallbacks for Embeddings API ( #18795 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-28 15:09:04 +08:00
0f0926b43f
[Deprecation] Remove unused sync methods in async_timeout ( #18792 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-28 15:08:48 +08:00
7f2c1a87e9
[Deprecation] Require overriding get_dummy_text and get_dummy_mm_data ( #18796 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-28 15:08:35 +08:00
b78f844a67
[Bugfix][FailingTest]Fix test_model_load_with_params.py ( #18758 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2025-05-28 05:42:54 +00:00
5e13c07d00
[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (2) ( #18781 )
...
Signed-off-by: Ronald Xu <ronaldxu@amazon.com >
2025-05-28 05:09:14 +00:00
774c5fde30
[V1] fix torch profiling for V1 offline scenarios ( #18445 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-05-28 04:16:30 +00:00
9a21e331ff
[Bugfix]: correctly propagate errors message caught at the chat_templating step to the client ( #18769 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-05-28 03:35:43 +00:00
3e9ce609bd
[Bugfix] Fix nomic max_model_len ( #18755 )
2025-05-27 20:29:53 -07:00
794ae1f551
[rocm] Fix wrong attention log ( #18764 )
...
Signed-off-by: Felix Marty <felmarty@amd.com >
2025-05-27 19:45:41 -07:00
d73a9457a5
[Core] Improve Tensor serialisation ( #18774 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-28 09:46:21 +08:00
a3896c7f02
[Build] Fixes for CMake install ( #18570 )
2025-05-27 20:49:24 -04:00
51e98e4ffd
[Bugfix] Disable prefix caching by default for benchmark ( #18771 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
2025-05-28 08:18:09 +08:00
e56f44d9ec
Support datasets in vllm bench serve and sync with benchmark_[serving,datasets].py ( #18566 )
2025-05-27 19:59:48 -04:00
e0cbad4e30
[Neuron] Support quantization on neuron ( #18283 )
...
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
2025-05-27 22:10:33 +00:00
b48d5cca16
[CI/Build] [TPU] Fix TPU CI exit code ( #18282 )
...
Signed-off-by: Carol Zheng <cazheng@google.com >
2025-05-27 14:54:59 -07:00
5873877241
[Bugfix] Mistral tool calling when content is list ( #18729 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-27 09:05:37 -07:00
696259ca01
[Core] Automatically cast multi-modal input dtype ( #18756 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-27 23:45:48 +08:00
6b6d496114
optimize get_kv_cache_torch_dtype ( #18531 )
...
Signed-off-by: idellzheng <idellzheng@tencent.com >
2025-05-27 13:08:44 +00:00
aaa4ac1c95
Disable prefix cache by default for benchmark ( #18639 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
2025-05-27 20:06:34 +08:00
06a0338015
[V1][Metrics] Add API for accessing in-memory Prometheus metrics ( #17010 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-27 09:37:06 +00:00
4318c0559d
[CI/Build] Remove imports of built-in re ( #18750 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-27 09:19:18 +00:00
a68e293cb9
[Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking ( #18663 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-05-27 01:44:20 -07:00
6881107948
[BUG FIX] minicpm ( #18739 )
...
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com >
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com >
2025-05-27 01:04:49 -07:00
e0f0ff87b8
[Build] fix cpu build missing libtbbmalloc.so ( #18744 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2025-05-27 01:03:56 -07:00
c24b1572ac
Minor fix about MooncakeStoreConnector ( #18721 )
...
Signed-off-by: baoloongmao <baoloongmao@tencent.com >
2025-05-27 08:02:28 +00:00
4693a3438c
[Doc] cleanup deprecated flag for doc ( #18715 )
...
Signed-off-by: calvin chen <120380290@qq.com >
2025-05-27 07:12:02 +00:00
bbd9a84dc5
[Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the same name in run-hpu-test.sh ( #18752 )
...
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai >
2025-05-27 00:10:26 -07:00
a547aeb828
feat(rocm-support): support mamba2 on rocm ( #18565 )
...
Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai >
Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai >
2025-05-27 00:07:53 -07:00
fc6d0c290f
[Misc] improve docs ( #18734 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-27 07:07:01 +00:00
753944fa9b
[Doc] Update reproducibility doc and example ( #18741 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-27 07:03:13 +00:00
25a817f202
[Doc] Update OOT model docs ( #18742 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-27 06:30:31 +00:00
d260f799a9
[FEAT] [ROCm] Upgrade AITER Fused MoE kernels. ( #18271 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-05-26 23:14:07 -07:00
b50602d5f0
[Model][Gemma3] Cast image pixel values already on CPU ( #18732 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-27 05:42:54 +00:00
1f1b1bc03b
[V1][Quantization] Add CUDA graph compatible v1 GGUF support ( #18646 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-27 04:40:28 +00:00
1f88dbd2bb
[Misc] improve web section group title display ( #18684 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-27 04:35:16 +00:00
0eebd74842
[Model][Gemma3] Simplify image input validation ( #18710 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-27 11:13:37 +08:00
27bebcd897
Convert examples to ruff-format ( #18400 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-26 16:57:54 +00:00
e7523c2e03
[V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs ( #18608 )
2025-05-26 11:49:36 -04:00
a869baca73
[Bugfix] Fix Llama GGUF initialization ( #18717 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 07:49:22 -07:00
82e2339b06
[Doc] Move examples and further reorganize user guide ( #18666 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 07:38:04 -07:00
9553fdb41e
[Doc] Improve API docs ( #18713 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 07:33:34 -07:00
243eb9199f
[Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM ( #18701 )
2025-05-26 07:10:56 -07:00
0665e29998
[Misc] add AutoGen integration ( #18712 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-26 13:56:18 +00:00
e76be06550
[Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test to HPU CI ( #18709 )
...
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai >
2025-05-26 05:26:07 -07:00
0877750029
[CI/Build] Split pooling and generation extended language models tests in CI ( #18705 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-26 04:00:08 -07:00
6d68030f1c
[Model] Add support for YARN in NemotronNAS models ( #18427 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com >
2025-05-26 10:31:49 +00:00
5a2c76cbe1
[CI] fix dump_input for str type ( #18697 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-26 18:23:35 +08:00
38b13dfe78
[CI/Build] Replace math.isclose with pytest.approx ( #18703 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 02:05:17 -07:00
61a45e7a72
[Bugfix] Fix Mistral-format models with sliding window ( #18693 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 01:44:04 -07:00
65523a0995
[Doc] Fix issue template format ( #18699 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 00:45:39 -07:00
4b7740a105
[GH] Add issue template for reporting CI failures ( #18696 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 00:42:04 -07:00
4ea62c0ea0
[CI] add missing argument ( #18694 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-26 00:22:04 -07:00
561b77a0d6
[Bugfix] Fix the lm_head in gpt_bigcode in lora mode ( #6357 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Signed-off-by: Max de Bayser <maxdebayser@gmail.com >
2025-05-26 14:52:25 +08:00
abd4030d94
refactor: simplify request handler, use positive condition check for handler assignment ( #18690 )
...
Signed-off-by: googs1025 <googs1025@gmail.com >
2025-05-26 06:32:28 +00:00
8820821b59
[Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example ( #18644 )
...
Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com >
Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com >
Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com >
2025-05-26 13:51:27 +08:00
fba0642704
[CI/Build][Doc] Update gte-Qwen2-1.5B-instruct usage ( #18683 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-05-25 20:27:50 -07:00
6071e989df
[Core][Multimodal] Convert PIL Image to array without data copy when hashing ( #18682 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-25 17:33:35 +00:00
57fd13a707
[Bugfix] Fix profiling dummy data for Pixtral ( #18677 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-25 14:05:30 +00:00
3a886bd58c
[Misc] small improve ( #18680 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-25 06:05:38 -07:00
35be8fad62
[CI/build] fix no regex ( #18676 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-25 10:10:51 +00:00
f2faac745d
[Bugfix] Fix cpu usage and cache hit stats reporting on cpu environment ( #18674 )
...
Signed-off-by: zzzyq <zhangyuqi94@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-25 02:36:06 -07:00
279f854519
[doc] improve readability ( #18675 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-25 01:40:31 -07:00
624b77a2b3
[doc] fix broken links ( #18671 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-25 01:36:33 -07:00
503f8487c2
[Misc] Reduce logs on startup ( #18649 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 23:03:53 -07:00
44073a7ac3
[BUGFIX] catch subclass first for try...except ( #18672 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-25 05:34:24 +00:00
63934543a0
Speed up the kernels/quantization/ tests ( #18669 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-25 05:02:59 +00:00
75f81750f3
[VLM] Initialize video input support for InternVL models ( #18499 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-25 04:51:25 +00:00
6ab681bcbe
[Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE ( #18655 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-05-25 04:51:21 +00:00
cebc22f3b6
[Misc]Replace cuda hard code with current_platform in Ray ( #14668 )
...
Signed-off-by: noemotiovon <757486878@qq.com >
2025-05-24 20:26:31 -07:00
6c6dcd8611
[MISC] correct signature for LoaderFunction ( #18670 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-24 20:17:47 -07:00
7891fdf0c6
[V1] Fix _pickle.PicklingError: Can't pickle <class 'transformers_modules.deepseek-ai.DeepSeek-V2-Lite... ( #18640 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2025-05-24 20:07:20 -07:00
6825d9a998
[BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding ( #18668 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-24 17:33:46 -07:00
b554ab736e
[CI/Build] fix permission denied issue ( #18645 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-24 16:09:10 +00:00
9ea7f1abf3
fix(regression): clone from reference items ( #18662 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-24 15:25:20 +00:00
2807271c86
[CI] enforce import regex instead of re ( #18665 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-24 08:04:14 -07:00
b9018a3f9f
[BugFix] Fix import error for fused_moe ( #18642 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-05-24 07:53:36 -07:00
4ceafb6299
[MISC] typo fix and clean import ( #18664 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-24 07:52:09 -07:00
2e6705784f
[CI/Build] chmod +x to cleanup_pr_body.sh ( #18650 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 07:26:45 -07:00
1cb194a018
[Doc] Reorganize user guide ( #18661 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 07:25:33 -07:00
2cd4d58df4
[Model] use AutoWeightsLoader for gpt2 ( #18625 )
...
Signed-off-by: zt2370 <ztang2370@gmail.com >
2025-05-24 13:36:13 +00:00
6d166a8d35
[Doc] Add community links ( #18657 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 06:06:38 -07:00
ef1dd6870f
[Doc] Fix indentation problems in V0 Paged Attention docs ( #18659 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 06:06:35 -07:00
e77dc4bad8
[MISC][pre-commit] Add pre-commit check for triton import ( #17716 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-05-24 20:09:15 +08:00
07458a51ce
[Doc] Update README links, mark external links ( #18635 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 09:57:15 +00:00
c1e4a4052d
[V1][Spec Decode] Support multi-layer eagle draft model ( #18030 )
...
Signed-off-by: qizixi <qizixi@meta.com >
2025-05-24 09:45:34 +00:00
a859320575
[Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) ( #18647 )
2025-05-24 09:15:36 +00:00
441dc63ac7
[Frontend] improve vllm serve --help display ( #18643 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-24 07:53:22 +00:00
d55e446d13
[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance ( #18424 )
...
Signed-off-by: qizixi <qizixi@meta.com >
2025-05-24 06:51:22 +00:00
ec82c3e388
FIX MOE issue in AutoRound format ( #18586 )
...
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com >
2025-05-23 22:01:40 -07:00
45ab403a1f
config.py: Clarify that only local GGUF checkpoints are supported. ( #18623 )
...
Signed-off-by: Mathieu Bordere <mathieu@letmetweakit.com >
2025-05-24 08:46:34 +08:00
2b10ba7491
[Bugfix][Nixl] Fix Preemption Bug ( #18631 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-05-23 23:30:16 +00:00
4fc1bf813a
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking ( #18454 )
...
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com >
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com >
2025-05-23 16:16:26 -07:00
f2036734fb
[ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation ( #18160 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-05-23 15:52:20 -07:00
7d9216495c
[Doc] Update references to doc files ( #18637 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 15:49:21 -07:00
0ddf88e16e
[CI] Enable test_initialization to run on V1 ( #16736 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-23 15:09:44 -07:00
1645b60196
Use prebuilt FlashInfer x86_64 PyTorch 2.7 CUDA 12.8 wheel for CI ( #18537 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-05-23 21:17:16 +00:00
2628a69e35
[V1] Support Deepseek MTP ( #18435 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn >
Co-authored-by: Rui Qiao <ruisearch42@gmail.com >
2025-05-23 10:26:28 -07:00
371f7e4ca2
[Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar ( #18627 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 10:22:40 -07:00
15b45ffb9a
[Doc] Avoid documenting dynamic / internal modules ( #18626 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 09:58:02 -07:00
273cb3b4d9
[Doc] Fix top-level API links/docs ( #18621 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 09:46:56 -07:00
8ddd1cf26a
[Doc] fix list formatting ( #18624 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-05-23 09:41:17 -07:00
6550114c9c
[v1] Redo "Support multiple KV cache groups in GPU model runner ( #17945 )" ( #18593 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-23 09:39:47 -07:00
9520a989df
[Docs] Change mkdocs to not use directory urls ( #18622 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-23 09:33:21 -07:00
3d28ad343f
Fix figures in design doc ( #18612 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 09:09:54 -07:00
6a7988c55b
Refactor pplx init logic to make it modular (prepare for deepep) ( #18200 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-05-23 23:43:43 +08:00
022d8abe29
[Doc] Use a different color for the announcement ( #18616 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 08:25:03 -07:00
5221815a00
[Doc] Fix markdown list indentation for MkDocs rendering ( #18620 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-05-23 08:23:21 -07:00
1068556b2c
[Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORTED_ARCHS ( #18579 )
2025-05-23 07:43:58 -07:00
2cd1fa4556
[Misc] add Haystack integration ( #18601 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-23 06:21:19 -07:00
d4c2919760
Include private attributes in API documentation ( #18614 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 06:18:31 -07:00
6220f3c6b0
[Bugfix] Fix transformers model impl ignored for mixtral quant ( #18602 )
...
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com >
2025-05-23 05:54:13 -07:00
52fb23f47e
Fix examples with code blocks in docs ( #18609 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 05:53:44 -07:00
6dd51c7ef1
[CI/Build] Fix V1 flag being set in entrypoints tests ( #18598 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 05:51:53 -07:00
2edb533af2
Replace {func} with mkdocs style links ( #18610 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 05:51:38 -07:00
38a95cb4a8
[Doc] Fix indent of contributing to vllm ( #18611 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-05-23 05:50:07 -07:00
cd821ea5d2
[CI] fix kv_cache_type argument ( #18594 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-23 04:49:18 -07:00
7ab056c273
[Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to requirements/cpu.txt ( #18542 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-05-23 04:38:42 -07:00
6526e05111
Add myself as docs code owner ( #18605 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 04:08:31 -07:00
e493e48524
[V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled ( #17731 )
...
Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-05-23 03:38:23 -07:00
4ce64e2df4
[Bugfix][Model] Fix baichuan model loader for tp ( #18597 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-05-23 02:39:05 -07:00
fbb13a2c15
Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal ( #18034 )" ( #18600 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 02:18:22 -07:00
a1fe24d961
Migrate docs from Sphinx to MkDocs ( #18145 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 02:09:53 -07:00
d0bc2f810b
[Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform ( #18430 )
...
Signed-off-by: Yuqi Zhang <yuqizhang@google.com >
Co-authored-by: Yuqi Zhang <yuqizhang@google.com >
2025-05-23 01:41:37 -07:00
b046cf792d
[Feature][V1]: supports cached_tokens in response usage ( #18149 )
...
Co-authored-by: simon-mo <xmo@berkeley.edu >
2025-05-23 01:41:03 -07:00
54af915949
[Doc] Update quickstart and install for cu128 using --torch-backend=auto ( #18505 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-23 08:36:37 +00:00
71ea614d4a
[Feature]Add async tensor parallelism using compilation pass ( #17882 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
2025-05-23 01:03:34 -07:00
4c611348a7
[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal ( #18034 )
...
Signed-off-by: Ronald Xu <ronaldxu@amazon.com >
2025-05-23 00:37:18 -07:00
60cad94b86
[Hardware] correct method signatures for HPU,ROCm,XPU ( #18551 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-22 22:31:59 -07:00
9c1baa5bc6
[Misc] Replace cuda hard code with current_platform ( #16983 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-05-23 04:38:50 +00:00
4be2255c81
[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key ( #17291 )
...
Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com >
2025-05-23 12:30:47 +08:00
ed5d408255
[Neuron] Remove bypass on EAGLEConfig and add a test ( #18514 )
...
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
2025-05-22 21:26:32 -07:00
583507d130
[Spec Decode] Make EAGLE3 draft token ID mapping optional ( #18488 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-22 20:17:39 -07:00
e44d8ce8c7
[Bugfix] Set KVTransferConfig.engine_id in post_init ( #18576 )
...
Signed-off-by: Linkun Chen <github@lkchen.net >
2025-05-23 02:54:42 +00:00
93ecb8139c
[BugFix] Increase TP execute_model timeout ( #18558 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-23 10:22:11 +08:00
fae453f8ce
[Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs ( #18482 )
...
Signed-off-by: googs1025 <googs1025@gmail.com >
2025-05-23 10:15:32 +08:00
4b0da7b60e
Enable hybrid attention models for Transformers backend ( #18494 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 10:12:08 +08:00
c6b636f9fb
[V1][Spec Decoding] Use model_loader.get_model() to load models ( #18273 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-23 02:05:44 +00:00
04eb88dc80
Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. ( #18569 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
2025-05-23 01:59:18 +00:00
46791e1b4b
[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh ( #18568 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2025-05-22 18:45:35 -07:00
c32e249a23
[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization ( #17926 )
...
Signed-off-by: Sanger Steel <sangersteel@gmail.com >
2025-05-22 18:44:18 -07:00
c91fe7b1b9
[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser ( #17917 )
...
Signed-off-by: Kai Wu <kaiwu@meta.com >
2025-05-22 16:44:08 -07:00
a04720bc36
[V1][Spec Decode][Bugfix] Load quantize weights for EAGLE ( #18290 )
2025-05-22 15:17:33 -07:00
7b9d832c80
[Tool] Add NIXL installation script ( #18172 )
...
Signed-off-by: Linkun <github@lkchen.net >
2025-05-22 14:33:16 -07:00
6e588da0f4
[Build/CI] Fix CUDA 11.8 build ( #17679 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-22 12:13:54 -07:00
f8d2cc5f55
[Compile][Platform] Make PiecewiseBackend pluggable and extendable ( #18076 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-05-22 12:11:53 -07:00
721fb9b181
[Platform] Move platform check to right place ( #18470 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-05-22 12:11:28 -07:00
1f3a1200e4
[Bugfix] make test_openai_schema.py pass ( #18224 )
...
Signed-off-by: David Xia <david@davidxia.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-22 18:34:06 +00:00
54631f8262
[Misc] Call ndarray.tobytes() directly instead of ndarray.data.tobytes() ( #18347 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-22 09:00:13 -07:00
cb506ecb5a
[Misc] improve Automatic Prefix Caching example ( #18554 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-22 14:50:46 +00:00
93f71673ce
[BugFix][CPU] Fix x86 SHM distributed module initialization ( #18536 )
...
Signed-off-by: jiang.li <jiang1.li@intel.com >
2025-05-22 07:35:00 -07:00
3f505233fd
[Doc] Add stream flag for chat completion example ( #18524 )
...
Signed-off-by: calvin chen <120380290@qq.com >
2025-05-22 14:07:10 +00:00
4e04eceb58
[Bugfix] Use random hidden states in dummy sampler run ( #18543 )
...
Signed-off-by: Bowen Wang <abmfy@icloud.com >
2025-05-22 06:48:56 -07:00
71075029f2
[Doc] Support --stream arg in openai_completion_client.py script ( #18388 )
...
Signed-off-by: googs1025 <googs1025@gmail.com >
2025-05-22 13:20:17 +00:00
ca86a7cf6e
[CI/Build] Update bamba test model location ( #18544 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-22 06:01:07 -07:00
a35a494745
[Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible ( #18513 )
...
Signed-off-by: Linkun <github@lkchen.net >
2025-05-22 05:24:43 -07:00
f6037d1907
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text ( #18526 )
...
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-22 05:22:53 -07:00
fa72f9a812
Order sequence ids + config update to support specifying custom quantization layers ( #18279 )
...
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
Co-authored-by: Tailin Pan <tailinpa@amazon.com >
Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com >
Co-authored-by: Yishan McNabb <yishanm@amazon.com >
Co-authored-by: Patrick Lange <patlange@amazon.com >
Co-authored-by: Maxwell Goldberg <mgld@amazon.com >
Co-authored-by: Aakash Shetty <sheaak@amazon.com >
2025-05-22 02:20:36 -07:00
ebed81fbf5
Update default neuron config for speculation ( #18274 )
...
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com >
Co-authored-by: Aakash Shetty <sheaak@amazon.com >
2025-05-22 02:18:55 -07:00
e2d7d31244
[Neuron] Update Dockerfile.neuron to use latest neuron release (2.23) ( #18512 )
...
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
2025-05-22 02:17:34 -07:00
23b67b37b2
[Doc] Fix invalid JSON in example args ( #18527 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-22 07:11:46 +00:00
db5a29ba19
[Bugfix] Fix LoRA test ( #18518 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-21 21:48:53 -07:00
51797775c3
[Bugfix][Model] Make Olmo2Model weight loading return loaded weights ( #18504 )
...
Signed-off-by: Shane A <shanea@allenai.org >
2025-05-21 21:17:03 -07:00
cf5984b2fe
[BugFix][DP] Send DP wave completion only from dp_rank==0 ( #18502 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: kourosh hakhamaneshi <kourosh@anyscale.com >
2025-05-21 20:25:25 -07:00
d022115cc6
[Bugfix] Inconsistent token calculation compared to HF in llava family ( #18479 )
...
Signed-off-by: jaycha <jaycha@ncsoft.com >
2025-05-21 20:21:47 -07:00
acb54ca8e1
Initialize io_thread_pool attribute in the beginning. ( #18331 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2025-05-21 20:21:14 -07:00
6e0fd34d3c
[CI] Fix race condition with StatelessProcessGroup.barrier ( #18506 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-21 20:19:13 -07:00
176d62e4ea
[MISC] update project urls in pyproject.toml ( #18519 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-21 20:17:34 -07:00
20bd6f4d2e
[FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) ( #18500 )
...
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae >
Co-authored-by: younesbelkada <younesbelkada@gmail.com >
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae >
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae >
2025-05-21 19:23:59 -07:00
1f079540db
[Bugfix] Consistent ascii handling in tool parsers ( #17704 )
...
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com >
2025-05-21 20:41:23 +00:00
94d8ec8d2b
[FEAT][ROCm] Upgrade AITER MLA v1 backend ( #18338 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-05-21 10:34:28 -07:00
bb0a311213
Revert "[v1] Support multiple KV cache groups in GPU model runner ( #17945 )" ( #18459 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-21 10:25:23 -07:00
dd5fa7e04f
[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 ( #17004 )
...
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com >
2025-05-21 08:35:00 -07:00
2b16104557
[Misc] Update deprecation message for --enable-reasoning ( #18404 )
2025-05-21 07:33:11 -07:00
371376f996
[Build] fix Dockerfile shell ( #18402 )
2025-05-21 07:32:06 -07:00
c6c10ca920
[Bugfix] Reduce moe_sum test size to avoid OOM ( #18484 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-05-21 06:46:39 -07:00
c154d89306
[Doc] fix arg docstring in linear layers ( #18410 )
...
Signed-off-by: giantcroc <1204449533@qq.com >
2025-05-21 06:45:57 -07:00
eca18691d2
[MODEL] FalconH1 ( #18406 )
...
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae >
Co-authored-by: younesbelkada <younesbelkada@gmail.com >
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae >
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae >
2025-05-21 04:59:06 -07:00
61acfc45bc
[Bugfix][Failing Test] Fix test_events.py ( #18460 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2025-05-21 04:57:28 -07:00
107f5fc4cb
[Misc] refactor disaggregated-prefill-v1 example ( #18474 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-21 11:10:14 +00:00
907f935de9
[V1] Fix general plugins not loaded in engine for multiproc ( #18326 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-05-21 01:21:49 -07:00
5d7f545204
[Frontend] deprecate --device arg ( #18399 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2025-05-21 01:21:17 -07:00
cd8dfc6dfc
[Misc] MultiConnector._connectors type ( #18423 )
...
Signed-off-by: nicklucche <nlucches@redhat.com >
2025-05-20 22:48:43 -07:00
d06dd72ba9
[Bugfix][Failing Test] Fix nixl connector test when prompt size < block size ( #18429 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-05-20 22:41:44 -07:00
ad0012a0ac
Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text ( #18407 )" ( #18456 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-20 22:39:22 -07:00
92247c522e
[Bug] Fix moe_sum signature ( #18440 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-05-20 22:37:08 -07:00
0c15c2e486
[Bugfix] config.head_dim is now explicitly set to None ( #18432 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-20 21:04:33 -07:00
3b17ea26e4
[TPU] Re-enable the Pallas MoE kernel ( #18025 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-05-20 19:52:27 -07:00
23baa2180b
fix:Build torch wheel inline rather than picking from nightly ( #18351 )
...
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com >
2025-05-20 22:22:24 +00:00
980a172474
[Kernel] update comment for KV shape in unified triton attn ( #18099 )
...
Signed-off-by: haochengxia <xhc_1007@163.com >
2025-05-20 11:19:34 -07:00
e1f5a71ed7
[Model] use AutoWeightsLoader for bloom ( #18300 )
...
Signed-off-by: calvin chen <120380290@qq.com >
2025-05-20 09:40:05 -07:00
f4a8a37465
[Minor] Rename quantization nvfp4 to modelopt_fp4 ( #18356 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-20 09:08:37 -07:00
8f55962a7f
[Misc] refactor prompt embedding examples ( #18405 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-20 15:26:12 +00:00
be48360c1f
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text ( #18407 )
...
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com >
2025-05-20 06:59:48 -07:00
86847700d7
[CI] Add mteb testing to test the accuracy of the embedding model ( #17175 )
2025-05-20 06:51:12 -07:00
d6c86d09ae
Update cpu.txt ( #18398 )
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
2025-05-20 10:53:23 +00:00
6b35cb10a0
[Misc] Add LoRA code owner ( #18387 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-20 03:27:30 -07:00
1b1e8e05ff
[doc] update env variable export ( #18391 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-20 08:53:27 +00:00
bca55b556f
[Bugfix] fix adding bias twice in ipex GPTQ quantization ( #18363 )
...
Signed-off-by: rand-fly <randfly@outlook.com >
2025-05-20 00:54:33 -07:00
d981396778
[release] Change dockerhub username for TPU release ( #18389 )
2025-05-19 23:49:23 -07:00
9609327fa4
[Core] [Bugfix]: tensor parallel with prompt embeds ( #18171 )
...
Signed-off-by: Nan2018 <nan@protopia.ai >
Co-authored-by: Andrew Sansom <andrew@protopia.ai >
2025-05-19 20:21:27 -07:00
f07a673eb2
[Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name ( #18358 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-19 20:20:12 -07:00
d565e0976f
[neuron] fix authorization issue ( #18364 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
2025-05-19 23:30:32 +00:00
258bf621d5
fix CUDA_check redefinition in #17918 ( #18287 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
2025-05-19 13:42:35 -07:00
dc1440cf9f
Neuron up mistral ( #18222 )
...
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
2025-05-19 09:54:47 -07:00
8171221834
[Misc] Fix typo ( #18330 )
2025-05-19 09:51:01 -07:00
7937c2fd52
Add fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup ( #18337 )
2025-05-19 09:49:57 -07:00
e2ee1e8e9e
[Feature]Add support for models quantized with AutoRound ( #17850 )
...
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com >
2025-05-19 09:38:53 -07:00
20d8ce81eb
[Frontend] add --quick option for vllm chat/complete ( #18297 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-19 09:36:13 -07:00
84ab4feb7e
[Doc] Fix typo ( #18355 )
2025-05-19 16:05:16 +00:00
6781af5608
[Quantization] Pool model support bitsandbytes ( #18087 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-19 09:03:43 -07:00
1b15df2546
[BugFix] Fix handling of num_computed_tokens with connector ( #18232 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2025-05-19 09:03:25 -07:00
43b5f61dce
[Doc] Move input-related docs to Features ( #18353 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-19 15:08:39 +00:00
c5bb0ebdc6
[Doc] Fix prompt embedding examples ( #18350 )
...
Signed-off-by: wangli <wangli858794774@gmail.com >
2025-05-19 06:48:16 -07:00
d637b96099
[BugFix] [Vul] Add missing usedforsecurity=False in MD5 hashing to enable FIPS ( #18319 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
Signed-off-by: shaoyuyoung <shaoyuyoung@gmail.com >
Co-authored-by: cascade <cascade812@outlook.com >
2025-05-19 01:31:23 -07:00
275c5daeb0
fix: Add type specifications for CLI arguments in tensorizer options ( #18314 )
2025-05-18 23:42:17 -07:00
47fda6d089
[Build] Supports CUDA 12.6 and 11.8 after Blackwell Update ( #18316 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-05-18 23:19:33 -07:00
27d0952600
[Misc] extract parser.parse_args() ( #18323 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-19 04:06:26 +00:00
221cfc2fea
Feature/vllm/input embedding completion api ( #17590 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: Nan2018 <nan@protopia.ai >
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com >
Co-authored-by: Bryce1010 <bryceyx@gmail.com >
Co-authored-by: Andrew Sansom <andrew@protopia.ai >
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-18 20:18:05 -07:00
9da1095daf
[Spec Decode][V0] Fix spec decode correctness test in V0 eagle/medusa ( #18175 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-05-18 19:49:46 -07:00
d1211f8794
[Doc] Add doc to explain the usage of Qwen3 thinking ( #18291 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-05-18 23:04:07 +00:00
b6a6e7a529
[Misc] add litellm integration ( #18320 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-18 15:32:30 +00:00
4fb349f66a
Fix copy-paste error in phi4mm image processing ( #18315 )
...
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com >
2025-05-18 07:00:12 -07:00
908733aca7
[Model] Use sigmoid for single-label classification ( #18313 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-05-18 07:00:09 -07:00
1a8f68bb90
[doc] update reasoning doc ( #18306 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-18 06:59:14 -07:00
9ab2c02ff8
Support sequence parallelism combined with pipeline parallelism ( #18243 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
2025-05-17 22:47:25 +00:00
66e63e86ec
[MISC] fix typo ( #18305 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-17 10:52:09 -07:00
9214e60631
[Model] use AutoWeightsLoader for solar ( #18113 )
2025-05-17 00:24:17 -07:00
f880d42582
Fixed build on ppc64le due to openssl conflicts ( #18262 )
...
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com >
2025-05-17 00:23:46 -07:00
dcfe95234c
Update Dockerfile to build for Blackwell ( #18095 )
2025-05-17 00:23:25 -07:00
48ac2bed5b
[Hardware][TPU] Optionally import for TPU backend ( #18269 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com >
Co-authored-by: Carol Zheng <cazheng@google.com >
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com >
Co-authored-by: Hongmin Fan <fanhongmin@google.com >
2025-05-17 15:23:12 +08:00
3e0d435027
[P/D][V1] Support dynamic loading of external KV connector implementations ( #18142 )
...
Signed-off-by: David Ben-David <davidb@pliops.com >
Co-authored-by: David Ben-David <davidb@pliops.com >
2025-05-17 06:40:39 +00:00
4ee4826ede
[BugFix] Correct max_model_len derivation from config.json for Mistral format ( #17937 )
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
Co-authored-by: tracelogfb <48808670+tracelogfb@users.noreply.github.com >
Co-authored-by: Stephen Chen <tracelog@meta.com >
2025-05-17 04:20:13 +00:00
60017dc841
[Misc] reformat the collect-env output ( #18285 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-16 19:46:18 -07:00
55f1a468d9
Move cli args docs to its own page ( #18228 ) ( #18264 )
...
Signed-off-by: Trevor Royer <troyer@redhat.com >
2025-05-16 19:43:45 -07:00
fd195b194e
[V1][P/D] Local attention optimization for NIXL ( #18170 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-16 21:16:33 -04:00
fabe89bbc4
[Spec Decode] Don't fall back to V0 when spec decoding is enabled ( #18265 )
2025-05-16 16:10:27 -07:00
e73b7dfd69
[Bugfix] fix an illegal memory access was encountered of marlin kernel + act_order ( #18245 )
2025-05-16 16:02:44 -07:00
7fdfa01530
[Sampler] Adapt to FlashInfer 0.2.3 sampler API ( #15777 )
...
Signed-off-by: Bowen Wang <abmfy@icloud.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-05-16 15:14:03 -07:00
aef94c6d07
[CI] Assign reviewer to mergify with changes to Tensorizer files ( #18278 )
2025-05-16 12:04:14 -07:00
0ceaebf87b
[BugFix] Fix ordering of KVConnector finished send/rcv sets ( #18211 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-16 09:20:54 -07:00
1db4f47f81
[BugFix] Fix multi async save in MultiConnector ( #18246 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-16 08:13:47 -07:00
d3d91b6f71
[Misc][MacOS] fix bfloat16 error ( #18249 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-16 15:05:59 +00:00
87d871470d
[Model] Use autoweightloader for dbrx ( #18251 )
...
Signed-off-by: learner0810 <zhongjun.li@daocloud.io >
2025-05-16 07:54:13 -07:00
a5f8c111c2
[Fix] Fix typo in resolve_hf_chat_template ( #18259 )
...
Signed-off-by: Felix Marty <felmarty@amd.com >
2025-05-16 14:52:41 +00:00
e23564cb70
use ceil_div in cutlass block scaling shape check ( #17918 )
2025-05-16 03:02:58 -07:00
390ec88905
[Misc] Consolidate Audio tests into multimodal common generation tests ( #18214 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-16 09:18:08 +00:00
541817670c
[Misc] Add Ray Prometheus logger to V1 ( #17925 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2025-05-16 01:02:42 -07:00
67da5720d4
[PERF] Speed up Qwen2.5-VL model by speed up rotary position embedding ( #17973 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai >
2025-05-15 23:31:02 -07:00
5c04bb8b86
[doc] fix multimodal example script ( #18089 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-05-16 06:05:34 +00:00
3d2779c29a
[Feature] Support Pipeline Parallelism in torchrun SPMD offline inference for V1 ( #17827 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-05-15 22:28:27 -07:00
6b31c84aff
Throw better error for when running into k8s service discovery issue ( #18209 )
...
Signed-off-by: Will Eaton <weaton@redhat.com >
2025-05-15 21:07:28 -07:00
b18201fe06
Allow users to pass arbitrary JSON keys from CLI ( #18208 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-15 21:05:34 -07:00
f4937a51c1
[Model] vLLM v1 supports Medusa ( #17956 )
...
Signed-off-by: lisiqi23 <lisiqi23@xiaomi.com >
Signed-off-by: skylee-01 <497627264@qq.com >
Co-authored-by: lisiqi23 <lisiqi23@xiaomi.com >
2025-05-15 21:05:31 -07:00
ee659e3b60
[Bugfix][ROCm] Use chunked_prefill_paged_decode as fallback for V1 attention on ROCm ( #18093 )
...
Signed-off-by: kf <kuanfu.liu@embeddedllm.com >
2025-05-15 19:30:17 -07:00
4e1c6a0264
[Bugfix] fix rotary embedding test for _get_padded_tensor_shape ( #18229 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-16 01:32:45 +00:00
c7852a6d9b
[Build] Allow shipping PTX on a per-file basis ( #18155 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-15 16:41:55 -07:00
8795eb9975
[Bugfix] Fix test_eagle test ( #18223 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-05-15 15:59:42 -07:00
0b34593017
Adding "AMD: Tensorizer Test" to amdproduction. ( #18216 )
2025-05-15 11:01:25 -07:00
e3f3aee6f4
[Misc] Avoid cuda graph log when sizes still match ( #18202 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-05-15 09:59:38 -07:00
92540529c0
[Bugfix] [ROCm]: Remove assertion logic when using AITER fused moe in unquantizedMethod to reenable LLama4 BF16 ( #18205 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-15 09:53:18 -07:00
fadb8d5c2d
[Bugfix]Change the exception thrown by call_hf_processor from RuntimeError to ValueError ( #18181 )
...
Signed-off-by: Abatom <abzhonghua@gmail.com >
2025-05-15 09:01:47 -07:00
2aa5470ac5
[Frontend] Fix chat template content format detection ( #18190 )
...
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com >
2025-05-15 09:00:21 -07:00
51ff154639
Improve examples rendering in docs and GitHub ( #18203 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-15 15:57:49 +00:00
566ec04c3d
Adding "Basic Models Test" and "Multi-Modal Models Test (Extended) 3" in AMD Pipeline ( #18106 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-15 08:49:23 -07:00
01c22335ba
[Kernel] [V1] Fix performance regression for triton unified attention ( #18161 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-15 06:39:00 -07:00
451da4bcbd
add tools into TokenizeChatRequest ( #18187 )
...
Signed-off-by: yangxia <yangxiast@gmail.com >
2025-05-15 04:01:49 -07:00
07ad27121f
Update deprecated type hinting in model_loader ( #18130 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-15 04:00:21 -07:00
a9944aabfa
fix: typos ( #18151 )
...
Signed-off-by: omahs <73983677+omahs@users.noreply.github.com >
2025-05-15 02:16:15 -07:00
a8f5aec20a
[V1] Update zmq socket creation in nixl connector ( #18148 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-14 23:17:57 -07:00
de71fec81b
[CI] don't skip fixed test_kv_cache_events() ( #18183 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-05-14 23:17:16 -07:00
70f8b96724
[Bugfix] Fix FusedMoEPrepareAndFinalize for cuda-disalike backends ( #18178 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-05-14 23:16:31 -07:00
dd2a94596a
[Model] Allow the use of sliding window in Qwen2 ( #17772 )
...
Signed-off-by: inkcherry <mingzhi.liu@intel.com >
2025-05-14 22:29:38 -07:00
420caf7557
[UT] Add ut for none hash ( #17892 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-15 13:28:11 +08:00
4f07a64075
Support custom implementations of VideoLoader backends. ( #18091 )
2025-05-15 13:26:49 +08:00
e6b8e65d2d
[Bugfix] Fix fp8 tests for triton_unified_attention for Triton 3.3 ( #18013 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-15 13:26:34 +08:00
26d0419309
Update deprecated type hinting in models ( #18132 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-14 22:06:50 -07:00
83f74c698f
[Fix][ROCm] Enforce eager for all encoder-decoder models on ROCm ( #18154 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2025-05-14 22:04:43 -07:00
2dff093574
[Misc] add lobe-chat support ( #18177 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-15 05:02:23 +00:00
afe3236e90
[Chore] astral's ty ( #18116 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-15 05:00:43 +00:00
65334ef3b9
[V1][Metrics] Remove unused code ( #18158 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-14 20:13:17 -07:00
e60f550b38
[v1] Support multiple KV cache groups in GPU model runner ( #17945 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-14 18:54:54 -07:00
f25e0d1125
[Bugfix]: make most of test_openai_schema.py pass ( #17664 )
2025-05-14 17:04:35 -07:00
09f106a91e
Upload vllm index for the rc builds ( #18173 )
2025-05-14 16:35:56 -07:00
2142035b51
[V1] Support multiple kv connectors ( #17564 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-05-14 16:28:02 -07:00
78aa341d12
[CI] Fix race condition in test_kv_cache_events test ( #18169 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-14 16:27:48 -07:00
7974736740
Add support for loading torchao models with AOPerModuleConfig ( #17826 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-05-14 16:24:59 -07:00
2fc9075b82
[V1] Structured Outputs + Thinking compatibility ( #16577 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-05-14 15:45:24 -07:00
d93c976a0d
[Kernel] Have rotary embeddings support tensors ( #18046 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-14 15:43:55 -07:00
749f792553
[Frontend] decrease import time of vllm.multimodal ( #18031 )
...
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com >
2025-05-14 15:43:32 -07:00
856865008e
[CI] Disable Failing Tests ( #18165 )
2025-05-14 13:49:56 -07:00
f9c069c85e
Modularize fused experts and integrate PPLX kernels ( #15956 )
2025-05-14 13:11:54 -07:00
418d2f8bfb
[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model ( #17326 )
...
Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-14 12:31:46 -07:00
964472b966
[Doc] Update prefix cache metrics to counting tokens ( #18138 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-14 15:23:30 +00:00
59dd311cf5
[KVConnector] Keep KVTransferParams as a dict ( #18033 )
2025-05-14 08:05:57 -07:00
d066e52013
[Bugfix] Fix chat utils tests ( #18139 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-14 05:38:21 -07:00
c8ea982d9b
Update deprecated type hinting in platform, plugins, triton_utils, vllm_flash_attn ( #18129 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-14 05:28:16 -07:00
dc372b9c8a
Update deprecated type hinting in vllm/device_allocator and vllm/distributed ( #18126 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-14 04:07:57 -07:00
9b5b39b650
Update deprecated type hinting in vllm/lora ( #18128 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-14 03:57:59 -07:00
9ccc6ded42
[doc] add missing import ( #18133 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-14 10:57:34 +00:00
d62a076e84
[Model] GritLM supports other attention backends ( #18109 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-14 03:33:19 -07:00
259127f8b8
[Bugfix] Fix LoRA test ( #18123 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-14 10:25:47 +00:00
612c2edb4f
[FEAT] [ROCm]: Add AITER CK 2 Stages MoE support ( #17110 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-14 03:03:11 -07:00
38fe728d60
[Bugfix] Fix QKVCrossParallelLinear::sync_weight_attrs for PyTorch compile ( #17844 )
...
Signed-off-by: Andrzej Kotłowski <akotlowski@habana.ai >
2025-05-14 09:39:51 +00:00
82e7f9bb03
[Misc] replace does not exist model ( #18119 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-05-14 02:13:47 -07:00
63dc3426e0
[Model] Add packed_modules_mapping for Qwen3-MOE ( #18118 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-14 02:13:19 -07:00
8f5dc41481
[Bugfix] Fix entrypoints audio test failure ( #18111 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-14 09:08:07 +00:00
63ad622233
[New Model]: support GTE NewModel ( #17986 )
2025-05-14 01:31:31 -07:00
e7ef61c1f0
[Bugfix][Example] make lmcache v0 work. ( #18051 )
...
Signed-off-by: Ma, Jianpeng <jianpeng.ma@intel.com >
2025-05-13 23:43:44 -07:00
d4154c35a2
[Bugfix] fix moe marlin topk_weight loading ( #18080 )
...
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-05-13 23:31:57 -07:00
6685890d11
[Fix] Move "model_config" as keyword args in chat_utils.py ( #18098 )
...
Signed-off-by: Linkun <github@lkchen.net >
2025-05-13 23:27:26 -07:00
33011318c2
Fix broken example: examples/offline_inference/profiling at scheduler_config ( #18117 )
2025-05-13 23:19:14 -07:00
4f8b373225
[BugFix][AMD] Compatible patch for AITER lib after 04/20 ( #17912 )
...
Signed-off-by: Qiang Li <qiang.li2@amd.com >
2025-05-13 23:05:20 -07:00
7b2f28deba
[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm ( #18082 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-05-13 22:13:56 -07:00
2d912fb66f
[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 ( #17955 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-13 22:03:47 -07:00
12e6c0b41c
[Bugfix][V1] Fix FlashInfer V1 backend using the wrong VllmConfig ( #18086 )
2025-05-13 20:36:17 -07:00
9a2a6357de
[Bugfix] Fix FP8 Marlin MoE and enable for compressed-tensors models ( #18026 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-13 19:48:33 -07:00
6266c57bae
[core][distributed] add ep group and all2all interface ( #18077 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-05-14 10:46:49 +08:00
754b699cbe
[Bug]: Fix S3 model/tokenizer path resolution ( #18083 )
...
Signed-off-by: Jon Gill <jon@yurts.ai >
2025-05-13 19:34:17 -07:00
6e27c6d86b
[Misc] Remove unused numpy tensor ( #18084 )
...
Signed-off-by: Roger Wang <hey@rogerw.me >
2025-05-13 19:33:40 -07:00
d5af47a149
[P/D] Add some more debug logs to NixlConnector ( #18102 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-13 19:33:03 -07:00
65f0f74b66
[Hardware/NVIDIA/Modelopt] Fix modelopt forward method for v1 torch.compile ( #18101 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-05-13 19:33:00 -07:00
176a95c670
[Fix] Support CUDAGraph capture for encoder-decoder on ROCm ( #18104 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2025-05-13 19:31:42 -07:00
f2ae883b67
[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager ( #18001 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-13 19:09:39 -07:00
40de1ef455
[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature ( #14968 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-13 19:08:20 -07:00
0189a65a2e
[Docs] Expand security doc with firewall info ( #18081 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-13 19:36:00 +00:00
55aa7af994
[V1] DP scale-out (2/N): Decouple engine process management and comms ( #15977 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-13 10:48:21 -07:00
0b217da646
Update deprecated type hinting in vllm/adapter_commons ( #18073 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 08:32:51 -07:00
19324d660c
Update deprecated type hinting in vllm/compilation ( #18072 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 08:32:48 -07:00
fc407a1425
Give auto-merge label workflow permission to add labels to issues ( #18078 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 07:53:13 -07:00
009d9e7590
Convert benchmarks to ruff format ( #18068 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 13:43:29 +00:00
b922c2ebd2
[Bugfix] Fix entrypoints metrics tests ( #18063 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-13 06:42:43 -07:00
00b14e0f16
[CI] set token permissions for pre-commit CI job ( #17729 )
...
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-05-13 13:38:30 +00:00
54e467e6f8
[CI] Add token permissions for add-ready-label CI job ( #17730 )
...
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-05-13 13:38:13 +00:00
79a1d25bbd
[CI] Add workflow permissions for helm CI job ( #17727 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-05-13 12:49:07 +00:00
9944011b30
[CI] Set token permissions for reminder comment CI job ( #17728 )
...
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-05-13 12:46:58 +00:00
8c946cecca
Update deprecated type hinting in vllm/transformers_utils ( #18058 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 04:34:37 -07:00
ff334ca1cd
Update deprecated type hinting in vllm/profiler ( #18057 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 04:34:34 -07:00
6223dd8114
Update deprecated type hinting in model_executor/layers ( #18056 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 04:17:23 -07:00
906f0598fc
[doc] add download/list/delete HF model CLI usage ( #17940 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-13 11:15:51 +00:00
cb528d0585
[Fix] check to make sure processor has chat templates ( #18047 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-13 03:04:10 -07:00
98fcba1575
Convert .buildkite to ruff format ( #17656 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 09:28:31 +00:00
23b3134eb5
[Benchmarks] Refactor run_structured_output_benchmarks.sh ( #17722 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-13 01:47:29 -07:00
ea6ae8cb45
[Bugfix] Fix marlin moe fallback logic for llama4 ( #18042 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-13 07:53:28 +00:00
2ff297dce9
[BugFix] Set default random seed to 0 for V1 ( #17929 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-13 07:52:19 +00:00
8dd0671bac
[Bugfix][V1] Only get input embeddings w/ multi-modal models if first PP ( #17916 )
...
Signed-off-by: Jin Huang <jinhun@amazon.com >
Co-authored-by: Jin Huang <jinhun@amazon.com >
2025-05-13 15:10:07 +08:00
f0d610a8ae
[v1][KVCacheManager] Avoid full cache hit by controlling max_length ( #17999 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-13 06:50:38 +00:00
e57e4d6e9e
Fix Broken macro for cutlass moe ( #18049 )
...
Signed-off-by: drisspg <drisspguessous@gmail.com >
2025-05-12 23:31:06 -07:00
ee5be834e7
[BugFix] Fix 4-GPU RLHF tests ( #18007 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-12 23:03:55 -07:00
48545728d8
cleanup invalid prints ( #18050 )
...
Signed-off-by: calvin chen <120380290@qq.com >
2025-05-12 23:01:57 -07:00
dc1a821768
[Feature][V1] Support tool_choice: required when using Xgrammar as the StructuredOutputBackend. ( #17845 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-05-12 23:01:31 -07:00
61e0a506a3
[Bugfix] Avoid repeatedly creating dummy data during engine startup ( #17935 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-12 22:40:19 -07:00
1df491c522
[Bugfix] Fixes for new marlin moe usage ( #18017 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-13 03:50:04 +00:00
d8487ef557
[ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 ( #13779 )
...
Signed-off-by: Arjun Kathuria <arjun.kathuria8@gmail.com >
2025-05-12 20:36:33 -07:00
c06af9a959
[Misc] Slight spelling modification ( #18039 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-12 20:36:27 -07:00
60f7624334
Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support ( #11844 )
2025-05-12 19:52:47 -07:00
f6518b2b48
[ROCm] Skip tests for quantizations incompatible with ROCm ( #17905 )
...
Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com >
2025-05-12 18:39:28 -06:00
d67085c2c8
Remove noisy warnings from SchedulerConfig ( #17995 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 00:33:45 +00:00
307939f299
Use NVFP4 Marlin for CompressedTensorsW4A16Fp4 ( #18000 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Dipika <dipikasikka1@gmail.com >
Co-authored-by: Dipika <dipikasikka1@gmail.com >
2025-05-12 18:07:34 -06:00
9d7ea9dbbf
Update some more deprecated type hinting ( #17998 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-12 23:49:33 +00:00
acee8f48aa
[Model] Support MiMo-7B inference with MTP ( #17433 )
...
Signed-off-by: wp-alpha <wangpeng66@xiaomi.com >
Co-authored-by: wangpeng66 <wangpeng66@xiaomi.com >
2025-05-12 23:25:33 +00:00
f065de4e88
Fix FBGEMM integration ( #18002 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-12 23:02:07 +00:00
dc9905368d
[V1][Spec Decode] Eagle unit tests ( #17350 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-05-12 23:01:17 +00:00
ebab1ac37c
[CI] Make JSON output tests less likely to fail ( #17859 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-12 22:31:54 +00:00
2b0db9b0e2
Enable standard language model for torhc nightly ( #18004 )
...
Signed-off-by: Yang Wang <elainewy@meta.com >
2025-05-12 14:00:04 -07:00
195adb47c0
[Chore] Remove unused method ( #18024 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-05-12 13:59:47 -07:00
302f3aca7e
[v1][KVCacheManager] Change prefix caching metric from counting blocks to counting tokens ( #18003 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-12 13:46:12 -07:00
e9c730c9bd
Enabling "Weight Loading Multiple GPU Test - Large Models" ( #18020 )
2025-05-12 13:05:33 -07:00
289199feb6
[Core] Use platform-agnostic device control for DP engine core ( #17245 )
...
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com >
2025-05-12 12:09:16 -07:00
b9fd0d7a69
[CI/Build] Fix TPU V1 Test mixed use of & and && across tests ( #17968 )
2025-05-12 12:06:59 -07:00
72a3f6b898
Construct KVTransferConfig properly from Python instead of using JSON blobs without CLI ( #17994 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-12 11:25:33 -07:00
98ea35601c
[Lora][Frontend]Add default local directory LoRA resolver plugin. ( #16855 )
...
Signed-off-by: jberkhahn <jaberkha@us.ibm.com >
2025-05-12 10:39:10 -07:00
d19110204c
[P/D] NIXL Integration ( #17751 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: Brent Salisbury <bsalisbu@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: ApostaC <yihua98@uchicago.edu >
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Brent Salisbury <bsalisbu@redhat.com >
2025-05-12 09:46:16 -07:00
05a4324f8e
Initialize the delta tool call fields explicitly ( #17340 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Co-authored-by: igmainc <igmainc@icloud.com >
2025-05-12 13:28:58 +00:00
7ea6cb28b2
[Misc] Improve modelscope import error ( #17983 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-12 10:46:45 +00:00
9fbf2bfbd5
Correcting testcases in builkite job for IBM Power ( #17675 )
...
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com >
2025-05-12 08:11:55 +00:00
3a5ea75129
[Feature] Support DeepSeekV3 Function Call ( #17784 )
...
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com >
Signed-off-by: Xu Wenqing <xuwq1993@qq.com >
2025-05-12 00:45:21 -07:00
891b9d33de
[Fix] Benchmark "EngineClient" has no attribute "model_config" ( #17976 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-05-11 22:55:53 -07:00
430783018c
[Bugfix][TPU] Use np array when updating cache slot_mapping ( #17971 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-05-12 12:58:33 +08:00
19a3c78d1f
[Bugfix] Fix pydantic.errors.PydanticUserError ( #17962 )
...
Signed-off-by: wangli <wangli858794774@gmail.com >
2025-05-12 12:58:23 +08:00
ada50aa295
[bugfix] fix the wrong parser ( #17958 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-12 04:58:02 +00:00
08bf784078
[Bugfix] validate grammar and throw 400 error instead of crashing the engine when xgrammar validation fails ( #17623 )
...
Signed-off-by: Jason Cheng <jasoncky96@gmail.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-05-12 09:06:10 +08:00
d45fe333fb
[misc] add instructions on how to install nvshmem/pplx/deepep ( #17964 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-05-11 18:02:39 -07:00
021c16c7ca
[Model] Broadcast Ovis2 implementation to fit Ovis1.6 ( #17861 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-11 17:56:30 -07:00
7de18d541b
[BUG] [ROCm] [MLA] Fix variable name bug due to change in variable name in PR #17483 ( #17961 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-11 09:14:30 -07:00
a810b5b088
[BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm ( #17857 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-11 04:17:11 -07:00
009b3d5382
[Misc] not show --model in vllm serve --help ( #16691 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-11 08:47:58 +00:00
e4b8713380
[New Model]: nomic-embed-text-v2-moe ( #17785 )
2025-05-11 00:59:43 -07:00
06c0922a69
[FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 ( #17870 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-11 15:58:45 +08:00
cd3edfc908
[Misc] Add compressed-tensors NVFP4A16 emulation support ( #17914 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Signed-off-by: Dipika <dipikasikka1@gmail.com >
2025-05-11 15:58:38 +08:00
9cea90eab4
[Frontend] Add /classify endpoint ( #17032 )
...
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com >
2025-05-11 07:57:07 +00:00
d1110f5b5a
[doc] update lora doc ( #17936 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-11 15:56:21 +08:00
8132365b74
[Bugfix]: v1 engine - consider lora adapters in allowed_token_ids ( #17855 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-05-11 00:53:58 -07:00
eea22a56ab
fix amd triton mla path ( #17871 )
2025-05-11 07:53:31 +00:00
9112155283
[Perf] Use small max_num_batched_tokens for A100 ( #17885 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-05-11 07:53:23 +00:00
90d0a74b60
[Bugfix] Add revision to transformers.Auto*.from_pretrained processors ( #17948 )
...
Signed-off-by: Xin Li <xin@centml.ai >
2025-05-11 07:52:44 +00:00
d74e5f37bc
[Kernel] fp4 marlin kernel ( #17687 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
2025-05-10 19:58:49 -07:00
ca66a1674c
[v1] Rename specialized_manager.py to single_type_kv_cache_manager.py ( #17946 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-10 16:14:12 -07:00
950751a987
[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders ( #17483 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-10 16:12:04 -07:00
4c31218f80
[Misc] remove --model from vllm serve usage ( #17944 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-10 13:23:31 +00:00
68311891f5
Don't default construct ModelConfig when default constructing VllmConfig ( #17943 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-10 13:23:00 +00:00
fc4441a4ee
Add missing content type headers to /ping and /health ( #17036 ) ( #17786 )
...
Signed-off-by: Ximo Guanter <ximo.guanter@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-10 07:13:32 +01:00
246e3e0a36
fix broken test vllm:test_kernels - test_attention_selector.py::test_flash_attn ( #17873 )
...
Co-authored-by: Stephen Chen <tracelog@meta.com >
2025-05-10 10:46:54 +08:00
7042cc96b0
[V1][Spec Decoding] Log accumulated metrics after system goes idle ( #17913 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-09 18:23:07 -07:00
0c0fdae84f
[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model ( #16362 )
2025-05-09 16:24:41 -07:00
3b602cdea7
AMD conditional all test execution // new test groups ( #17556 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu >
2025-05-09 15:35:58 -07:00
4b2ed7926a
Improve configs - the rest! ( #17562 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-09 15:18:44 -07:00
7e3571134f
[V1][Spec Decoding] Include bonus tokens in mean acceptance length ( #17908 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-09 13:32:36 -07:00
ea2236bf95
Add option to use torch._inductor.standalone_compile ( #17057 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-05-09 12:59:04 -07:00
7d4aedae7c
Handle error when str passed to /v1/audio/transcriptions ( #17909 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-09 19:23:59 +00:00
22481fbfa3
Update CT WNA16MarlinMoE integration ( #16666 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-09 13:19:45 -04:00
5c4c08f6f1
[Misc] Auto fallback to float16 for pre-Ampere GPUs when detected bfloat16 config ( #17265 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-09 17:16:12 +00:00
c44c384b1c
[Misc] Add references in ray_serve_deepseek example ( #17907 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-05-09 16:59:36 +00:00
85b72cb7b1
Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" ( #17910 )
2025-05-09 08:58:18 -07:00
6e5595ca39
[CI/Build] Automatically retry flaky tests ( #17856 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-09 09:55:17 -06:00
200da9a517
[v1] Move block management logic from KVCacheManager to SpecializedManager ( #17474 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-09 15:25:34 +00:00
9f64e93415
[BugFix][AMD] Compatible patch for latest AITER(05/07/2025) ( #17864 )
...
Signed-off-by: Qiang Li <qiang.li2@amd.com >
2025-05-09 08:59:36 -06:00
ec61ea20a8
[Misc] add dify integration ( #17895 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-09 03:42:39 -07:00
c6798baa9c
Change top_k to be disabled with 0 (still accept -1 for now) ( #17773 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-09 10:01:49 +00:00
5b2dcbf0b8
Fix Whisper crash caused by invalid ``max_num_batched_tokens`` config ( #17853 )
...
Signed-off-by: inkcherry <mingzhi.liu@intel.com >
2025-05-09 09:16:26 +00:00
6e4a93e3f7
[Bugfix][CPU] Fix broken AVX2 CPU TP support ( #17252 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-09 08:55:14 +00:00
217db4baa6
[Bugfix][ROCm] Fix AITER MLA V1 ( #17880 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-05-09 08:38:21 +00:00
ff8c400502
[Doc] remove visible token in doc ( #17884 )
...
Signed-off-by: yan <yanma1@habana.ai >
2025-05-09 01:21:31 -07:00
89a0315f4c
[Doc] Update several links in reasoning_outputs.md ( #17846 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-05-09 01:20:55 -07:00
3d1e387652
[Docs] Add Slides from NYC Meetup ( #17879 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-05-08 21:46:54 -07:00
d310e6de98
[BUGFIX]: return fast when request requires prompt logprobs ( #17251 )
2025-05-08 21:25:41 -07:00
5e6f939484
[Attention] MLA move rotary embedding to cuda-graph region ( #17668 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-09 11:14:42 +08:00
760e3ecc8f
[V1][Structured Output] Update llguidance (>= 0.7.11) to avoid AttributeError (no StructTag) ( #17839 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-05-08 20:14:18 -07:00
3c9396a64f
[FEAT][ROCm]: Support AITER MLA on V1 Engine ( #17523 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: qli88 <qiang.li2@amd.com >
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com >
2025-05-09 10:42:05 +08:00
376786fac1
Add cutlass support for blackwell fp8 blockwise gemm ( #14383 )
...
Signed-off-by: Shu Wang <shuw@nvidia.com >
2025-05-08 15:09:55 -07:00
4f605a6de5
Fix noisy warning for uncalibrated q_scale/p_scale ( #17414 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-08 15:56:59 -04:00
8342e3abd1
[CI] Prune down lm-eval small tests ( #17012 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-08 19:00:26 +00:00
a83a0f92b5
[Test] Attempt all TPU V1 tests, even if some of them fail. ( #17334 )
...
Signed-off-by: Yarong Mu <ymu@google.com >
2025-05-08 17:20:54 +00:00
226a4272cf
[V1] Improve VLLM_ALLOW_INSECURE_SERIALIZATION logging ( #17860 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-08 16:57:35 +00:00
ec54d73c31
[CI] Fix test_collective_rpc ( #17858 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-08 16:47:12 +00:00
a944f8ede7
[Misc] Delete LoRA-related redundancy code ( #17841 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-08 06:02:21 -07:00
015815fe01
[Bugfix] use_fast failing to be propagated to Qwen2-VL image processor ( #17838 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-08 05:39:21 -07:00
e4ca6e3a99
Fix transient dependency error in docs build ( #17848 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-08 03:42:03 -07:00
53d0cb7423
[Misc] add chatbox integration ( #17828 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-08 10:05:26 +00:00
f50dcb7c21
[Easy] Eliminate c10::optional usage in vllm/csrc ( #17819 )
2025-05-08 03:05:10 -07:00
a1e19b635d
[Doc] Fix a typo in the file name ( #17836 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-08 18:04:18 +08:00
bb239a730f
[Bugfix] Fix quark fp8 format loading on AMD GPUs ( #12612 )
...
Signed-off-by: Felix Marty <felmarty@amd.com >
Signed-off-by: kewang2 <kewang2@amd.com >
Co-authored-by: kewang2 <kewang2@amd.com >
2025-05-08 02:53:53 -07:00
a463555dee
[TPU] Fix the test_sampler ( #17820 )
2025-05-08 05:51:33 -04:00
ca04b97c93
[Bugfix] Fix tool call template validation for Mistral models ( #17644 )
...
Signed-off-by: Rick Yuan <yuan821120@gmail.com >
Signed-off-by: RIck Yuan <yuan821120@gmail.com >
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com >
2025-05-08 09:47:19 +00:00
0a9bbaa104
[Misc] support model prefix & add deepseek vl2 tiny fused moe config ( #17763 )
...
Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com >
Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com >
2025-05-08 07:50:22 +00:00
39956efb3f
[Bugfix] Fix bad words for Mistral models ( #17753 )
...
Signed-off-by: Qiong Zhou Huang <qiong@phonic.co >
2025-05-07 23:32:10 -07:00
597051e56f
[Qwen3]add qwen3-235b-bf16 fused moe config on A100 ( #17715 )
2025-05-07 23:09:32 -07:00
96722aa81d
[Frontend] Chat template fallbacks for multimodal models ( #17805 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-07 23:05:54 -07:00
843b222723
[Hardware][Intel-Gaudi] Support Automatic Prefix Caching on HPU ( #17648 )
...
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai >
2025-05-07 22:37:03 -07:00
e515668edf
[Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER ( #17153 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-05-07 22:35:03 -07:00
5a499e70d5
[Kernel][Hardware][AMD] Bf16 mfma opt for ROCm skinny GEMMs ( #17071 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
Signed-off-by: charlifu <charlifu@amd.com >
Co-authored-by: charlifu <charlifu@amd.com >
2025-05-07 22:34:49 -07:00
6930a41116
[V1] Add VLLM_ALLOW_INSECURE_SERIALIZATION env var ( #17490 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-05-08 13:34:02 +08:00
998eea4a0e
Only log non-default CLI args for online serving ( #17803 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-07 22:33:29 -07:00
c747d84576
[Installation] OpenTelemetry version update ( #17771 )
...
Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com >
2025-05-07 22:32:49 -07:00
b2da14a05a
Improve exception reporting in MP engine ( #17800 )
...
Signed-off-by: Vadim Markovtsev <vadim@poolside.ai >
2025-05-08 05:32:39 +00:00
7ea2adb802
[Core] Support full cuda graph in v1 ( #16072 )
...
Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com >
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com >
2025-05-07 22:30:15 -07:00
3d13ca0e24
[BugFix] Fix --disable-log-stats in V1 server mode ( #17600 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-08 04:08:15 +00:00
66ab3b13c9
Don't call the venv vllm ( #17810 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-08 04:06:39 +00:00
a8238bbdb0
[Chore][Doc] uses model id determined from OpenAI client ( #17815 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-08 01:48:57 +00:00
d43f914d42
[Core][Feature] Input metadata dump on crash ( #13407 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com >
2025-05-07 22:15:09 +00:00
ed5272cf21
[BugFix] Avoid secondary missing MultiprocExecutor.workers error ( #17811 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-07 21:55:04 +00:00
c20ef40fd0
[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend ( #14238 )
...
Signed-off-by: Akshat Tripathi <akshat@krai.ai >
Signed-off-by: Chengji Yao <chengjiyao@google.com >
Co-authored-by: Chengji Yao <chengjiyao@google.com >
2025-05-07 16:28:47 -04:00
db593aa67f
[Quantization] Quark MXFP4 format loading ( #16943 )
2025-05-07 15:05:05 -04:00
f98e307588
[Bugfix] Fix missing lora name mapping for lora without prefix ( #17793 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-07 16:17:12 +00:00
646a31e51e
Fix and simplify deprecated=True CLI kwarg ( #17781 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-07 16:51:06 +01:00
be8ff88e66
[Bugfix] Fix Video IO error for short video ( #17791 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-07 15:36:06 +00:00
1a6af1453d
Only depend on importlib-metadata for Python < 3.10 ( #17776 )
...
Signed-off-by: Christian Heimes <christian@python.org >
2025-05-07 07:51:06 -07:00
32aa74c09c
[ROCm][FP8][Kernel] FP8 quantization fused into Custom Paged Attention ( #17139 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-07 07:12:35 -07:00
7377dd0307
[doc] update the issue link ( #17782 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-07 20:29:05 +08:00
98c89e16ff
Make key optional for rotary embedding ( #17566 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-05-07 00:11:46 -07:00
324a3119b0
Fix test_memory_usage_no_spec ( #17754 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-05-07 00:10:33 -07:00
8a15c2603a
[Frontend] Add missing chat templates for various MLLMs ( #17758 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-07 00:10:01 -07:00
043e4c4955
Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling ( #16357 )
...
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
Co-authored-by: Aaron Dou <yzdou@amazon.com >
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com >
Co-authored-by: Chongming Ni <chongmni@amazon.com >
Co-authored-by: Amulya Ballakur <amulyaab@amazon.com >
Co-authored-by: Patrick Lange <patlange@amazon.com >
Co-authored-by: Elaine Zhao <elaineyz@amazon.com >
Co-authored-by: Lin Lin Pan <tailinpa@amazon.com >
Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com >
Co-authored-by: Yishan McNabb <yishanm@amazon.com >
Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com >
2025-05-07 00:07:30 -07:00
ba7703e659
[Misc] Remove qlora_adapter_name_or_path ( #17699 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-06 23:10:37 -07:00
f80ae5bdcf
[Kernel] Use fused rmsnorm for some models like qwen3 series ( #17735 )
...
Signed-off-by: evian <eviantai@u.nus.edu >
Co-authored-by: evian <eviantai@u.nus.edu >
2025-05-06 23:10:02 -07:00
1a45a61387
[Kernel] GGUF MoeVec kernel ( #16780 )
...
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com >
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-05-06 23:07:23 -07:00
c3e9d5060e
[Misc] Use apply_rotary_emb from vllm_flash_attn for Qwen2-VL vision RoPE ( #17726 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-07 04:51:33 +00:00
822de7fb94
[Misc] Split model loader ( #17712 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-07 12:42:26 +08:00
8d84d836d1
[BugFix][Spec Decode] Fix hidden size mismatch between target and eagle head ( #17740 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-06 19:51:26 -07:00
950b71186f
Replace lm-eval bash script with pytest and use enforce_eager for faster CI ( #17717 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-06 18:00:10 -07:00
e50a1f1a9c
[TPU] Add kernel test for moe_pallas ( #17496 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-05-06 17:59:57 -07:00
a17cef70ea
Removed unused marlin cuda code ( #17684 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-06 17:59:47 -07:00
18dd5e01f2
[Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels ( #17146 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com >
2025-05-06 17:59:30 -07:00
6de3e13413
Add logging for torch nightly version ( #17669 )
...
Signed-off-by: Yang Wang <elainewy@meta.com >
2025-05-07 00:45:51 +00:00
ed3a1d2106
[ROCm] fix num_stages for default moe config to avoid triton OutOfResource error ( #17744 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
2025-05-07 00:39:48 +00:00
022afbeb4e
Fix doc build performance ( #17748 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-07 00:36:41 +00:00
2f925e5777
[Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode ( #16828 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-06 18:21:48 -04:00
de906b95f9
[Bugfix] Fix for the condition to accept empty encoder inputs for mllama ( #17732 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-06 19:59:06 +00:00
d456aea71f
[Misc] Add Next Edit Prediction (NEP) datasets support in benchmark_serving.py ( #16839 )
...
Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
2025-05-06 15:38:45 -04:00
621ca2c0ab
[TPU] Increase block size and reset block shapes ( #16458 )
2025-05-06 13:55:04 -04:00
6115b11582
Make right sidebar more readable in "Supported Models" ( #17723 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-06 16:48:26 +00:00
5b8c390747
[Bugfix] Fix modality limits in vision language example ( #17721 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-06 16:12:28 +00:00
7525d5f3d5
[doc] Add RAG Integration example ( #17692 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-06 16:10:23 +00:00
aabcd2cae3
[v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager ( #17479 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-06 08:50:34 -07:00
0d115460a7
[Docs] Use gh-file to add links to tool_calling.md ( #17709 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-05-06 15:27:19 +00:00
175bda67a1
[Feat] Add deprecated=True to CLI args ( #17426 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-06 08:11:27 -07:00
cba31c47c4
[v1] AttentionMetadata for each layer ( #17394 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-06 07:58:37 -07:00
a6fed02068
[V1][PP] Support PP for MultiprocExecutor ( #14219 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: jiang.li <jiang1.li@intel.com >
2025-05-06 07:58:05 -07:00
d419aa5dc4
[V1] Enable TPU V1 backend by default ( #17673 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-06 06:49:49 -07:00
f9bc5a0693
[Bugfix] Fix triton import with local TritonPlaceholder ( #17446 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-05-06 17:53:09 +08:00
05e1f96419
Fix dockerfilegraph pre-commit hook ( #17698 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-06 08:56:48 +00:00
6eae34533a
[Misc] Fix ScalarType float4 naming ( #17690 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-06 01:07:15 -07:00
63ced7b43f
[Doc] Update notes for H2O-VL and Gemma3 ( #17219 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-06 07:51:02 +00:00
dc47ba32f8
[Bugfix] Fixed prompt length for random dataset ( #17408 )
...
Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com >
2025-05-06 07:00:08 +00:00
edbf2d609e
[easy] Fix logspam on PiecewiseBackend errors ( #17138 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-05-05 23:46:11 -07:00
999328be0d
[Model] Add GraniteMoeHybrid 4.0 model ( #17497 )
...
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com >
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com >
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
2025-05-06 12:00:31 +08:00
98834fefaa
Update nm to rht in doc links + refine fp8 doc ( #17678 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-06 00:41:14 +00:00
90bd2ae172
[Bugfix] LoRA - Retire unused maxnreg LoRA kernel argument ( #17677 )
2025-05-05 17:34:29 -07:00
5941e0b7ea
[TPU][V1] Add support for top-logprobs ( #17072 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-05-05 14:20:15 -07:00
9765940824
[TPU] Enable gemma3-27b with TP>1 on multi-chips. ( #17335 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2025-05-05 14:19:58 -07:00
5ea5c514da
[BugFix] Increase timeout for startup failure test ( #17642 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-05 20:53:19 +00:00
d3efde8176
[Benchmarks] Remove invalid option under V1 engine ( #17651 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-05 16:30:22 -04:00
aea302be6c
Use git-path commit in hook ( #17616 )
...
Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com >
2025-05-05 17:55:32 +00:00
cc05b90d86
[Doc] Fix broken cuda installation doc rendering ( #17654 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-05 17:52:40 +00:00
1d0c9d6b2d
[Kernel] some optimizations for dense marlin and moe marlin ( #16850 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
2025-05-05 09:39:30 -07:00
f62cad6431
[Build/CI] Upgrade CUTLASS to 3.9.2 ( #17641 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-05-04 19:23:17 -07:00
5394ad7387
[Bugfix] fix KeyError on top logprobs are special tokens ( #17637 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-05-04 19:22:35 -07:00
68e1ee0072
[Bugfix][Easy] Fix whitespace in shm_broadcast.py logging ( #17635 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-05-04 19:20:19 -07:00
2858830c39
[Bugfix] Prioritize dtype in root config before checking text config ( #17629 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-04 12:43:05 +00:00
d6484ef3c3
Add full API docs and improve the UX of navigating them ( #17485 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-03 19:42:43 -07:00
46fae69cf0
[Misc] V0 fallback for --enable-prompt-embeds ( #17615 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-03 22:59:24 +00:00
f66f1e0fa3
[Bugfix] Fix broken Qwen2.5-omni tests ( #17613 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-03 17:08:14 +00:00
887d7af882
[Core] Gate prompt_embeds behind a feature flag ( #17607 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-04 00:19:20 +08:00
a92842454c
[Bugfix][ROCm] Using device_type because on ROCm the API is still torch.cuda ( #17601 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-02 22:25:47 -07:00
c8386fa61d
[Build/CI] Upgrade CUTLASS to 3.9.1 ( #17602 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-05-02 22:25:14 -07:00
87baebebd8
[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name ( #17508 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-05-02 21:42:44 -07:00
e3d0a1d190
[Quantizaton] [AMD] Add support for running DeepSeek int8 w8a8 MoE on ROCm ( #17558 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2025-05-02 21:41:10 -07:00
d47b605eca
Update test requirements to CUDA 12.8 ( #17576 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-05-02 21:40:15 -07:00
22c6f6397f
[Neuron][Build] Require setuptools >= 77.0.3 for PEP 639 ( #17603 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
2025-05-03 02:41:59 +00:00
3ec97e2cc5
[release] Add command to clean up Docker containers/images in TPU release machine ( #17606 )
2025-05-02 18:54:34 -07:00
9b103a1d76
fix typo in logging ( #17605 )
2025-05-02 18:04:40 -07:00
b90b0852e9
[easy] Print number of needed GPUs in skip message ( #17594 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-05-02 15:27:43 -07:00
9352cdb56d
[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning ( #16263 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
Co-authored-by: Lu Fang <lufang@fb.com >
2025-05-02 19:44:19 +00:00
182f40ea8b
Add NVIDIA TensorRT Model Optimizer in vLLM documentation ( #17561 )
2025-05-02 11:36:46 -07:00
3e887d2e0c
permute/unpermute kernel for moe optimization ( #14568 )
...
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn >
2025-05-02 11:31:55 -07:00
0f87d8f7b2
[BugFix][Attention] Fix sliding window attention in V1 giving incorrect results ( #17574 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-02 11:01:38 -07:00
4c33d67321
[Bugfix] fix tmp_out and exp_sums dimensions ( #17438 )
...
Signed-off-by: Hui Liu <96135754+hliuca@users.noreply.github.com >
2025-05-02 16:44:07 +00:00
cb234955df
[Misc] Clean up input processing ( #17582 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-02 08:11:53 -07:00
3a500cd0b6
[doc] miss result ( #17589 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-02 07:04:49 -07:00
868c546da4
Support W8A8 INT8 MoE for compressed-tensors ( #16745 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-02 10:03:32 -04:00
99404f53c7
[Security] Fix image hash collision ( #17378 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-02 08:36:39 -04:00
785d75a03b
Automatically tell users that dict args must be valid JSON in CLI ( #17577 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-02 05:24:55 -07:00
6d1479ca4b
[doc] add the print result ( #17584 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-02 05:24:45 -07:00
b8b0859b5c
add more pytorch related tests for torch nightly ( #17422 )
...
Signed-off-by: Yang Wang <elainewy@meta.com >
2025-05-02 03:29:59 -07:00
d7543862bd
[Misc] Rename assets for testing ( #17575 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-02 03:29:25 -07:00
c777df79f7
[BugFix] Fix Memory Leak ( #17567 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-05-02 01:07:03 -07:00
cc2a77d7f1
[Core] [Bugfix] Add Input Embeddings ( #15428 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com >
Co-authored-by: Bryce1010 <bryceyx@gmail.com >
Co-authored-by: Nan2018 <nan@protopia.ai >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-02 01:06:39 -07:00
9e2de9b9e9
[Bugifx] Remove TritonPlaceholder from sys.modules ( #17317 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-02 00:45:01 -07:00
109e15a335
Add pt_load_map_location to allow loading to cuda ( #16869 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-05-01 23:23:42 -07:00
f192ca90e6
Fix PixtralHF missing spatial_merge_size ( #17571 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-01 22:14:09 -07:00
f89d0e11bf
[Misc] Continue refactoring model tests ( #17573 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-01 22:06:08 -07:00
b4003d11fc
Check if bitblas is installed during support check ( #17572 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-02 04:32:54 +00:00
292fc59d61
[CI] Actually run tests/kv_transfer/test_disagg.py in CI ( #17555 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-02 04:05:04 +00:00
afcb3f8863
[Attention] MLA move o_proj q_proj into cuda-graph region ( #17484 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-02 03:16:26 +00:00
afb12e4294
[Doc] note that not all unit tests pass on CPU platforms ( #17554 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-05-02 02:57:21 +00:00
24aebae177
[Bugfix] Disable gptq_bitblas for <SM80 to fix GPTQ on V100/T4 ( #17541 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-01 17:59:35 -07:00
39c0813a7f
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE3 ( #17504 )
...
Signed-off-by: qizixi <qizixi@meta.com >
2025-05-01 16:19:30 -07:00
9b70e2b4c1
[Misc][Tools][Benchmark] Publish script to auto tune server parameters ( #17207 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-05-01 19:53:03 +00:00
173daac19d
[Bug]change the position of cuda_graph_sizes in dataclasses ( #17548 )
...
Signed-off-by: CXIAAAAA <cxia0209@gmail.com >
2025-05-01 11:52:37 -07:00
04f2cfc894
Remove duplicate code from dbrx.py ( #17550 )
2025-05-01 11:51:58 -07:00
811a6c0972
[ROCM] Add gfx950 to the custom attention archs ( #16034 )
...
Signed-off-by: jpvillam <Juan.Villamizar@amd.com >
Signed-off-by: seungrokjung <seungrok.jung@amd.com >
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Co-authored-by: seungrokjung <seungrok.jung@amd.com >
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-01 11:18:28 -07:00
9b1769dd9a
[Bugfix] Fix lint error ( #17547 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-01 11:12:19 -07:00
61c299f81f
[Misc]add configurable cuda graph size ( #17201 )
...
Signed-off-by: CXIAAAAA <cxia0209@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-01 11:04:50 -07:00
4acfa3354a
[ROCm] update installation guide to include build aiter from source instructions ( #17542 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-01 11:01:28 -07:00
88c8304104
[Model] Refactor Ovis2 to support original tokenizer ( #17537 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-01 11:00:53 -07:00
6768ff4a22
Move the last arguments in arg_utils.py to be in their final groups ( #17531 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-01 10:31:44 -07:00
f2e7af9b86
[CI/Build] Remove awscli dependency ( #17532 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-01 09:20:54 -07:00
7423cf0a9b
[Misc] refactor example - cpu_offload_lmcache ( #17460 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-01 15:05:24 +00:00
460a2b1100
[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations ( #10867 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-05-01 07:59:28 -07:00
28566d73b3
[ROCm] remove unsupported archs from rocm triton flash-attention supported list ( #17536 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
2025-05-01 07:54:25 -07:00
98060b001d
[Feature][Frontend]: Deprecate --enable-reasoning ( #17452 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-05-01 06:46:16 -07:00
f5a3c655b2
[FEAT] [ROCm]: Add Qwen/Qwen3-235B-A22B-FP8 TP4 triton fused moe config ( #17535 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-01 06:37:17 -07:00
7169f87ad0
[doc] add streamlit integration ( #17522 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-01 13:34:02 +00:00
b74d888c63
Fix more broken speculative decode tests ( #17450 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-05-01 06:05:58 -07:00
2007d4d54f
[FEAT] [ROCm]: Add Qwen/Qwen3-30B-A3B-FP8 fused moe config for MI300X ( #17530 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-01 06:03:13 -07:00
48e925fab5
[Misc] Clean up test docstrings and names ( #17521 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-01 05:19:32 -07:00
1903c0b8a3
[Frontend] Show progress bar for adding requests ( #17525 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-01 05:15:32 -07:00
86a1f67a3b
[Bugfix][Benchmarks] Allow benchmark of deepspeed-mii backend to select a model ( #17285 )
...
Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com >
2025-05-01 11:54:51 +00:00
a257d9bccc
Improve configs - ObservabilityConfig ( #17453 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-01 03:52:05 -07:00
015069b017
[Misc] Optimize the Qwen3_ReasoningParser extract_reasoning_content ( #17515 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-05-01 03:29:01 -07:00
fbefc8a78d
[Core] Enable IPv6 with vllm.utils.make_zmq_socket() ( #16506 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-01 09:38:18 +00:00
26bc4bbcd8
Avoid overwriting vllm_compile_cache.py ( #17418 )
...
Signed-off-by: Keyun Tong <tongkeyun@gmail.com >
2025-05-01 07:30:57 +00:00
3c3d767201
[BugFix] Fix mla cpu - missing 3 required positional arguments ( #17494 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-01 14:36:52 +08:00
13cf6b6236
[BugFix] fix speculative decoding memory leak when speculation is disabled ( #15506 )
...
Signed-off-by: Noah Yoshida <noahcy117@gmail.com >
2025-04-30 23:28:17 -07:00
90d0a54c4d
[ROCm] Effort to reduce the number of environment variables in command line ( #17229 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
2025-04-30 23:27:06 -07:00
7a0a146c54
[Build] Require setuptools >= 77.0.3 for PEP 639 ( #17389 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-30 23:25:36 -07:00
7ab643e425
FIxing the AMD test failures caused by PR#16457 ( #17511 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-04-30 23:23:07 -07:00
afb4429b4f
[CI/Build] Reorganize models tests ( #17459 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-30 23:03:08 -07:00
aa4502e7f3
[CI][Bugfix] Fix failing V1 Test due to missing 'cache_salt' arg ( #17500 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-30 21:03:30 -07:00
17b4d85f63
[CI][TPU] Skip structured outputs+spec decode tests on TPU ( #17510 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-30 20:36:20 -07:00
1144a8efe7
[Bugfix] Temporarily disable gptq_bitblas on ROCm ( #17411 )
...
Signed-off-by: Yan Cangang <nalanzeyu@gmail.com >
2025-04-30 19:51:45 -07:00
08fb5587b4
[Bugfix][ROCm] Fix import error on ROCm ( #17495 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-30 19:51:42 -07:00
dbc18e7816
[CI][TPU] Skip Multimodal test ( #17488 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-04-30 19:51:39 -07:00
02bd654846
[Misc] Rename Audios -> Audio in Qwen2audio Processing ( #17507 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-04-30 19:51:36 -07:00
200bbf92e8
Bump Compressed Tensors version to 0.9.4 ( #17478 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-04-30 15:24:45 -07:00
81ecf425f0
[v1][Spec Decode] Make sliding window compatible with eagle prefix caching ( #17398 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-04-30 18:25:53 +00:00
42d9a2c4c7
doc: fix bug report Github template formatting ( #17486 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-04-30 10:03:20 -07:00
2ac74d098e
[doc] add install tips ( #17373 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-30 17:02:41 +00:00
584f5fb4c6
[Bugfix][ROCm] Restrict ray version due to a breaking release ( #17480 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-30 09:59:06 -07:00
d586ddc691
[BugFix] Fix authorization of openai_transcription_client.py ( #17321 )
...
Signed-off-by: zh Wang <rekind133@outlook.com >
2025-04-30 09:51:05 -07:00
0b7e701dd4
[Docs] Update optimization.md doc ( #17482 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-30 09:34:02 -07:00
947f2f5375
[V1] Allow turning off pickle fallback in vllm.v1.serial_utils ( #17427 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-30 16:10:54 +00:00
739e03b344
[Bugfix] Fixed mistral tokenizer path when pointing to file ( #17457 )
...
Signed-off-by: Pete Savage <psavage@redhat.com >
2025-04-30 08:08:37 -07:00
da4e7687b5
[Fix] Support passing args to logger ( #17425 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-04-30 08:06:58 -07:00
39317cf42b
[Docs] Add command for running mypy tests from CI ( #17475 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-30 08:06:09 -07:00
2990cee95b
[Feature] The Qwen3 reasoning parser supports guided decoding ( #17466 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-30 07:48:21 -07:00
0be6d05b5e
[V1][Metrics] add support for kv event publishing ( #16750 )
...
Signed-off-by: alec-flowers <aflowers@nvidia.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-04-30 07:44:45 -07:00
77073c77bc
[Core] Prevent side-channel attacks via cache salting ( #17045 )
...
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com >
2025-04-30 20:27:21 +08:00
a7d5b016bd
[TPU][V1][CI] Update regression test baseline for v6 CI ( #17064 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-30 04:03:22 -07:00
d803786731
[V1][Bugfix]: vllm v1 verison metric num_gpu_blocks is None ( #15755 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-30 18:20:39 +08:00
1534d389af
[Misc] Remove deprecated files ( #17447 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-30 01:52:19 -07:00
ece5a8b0b6
Make the _apply_rotary_emb compatible with dynamo ( #17435 )
2025-04-30 07:52:48 +00:00
54072f315f
[MODEL ADDITION] Ovis2 Model Addition ( #15826 )
...
Signed-off-by: Marco <121761685+mlinmg@users.noreply.github.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-04-30 07:33:29 +00:00
be633fba0f
[Bugfix] Fix AttributeError: 'State' object has no attribute 'engine_client' ( #17434 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-30 00:11:04 -07:00
ed6cfb90c8
[Hardware][Intel GPU] Upgrade to torch 2.7 ( #17444 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com >
2025-04-30 00:03:58 -07:00
6ed9f6047e
[Intel GPU] [CI]Fix XPU ci, setuptools >=80.0 have build issue ( #17298 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-04-29 22:54:10 -07:00
a44c4f1d2f
Support LoRA for Mistral3 ( #17428 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-29 21:10:30 -07:00
88fcf00dda
Fix some speculative decode tests with tl.dot ( #17371 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-04-29 19:41:02 -07:00
d1f569b1b9
Fix call to logger.info_once ( #17416 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 19:39:18 -07:00
13698db634
Improve configs - ModelConfig ( #17130 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-30 10:38:22 +08:00
2c4f59afc3
Update PyTorch to 2.7.0 ( #16859 )
2025-04-29 19:08:04 -07:00
1c2bc7ead0
Truncation control for embedding models ( #14776 )
...
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com >
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Co-authored-by: Max de Bayser <mbayser@br.ibm.com >
2025-04-30 09:24:57 +08:00
4055130a85
[release] Always git fetch all to get latest tag on TPU release ( #17322 )
2025-04-29 17:52:11 -07:00
34120f5acd
[V1][Feature] Enable Speculative Decoding with Structured Outputs ( #14702 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-04-30 00:02:10 +00:00
7489ec0bab
Remove Bamba 9B from CI ( #17407 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 21:10:31 +00:00
70788bdbdc
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE ( #17211 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com >
2025-04-29 21:10:00 +00:00
c9c1b59e59
Fix: Python package installation for opentelmetry ( #17049 )
...
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com >
2025-04-29 20:20:24 +00:00
0350809f3a
Remove Falcon3 2x7B from CI ( #17404 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 19:52:25 +00:00
a6977dbd15
Simplify (and fix) passing of guided decoding backend options ( #17008 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 19:02:23 +00:00
2fa2a50bf9
[Bugfix] Fix Minicpm-O-int4 GPTQ model inference ( #17397 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-29 18:21:42 +00:00
08e15defa9
[CI/Build] Add retry mechanism for add-apt-repository ( #17107 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-29 10:40:52 -07:00
b37685afbb
[CI] Uses Python 3.11 for TPU ( #17359 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-04-29 17:39:16 +00:00
792595b59d
[TPU][V1][CI] Replace python3 setup.py develop with standard pip install --e on TPU ( #17374 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-29 10:36:48 -07:00
0c1c788312
[Doc][Typo] Fixing label in new model requests link in overview.md ( #17400 )
2025-04-29 10:29:48 -07:00
56d64fbe30
[Docs] Propose a deprecation policy for the project ( #17063 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-29 10:29:44 -07:00
608968b7c5
Enabling multi-group kernel tests. ( #17115 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-04-29 10:27:27 -07:00
06ffc7e1d3
[Misc][ROCm] Exclude cutlass_mla_decode for ROCm build ( #17289 )
...
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
2025-04-29 10:26:42 -07:00
d3cf61b89b
fix gemma3 results all zero ( #17364 )
...
Signed-off-by: mayuyuace <qiming1.zhang@intel.com >
2025-04-29 09:40:25 -07:00
a39203f99e
[Bugfix] add qwen3 reasoning-parser fix content is None when disable … ( #17369 )
...
Signed-off-by: mofanke <mofanke@gmail.com >
2025-04-29 16:32:40 +00:00
24e6ad3f16
[V1] Remove num_input_tokens from attn_metadata ( #17193 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-04-29 09:28:41 -07:00
2ef5d106bb
Improve literal dataclass field conversion to argparse argument ( #17391 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 16:25:08 +00:00
0ed27ef66c
Fix: Spelling of inference ( #17387 )
2025-04-29 09:23:39 -07:00
900edfa8d4
Transformers backend tweaks ( #17365 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 09:08:03 -07:00
88ad9ec6b2
[Frontend] Support chat_template_kwargs in LLM.chat ( #17356 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-29 22:03:35 +08:00
40896bdf3f
pre-commit autoupdate ( #17380 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 06:46:55 -07:00
00ee37efa2
[Bugfix] Clean up MiniMax-VL and fix processing ( #17354 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-29 20:42:16 +08:00
890f104cdf
[Doc] Fix QWen3MOE info ( #17381 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-29 12:38:32 +00:00
4a5e13149a
Update docs requirements ( #17379 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 11:35:47 +00:00
97cc8729f0
[Model] Ignore rotary embed load for Cohere model ( #17319 )
2025-04-29 00:30:40 -07:00
4464109219
[Build][Bugfix] Restrict setuptools version to <80 ( #17320 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-29 00:17:23 -07:00
193e78e35d
[Fix] Documentation spacing in compilation config help text ( #17342 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-04-29 00:16:17 -07:00
bdb2cddafc
[Misc]Use a platform independent interface to obtain the device attributes ( #17100 )
2025-04-29 06:59:13 +00:00
ebb3930d28
[Misc] Move config fields to MultiModalConfig ( #17343 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-29 06:37:21 +00:00
cde384cd92
[Model] support MiniMax-VL-01 model ( #16328 )
...
Signed-off-by: qingjun <qingjun@minimaxi.com >
2025-04-29 12:05:50 +08:00
96e06e3cb7
[Misc] Add a Jinja template to support Mistral3 function calling ( #17195 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-28 19:53:44 -07:00
17eb306fcc
[Bugfix] Add contiguous call inside rope kernel wrapper ( #17091 )
...
Signed-off-by: 苏政渊 <suzhengyuan@moonshot.cn >
Co-authored-by: 苏政渊 <suzhengyuan@moonshot.cn >
2025-04-28 19:24:07 -07:00
165cb56329
Ignore '<string>' filepath ( #17330 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-28 19:23:29 -07:00
d6da8a8ff2
[Bugfix] Fix numel() downcast in fused_layernorm_dynamic_per_token_quant.cu ( #17316 )
2025-04-28 19:23:18 -07:00
b4ac4fa04d
[model] make llama4 compatible with pure dense layers ( #17315 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-04-29 10:22:22 +08:00
e136000595
[V1][Spec Decode] Make Eagle model arch config driven ( #17323 )
2025-04-29 10:22:02 +08:00
86d9fc29cb
implement Structural Tag with Guidance backend ( #17333 )
...
Signed-off-by: Michal Moskal <michal@moskal.me >
2025-04-29 02:21:32 +00:00
506475de5f
[Optim] Compute multimodal hash only once per item ( #17314 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-29 09:40:35 +08:00
cfe4532093
[Benchmark] Add single turn MTBench to Serving Bench ( #17202 )
2025-04-28 16:46:15 -07:00
8fc88d63f1
[Model] Add tuned triton fused_moe configs for Qwen3Moe ( #17328 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-28 15:20:24 -07:00
6e74fd4945
Support loading transformers models with named parameters ( #16868 )
...
Signed-off-by: Alex <alexwu@character.ai >
2025-04-28 23:15:58 +01:00
dcbac4cb4b
[Model] Qwen3 Dense FP8 Compat Fixes ( #17318 )
...
Signed-off-by: simon-mo <xmo@berkeley.edu >
2025-04-28 14:12:01 -07:00
ed2462030f
[Bugfix] Fix moe weight losing all extra attrs after process_weights_after_loading. ( #16854 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-04-28 21:05:07 +00:00
cc5befbced
[BugFix] Fix cascade attention - RuntimeError: scheduler_metadata must have shape (metadata_size) ( #17283 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-28 13:55:50 -07:00
2c89cd96a8
[Chore] cleanup license indicators in light of SPDX ( #17259 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-04-28 19:43:52 +00:00
a0304dc504
[Security] Don't bind tcp zmq socket to all interfaces ( #17197 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-28 10:08:20 -07:00
c7941cca18
Explicitly explain quant method override ordering and ensure all overrides are ordered ( #17256 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-28 16:55:31 +00:00
b6dd32aa07
Make name of compressed-tensors quant method consistent across vLLM ( #17255 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-28 16:28:13 +00:00
f94886946e
Improve conversion from dataclass configs to argparse arguments ( #17303 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-28 16:22:12 +00:00
72dfe4c74f
[Docs] Add a security guide ( #17230 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-28 15:12:17 +00:00
8b464d9660
[Misc] Clean up Qwen2.5-Omni code ( #17301 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-28 06:20:45 -07:00
889ebb2638
[Misc] Minor typo/grammar in platforms/interface.py ( #17307 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-28 05:45:42 -07:00
3ad986c28b
[doc] update wrong model id ( #17287 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-28 04:20:51 -07:00
344e193b7d
[Bugfix] Add missing get_language_model to new MLLMs ( #17300 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-28 04:09:57 -07:00
fb1c933ade
Add missing class docstring for PromptAdapterConfig ( #17302 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-28 04:06:59 -07:00
72c5b97231
Update tpu_worker.py 's typo ( #17288 )
2025-04-28 04:01:15 -07:00
fa93cd9f60
[Model] Add Granite Speech Support ( #16246 )
...
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com >
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-04-28 10:05:00 +00:00
aec9674dbe
[Core] Remove legacy input mapper/processor from V0 ( #15686 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-28 15:38:48 +08:00
7fcc4223dc
[Minor][Models] Pass partial_rotary_factor parameter to rope ( #17266 )
...
Signed-off-by: evian <eviantai@u.nus.edu >
Co-authored-by: evian <eviantai@u.nus.edu >
2025-04-28 04:28:59 +00:00
8262a3e23b
[Misc] Validate stop_token_ids contents ( #17268 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-28 03:54:05 +00:00
f211331c48
[Doc] small fix ( #17277 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-28 03:53:35 +00:00
9053d0b134
[Doc] Fix wrong github link in LMCache examples ( #17274 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-04-28 03:09:11 +00:00
cb3f2d8d10
[Bugfix] Fix Mistral3 spatial merge error ( #17270 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-27 19:40:05 -07:00
c12df53b60
[Bugfix] Fix cutlass dispatch for fp8/int8 to properly invoke M<=16 c… ( #16751 )
...
Signed-off-by: Ther-LF <2639852836@qq.com >
2025-04-27 19:38:42 -07:00
d1aeea7553
[Bugfix] Fix missing ARG in Dockerfile for arm64 platforms ( #17261 )
...
Signed-off-by: lkm-schulz <44176356+lkm-schulz@users.noreply.github.com >
2025-04-27 19:38:14 -07:00
d8bccde686
[BugFix] Fix vllm_flash_attn install issues ( #17267 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
2025-04-27 17:27:56 -07:00
20e489eaa1
[V1][Spec Decode] Make eagle compatible with prefix caching. ( #17137 )
...
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com >
2025-04-27 09:29:43 -07:00
4213475ec7
[Metrics] Fix minor inconsistencies in bucket progression ( #17262 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-27 16:19:39 +00:00
d92879baf6
[doc] Add feature status legend ( #17257 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-27 08:17:02 -07:00
690fe019f0
[Feature] support sequence parallelism using compilation pass ( #16155 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-04-27 06:29:35 -07:00
ed7a29d9f8
[NVIDIA] Support Cutlass MLA for Blackwell GPUs ( #16032 )
...
Signed-off-by: kaixih <kaixih@nvidia.com >
2025-04-27 06:29:21 -07:00
756848e79e
[Bugfix] Fix Lora Name Parsing ( #17196 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-27 20:33:09 +08:00
18445edd0f
[Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens ( #17033 )
...
Signed-off-by: sfc-gh-zhwang <flex.wang@snowflake.com >
2025-04-27 12:30:53 +00:00
30215ca61f
[MISC] Use string annotation types for class definitions ( #17244 )
...
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com >
2025-04-27 08:39:57 +00:00
838cedade7
[Bugfix] Get a specific type of layer from forward context ( #17222 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-04-27 00:58:05 -07:00
4283a28c2f
[Bugfix] Fix QWen2 VL multimodal mapping ( #17240 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-27 05:53:23 +00:00
93a126fbc7
[Misc] Make cached tokenizer pickle-compatible ( #17048 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-27 13:05:00 +08:00
8e4b351a0c
[Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel ( #12591 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2025-04-27 00:35:08 +00:00
9869453c42
Update test_flash_attn.py ( #17102 )
...
Signed-off-by: ShuaibinLi <lishuaibin@live.cn >
2025-04-26 22:17:35 +00:00
3642c59aa8
[CI/Build] remove -t for run-lm-eval-gsm-hf-baseline.sh ( #16271 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-26 18:25:05 +00:00
43eea2953b
[Minor] Fix lint error in main branch ( #17233 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-26 11:10:14 -07:00
de7eb10ce4
[Bugfix] Fix Qwen2.5-Omni M-RoPE position ids generation ( #16878 )
...
Signed-off-by: imkero <kerorek@outlook.com >
2025-04-26 10:41:35 -07:00
fd11a325b8
[MISC] rename interval to max_recent_requests ( #14285 )
2025-04-26 16:59:18 +00:00
4d17e20310
Disable the torch.compile cache checks when VLLM_DISABLE_COMPILE_CACHE=1 ( #16573 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-04-26 09:17:58 -07:00
10fd1d7380
[Bugfix] fix error due to an uninitialized tokenizer when using skip_tokenizer_init with num_scheduler_steps ( #9276 )
...
Signed-off-by: changjun.lee <pord7457@gmail.com >
2025-04-26 11:51:17 -04:00
52b4f4a8d7
[Docs] Update structured output doc for V1 ( #17135 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-26 15:12:18 +00:00
e782e0a170
[Chore] added stubs for vllm_flash_attn during development mode ( #17228 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-04-26 07:45:26 -07:00
dc2ceca5c5
[BUGFIX] use random for NONE_HASH only when PYTHONHASHSEED not set ( #17088 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-04-26 14:34:24 +00:00
f8acd01ff7
[V1] Add structural_tag support using xgrammar ( #17085 )
2025-04-26 14:06:37 +00:00
c48334d405
[Hardware][Intel-Gaudi] Update hpu-extension and update bucketing system for HPU device ( #17186 )
...
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai >
2025-04-26 05:55:14 -07:00
909fdaf152
[Bugfix] Fix standard models tests ( #17217 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-26 02:26:41 -07:00
8c1c926d00
[Bugfix] Fix missing int type for -n in multi-image example ( #17223 )
2025-04-26 08:49:52 +00:00
df6f3ce883
[Core] Remove prompt string from engine core data structures ( #17214 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-25 23:41:05 -07:00
513f074766
[CI/test] Fix Eagle Correctness Test ( #17209 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-25 23:40:36 -07:00
b07bf83c7d
[BugFix] Avoid race conditions in zero-copy tensor transmission ( #17203 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-26 06:00:07 +00:00
53e8cf53a4
[V1][Metrics] Allow V1 AsyncLLM to use custom logger ( #14661 )
...
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-25 22:05:40 -07:00
54271bb766
[ROCm][Misc] Follow-ups for Skinny Gemms on ROCm. ( #17011 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-04-25 22:05:10 -07:00
9e96f56efb
Allocate kv_cache with stride order ( #16605 )
...
Signed-off-by: shuw <shuw@nvidia.com >
2025-04-25 22:03:31 -07:00
b278911229
[Minor][Models] Fix Return Types of Llama & Eagle ( #17220 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-25 21:54:47 -07:00
7bd0c7745c
[Doc] Minor fix for the vLLM TPU setup page ( #17206 )
...
Signed-off-by: Yarong Mu <ymu@google.com >
2025-04-26 04:39:56 +00:00
1cf0719ebd
[Minor][Spec Decode] Add use_eagle to SpeculativeConfig ( #17213 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-25 21:08:15 -07:00
537d5ee025
[doc] add Anything LLM integration ( #17216 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-25 21:03:23 -07:00
c8e5be35f7
[MISC][AMD] Add unused annotation to rocm kernel file ( #17097 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-04-25 20:33:35 -07:00
a6e72e1e4f
[Bugfix] [pytorch] Patch AOTAutogradCache._get_shape_env ( #17142 )
...
Signed-off-by: James Wu <jjwu@meta.com >
2025-04-26 11:28:20 +08:00
5e83a7277f
[v1] [P/D] Adding LMCache KV connector for v1 ( #16625 )
2025-04-26 03:03:38 +00:00
68af5f6c5c
[AMD][FP8][BugFix] Remove V1 check in arg_utils.py for FP8 since it is not necessary ( #17215 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2025-04-25 19:55:05 -07:00
8de2901fea
[Bugfix] gemma[2,3] interleaved attention when sliding window is disabled ( #17180 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-04-25 19:53:51 -07:00
c53e0730cb
[Misc] Refine ray_serve_deepseek example ( #17204 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-04-25 16:06:59 -07:00
a0e619e62a
[V1][Spec Decode] EAGLE-3 Support ( #16937 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com >
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
Co-authored-by: Bryan Lu <yuzhelu@amazon.com >
2025-04-25 15:43:07 -07:00
70116459c3
[BugFix][Frontend] Fix LLM.chat() tokenization ( #16081 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-25 22:20:05 +00:00
65e262b93b
Fix Python packaging edge cases ( #17159 )
...
Signed-off-by: Christian Heimes <christian@python.org >
2025-04-26 06:15:07 +08:00
43faa0461a
[Bugfix] Fix hybrid model tests ( #17182 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-25 15:14:37 -07:00
48cb2109b6
[V1] Move usage stats to worker and start logging TPU hardware ( #16211 )
2025-04-25 14:06:01 -06:00
a5450f11c9
[Security] Use safe serialization and fix zmq setup for mooncake pipe ( #17192 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-04-25 16:53:23 +00:00
9d98ab5ec6
[Misc] Inline Molmo requirements ( #17190 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-25 16:41:44 +00:00
df5c879527
[doc] update wrong hf model links ( #17184 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-25 16:40:54 +00:00
423e9f1cbe
Use Transformers helper get_text_config() instead of checking for text_config ( #17105 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-25 08:47:35 -07:00
0bd7f8fca5
Bump Transformers to 4.51.3 ( #17116 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-25 08:34:34 -07:00
d5615af9ae
[Bugfix] Fix Mistral ChatCompletionRequest Body Exception ( #16769 )
...
Signed-off-by: Jasmond Loh <Jasmond.Loh@hotmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-25 07:26:30 -07:00
19dcc02a72
[Bugfix] Fix mistral model tests ( #17181 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-25 06:03:34 -07:00
7feae92c1f
[Doc] Move todo out of beam search docstring ( #17183 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-04-25 04:44:58 -07:00
f851b84266
[Doc] Add two links to disagg_prefill.md ( #17168 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-04-25 10:23:57 +00:00
fc966e9cc6
Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 ( #17158 )
2025-04-25 17:10:32 +08:00
ef19e67d2c
[Doc] Add headings to improve gptqmodel.md ( #17164 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-04-25 01:13:13 -07:00
a41351f363
[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization ( #15734 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
2025-04-25 00:45:02 -07:00
6aae216b4e
[Bugfix] remove fallback in guided_json (int range, patterns) ( #16725 )
...
Signed-off-by: csy1204 <josang1204@gmail.com >
Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com >
2025-04-25 06:54:43 +00:00
b22980a1dc
[Perf] Optimize rotary_emb implementation to use Triton operator for improved inference performance ( #16457 )
...
Signed-off-by: cynthieye <yexin93@qq.com >
Co-authored-by: MagnetoWang <magnetowang@outlook.com >
2025-04-25 14:52:28 +08:00
881f735827
[Misc] Benchmark Serving Script Support Appending Results ( #17028 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-24 22:53:55 -07:00
2f54045508
[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton ( #15099 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-04-24 22:51:02 -07:00
5aa6efb9a5
[Misc] Clean up redundant code in uniproc_executor.py ( #16762 )
...
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com >
2025-04-24 22:49:30 -07:00
6ca0234478
Move missed SchedulerConfig args into scheduler config group in EngineArgs ( #17131 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 22:48:53 -07:00
649818995f
[Docs] Fix True->true in supported_models.md ( #17141 )
2025-04-25 04:20:04 +00:00
7a0a9da72b
[Doc] V1 : Update LoRA status ( #17133 )
...
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com >
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com >
2025-04-24 20:17:22 -07:00
69bff9bc89
fix float16 support for kimi-vl ( #17156 )
...
Co-authored-by: zhouzaida <zhouzaida@msh.team >
2025-04-24 20:16:32 -07:00
41ca7eb491
[Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 ( #16864 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-24 20:12:21 -07:00
eef364723c
[FEAT] [ROCm]: AITER Fused MOE V1 Support ( #16752 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-04-25 11:06:50 +08:00
0d6e187e88
Use custom address for listening socket ( #15988 )
...
Signed-off-by: Jens Glaser <glaserj@ornl.gov >
2025-04-25 01:57:16 +00:00
9420a1fc30
Better error message for missing mistral params.json ( #17132 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-24 23:43:08 +00:00
583e900996
[Misc] Add example to run DeepSeek with Ray Serve LLM ( #17134 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-04-24 22:25:21 +00:00
05e1fbfc52
Add chat template for Llama 4 models ( #16428 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
2025-04-24 20:19:36 +00:00
fe92176321
Add collective_rpc to llm engine ( #16999 )
...
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai >
2025-04-24 20:16:52 +00:00
6d0df0ebeb
[Docs] Generate correct github links for decorated functions ( #17125 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-24 10:39:43 -07:00
0fa939e2d1
Improve configs - LoRAConfig + PromptAdapterConfig ( #16980 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 10:29:34 -07:00
0422ce109f
Add :markdownhelp: to EngineArgs docs so markdown docstrings render properly ( #17124 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 10:28:45 -07:00
47bdee409c
Molmo Requirements ( #17026 )
...
Signed-off-by: Eyshika Agarwal <eyshikaengineer@gmail.com >
Signed-off-by: eyshika <eyshikaengineer@gmail.com >
2025-04-24 10:08:37 -07:00
49f189439d
existing torch installation pip command fix for docs ( #17059 )
2025-04-24 10:07:21 -07:00
5adf6f6b7f
Updating Buildkite job for IBM Power ( #17111 )
...
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com >
2025-04-24 10:06:17 -07:00
4115f19958
[CI] Add automation for the tool-calling github label ( #17118 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-24 09:22:00 -07:00
340d7b1b21
[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics ( #16665 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-04-24 08:57:40 -07:00
1bcbcbf574
[Misc] refactor example series - structured outputs ( #17040 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-24 07:49:48 -07:00
82e43b2d7e
Add missing rocm_skinny_gemms kernel test to CI ( #17060 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-24 07:49:37 -07:00
67309a1cb5
[Frontend] Using matryoshka_dimensions control the allowed output dimensions. ( #16970 )
2025-04-24 07:06:28 -07:00
b724afe343
[V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning ( #16954 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-04-24 06:15:03 -07:00
21f4f1c9a4
Improve static type checking in LoRAModelRunnerMixin ( #17104 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 06:14:47 -07:00
b0c1f6202d
[Misc] Remove OLMo2 config copy ( #17066 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-24 06:14:32 -07:00
c0dfd97519
[V1][PP] Optimization: continue scheduling prefill chunks ( #17080 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-04-24 05:27:08 -07:00
a9138e85b1
Fix OOT registration test ( #17099 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 04:44:12 -07:00
0a05ed57e6
Simplify TokenizerGroup ( #16790 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 04:43:56 -07:00
14288d1332
Disable enforce_eager for V1 TPU sampler and structured output tests ( #17016 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-24 02:50:09 -07:00
b411418ff0
[Chore] Remove Sampler from Model Code ( #17084 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-24 02:49:33 -07:00
2bc0f72ae5
Add docs for runai_streamer_sharded ( #17093 )
...
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-24 01:03:21 -07:00
9c1244de57
[doc] update to hyperlink ( #17096 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-24 00:58:08 -07:00
db2f8d915c
[V1] Update structured output ( #16812 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-23 23:57:17 -07:00
6167c0e5d2
[Bugfix][Core] add seq_id_to_seq_group clearing to avoid memory leak when s… ( #16472 )
...
Signed-off-by: 开哲 <kaizhe.zy@alibaba-inc.com >
Co-authored-by: 开哲 <kaizhe.zy@alibaba-inc.com >
2025-04-24 11:25:37 +08:00
ed2e464653
Addendum Fix to support FIPS enabled machines with MD5 hashing ( #17043 )
...
Signed-off-by: sydarb <areebsyed237@gmail.com >
2025-04-23 19:55:00 -07:00
2c8ed8ee48
More informative error when using Transformers backend ( #16988 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-23 19:54:03 -07:00
ed50f46641
[Bugfix] Enable V1 usage stats ( #16986 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-23 19:54:00 -07:00
46e678bcff
[Minor] Use larger batch sizes for A100/B100/B200/MI300x ( #17073 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-23 19:18:59 -07:00
6b2427f995
[Quantization] add prefix for commandA quantized model ( #17017 )
2025-04-23 17:32:40 -07:00
b07d741661
[CI/Build] workaround for CI build failure ( #17070 )
...
Signed-off-by: csy1204 <josang1204@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-04-23 16:14:18 -07:00
41fb013d29
[V1][Spec Decode] Always use argmax for sampling draft tokens ( #16899 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-23 14:57:43 -07:00
32d4b669d0
[BugFix][V1] Fix int32 token index overflow when preparing input ids ( #16806 )
2025-04-23 12:12:35 -07:00
3cde34a4a4
[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar ( #15949 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
2025-04-23 18:34:41 +00:00
bdb3660312
Use @property and private field for data_parallel_rank_local ( #17053 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-23 08:50:08 -07:00
f3a21e9c68
CacheConfig.block_size should always be int when used ( #17052 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-23 08:50:05 -07:00
8e630d680e
Improve Transformers backend model loading QoL ( #17039 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-23 07:33:51 -07:00
af869f6dff
[CI] Update structured-output label automation ( #17055 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-23 07:33:14 -07:00
53c0fa1e25
Ensure that pid passed to kill_process_tree is int for mypy ( #17051 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-23 07:32:26 -07:00
f7912cba3d
[Doc] Add top anchor and a note to quantization/bitblas.md ( #17042 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-04-23 07:32:16 -07:00
6317a5174a
Categorize tests/kernels/ based on kernel type ( #16799 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-23 09:21:07 -04:00
aa72d9a4ea
Mistral-format support for compressed-tensors ( #16803 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-23 08:46:23 -04:00
ce17db8085
[CI] Run v1/test_serial_utils.py in CI ( #16996 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-23 01:13:34 -07:00
8c87a9ad46
[Bugfix] Fix AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers ( #16964 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-23 07:24:09 +00:00
ec69124eb4
[Misc] Improve readability of get_open_port function. ( #17024 )
...
Signed-off-by: gitover22 <qidizou88@gmail.com >
2025-04-23 06:16:53 +00:00
d0da99fb70
[BugFix] llama4 fa3 fix - RuntimeError: scheduler_metadata must have shape (metadata_size) ( #16998 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-22 21:49:24 -07:00
b2f195c429
[V1] Avoid socket errors during shutdown when requests are in-flight ( #16807 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-23 12:36:29 +08:00
047797ef90
[Bugfix] Triton FA function takes no keyword arguments ( #16902 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-04-22 21:35:24 -07:00
eb8ef4224d
[doc] add download path tips ( #17013 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-23 04:06:30 +00:00
56a735261c
[INTEL-HPU][v0] Port delayed sampling to upstream ( #16949 )
...
Signed-off-by: Michal Adamczyk <michal.adamczyk@intel.com >
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai >
2025-04-22 20:14:11 -07:00
e1cf90e099
[misc] tune some env vars for GB200 ( #16992 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-04-23 10:59:48 +08:00
6bc1e30ef9
Revert "[Misc] Add S3 environment variables for better support of MinIO." ( #17021 )
2025-04-22 19:22:29 -07:00
7e081ba7ca
[BugFix] Revert ROCm Custom Paged Attention Env Flag Check ( #17022 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-04-22 19:17:48 -07:00
1e013fa388
[V1][DP] More robust DP/EP dummy request coordination ( #16277 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-22 19:12:15 -07:00
bc7c4d206b
[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 ( #13305 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu >
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com >
Signed-off-by: maleksan85 <maleksan@amd.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: qli88 <qiang.li2@amd.com >
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com >
2025-04-22 19:11:56 -07:00
f67e9e9f22
add Dockerfile build vllm against torch nightly ( #16936 )
...
Signed-off-by: Yang Wang <elainewy@meta.com >
2025-04-22 19:08:27 -07:00
36fe78769f
[Bugfix] validate urls object for multimodal content parts ( #16990 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-04-23 09:43:06 +08:00
83d933718c
[Core][V1][TPU] Enable structured decoding on TPU V1 ( #16499 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-04-22 18:05:23 -06:00
5175b884f7
[BugFix] Remove default multiproc executor collective_rpc timeout ( #17000 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-22 23:27:14 +00:00
5536b30a4c
Fencing Kernels Tests for enabling on AMD ( #16929 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-04-22 09:32:40 -07:00
7f58fb9718
Add assertion for no objects while hashing hf_config ( #16930 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-22 09:32:22 -07:00
30bc3e0f66
[FEAT][ROCm]: Support AITER MLA ( #15893 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: qli88 <qiang.li2@amd.com >
2025-04-22 09:31:13 -07:00
f34410715f
[frontend] enhance tool_calls type check ( #16882 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-22 15:40:24 +00:00
68d4c33202
[Misc] Add S3 environment variables for better support of MinIO. ( #16977 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-22 14:27:36 +00:00
f961d7f6ef
[BugFix] Pass in correct VLLM config in FlashInfer backend ( #13207 ) ( #16973 )
...
Signed-off-by: 苏政渊 <suzhengyuan@moonshot.cn >
Co-authored-by: 苏政渊 <suzhengyuan@moonshot.cn >
2025-04-22 06:44:10 -07:00
d059110498
Improve configs - SpeculativeConfig ( #16971 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-22 12:55:36 +00:00
571e8dd65e
[Bugfix] Fix distributed bug again in Qwen2.5-VL & Qwen2.5-Omni ( #16974 )
...
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com >
2025-04-22 12:23:17 +00:00
4b91c927f6
[Misc] refactor example series ( #16972 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-22 11:44:21 +00:00
0e237f0035
[FEAT][ROCm] Integrate Paged Attention Kernel from AITER ( #15001 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-04-22 02:46:28 -07:00
8f7bace7c3
[Doc] Improve documentation for multimodal CLI args ( #16960 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-22 08:35:35 +00:00
e4d6144232
[BugFix] Fix incremental detokenization perf issue ( #16963 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-22 08:16:19 +00:00
8d32dc603d
[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS ( #6036 )
...
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com >
Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com >
2025-04-22 09:01:36 +01:00
c4ab9f3e71
[V1] Remove pre-allocation for KV cache ( #16941 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-22 00:52:18 -07:00
2689d5c027
[Model] Use autoweightloader for mamba ( #16950 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2025-04-22 07:48:15 +00:00
acba33a0f1
[Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams ( #16767 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-04-22 06:02:20 +00:00
a114bf20a3
[Perf] Optimize _update_states for GPU model runner ( #16910 )
...
Signed-off-by: snowcharm <snowcharmqq@gmail.com >
2025-04-22 14:01:54 +08:00
3097ce3a32
[Doc] Update ai_accelerator/hpu-gaudi.inc.md ( #16956 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-04-22 05:33:27 +00:00
d6da9322c8
[Bugfix] Fix f-string for Python 3.9-3.11 ( #16962 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-21 21:45:55 -07:00
71ce44047f
Support S3 Sharded loading with RunAI Model Streamer ( #16317 )
...
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-21 21:21:49 -07:00
188b7f9b8c
[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm ( #15830 )
...
Signed-off-by: charlifu <charlifu@amd.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
2025-04-21 20:46:22 -07:00
b9b4746950
[V1] Remove additional_config check ( #16710 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-04-21 20:45:27 -07:00
7b8a2ab76f
[Kernel] Add expert_map support to Cutlass FP8 MOE ( #16861 )
...
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com >
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com >
2025-04-21 20:44:32 -07:00
c9acbf1141
[Misc] Remove the chunked prefill warning for LoRA ( #16925 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-21 20:44:24 -07:00
5b794cae8d
[ROCm] Add aiter tkw1 kernel for Llama4 fp8 ( #16727 )
...
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-04-21 20:42:34 -07:00
0e4254492f
[Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other ( #16863 )
...
Signed-off-by: Jeffrey Li <jeffrey.dot.li@gmail.com >
2025-04-22 11:40:19 +08:00
1311913f55
[BugFix][Spec Decode] No in-place update to draft probs ( #16952 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-21 19:54:19 -07:00
29f395c97c
[Doc] Remove unnecessary V1 flag ( #16924 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-21 21:04:38 -04:00
fa3bba2a53
[TPU][V1] Enable Top-P ( #16843 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-04-22 00:46:07 +00:00
986537f1c3
[V1] V1 FlashInfer Attention ( #16684 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Aurick Qiao <qiao@aurick.net >
2025-04-22 00:38:41 +00:00
210207525e
[TPU][V1] Capture multimodal encoder during model compilation ( #15051 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Siyuan Liu <lsiyuan@google.com >
2025-04-21 18:36:59 -06:00
71eda0bb76
Update Qwen1.5-MoE-W4A16-compressed-tensors.yaml ( #16946 )
2025-04-21 18:35:32 -06:00
471fe65630
[TPU][V1] Implicitly adjust page size when there's SMEM OOM ( #16871 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-21 15:43:13 -06:00
3a0fba5cf4
[V1][Spec Decode] Handle draft tokens beyond max_model_len ( #16087 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-21 12:38:50 -07:00
299ebb62b2
[Core] Speed up decode by remove synchronizing operation in sampler ( #16436 )
...
Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com >
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com >
2025-04-21 18:18:22 +00:00
f728ab8e35
[Doc] mention how to install in CPU editable mode ( #16923 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-04-21 17:45:51 +00:00
63e26fff78
[doc] install required python3-dev apt package ( #16888 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-04-21 16:15:18 +00:00
fe3462c774
[XPU][Bugfix] minor fix for XPU ( #15591 )
...
Signed-off-by: yan ma <yan.ma@intel.com >
2025-04-22 00:02:57 +08:00
3b34fd5273
Raise error for data-parallel with benchmark_throughput ( #16737 )
...
Signed-off-by: Kartik Ramesh <kartikx2000@gmail.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
2025-04-21 23:51:43 +08:00
55d6d3fdb8
[Bugfix] Fix GLM rotary_dim issue and support v1 ( #16912 )
...
Signed-off-by: isotr0py <2037008807@qq.com >
2025-04-21 14:26:34 +00:00
7272bfae77
[Misc] Refactor platform to get device specific stream and event ( #14411 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-04-21 21:25:49 +08:00
d9ac9e3dc5
[Misc] fix collect_env version parse ( #15267 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-04-21 20:29:40 +08:00
d41faaf9df
Restore buffers when wake up from level 2 sleep ( #16564 ) ( #16889 )
...
Signed-off-by: Han <zh950713@gmail.com >
2025-04-21 20:18:28 +08:00
b34f33438a
[Doc] Split dummy_processor_inputs() in Multimodal Docs ( #16915 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-04-21 11:10:01 +00:00
26c0406555
[Bugfix] Fix distributed bug in Qwen2.5-VL & Qwen2.5-Omni ( #16907 )
2025-04-21 10:25:21 +00:00
4c41278b77
[CI/CD][V1] Add spec decode tests to CI ( #16900 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-20 22:37:16 -07:00
bb3605db85
[Bugfix] Fix v1/spec_decode/test_ngram.py ( #16895 )
...
Signed-off-by: qizixi <qizixi@meta.com >
2025-04-20 20:54:29 -07:00
fe742aef5a
[easy] Pass compile_fx only the config patches ( #16845 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-20 12:25:19 +08:00
4b07d36891
Improve configs - CacheConfig ( #16835 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-20 12:25:04 +08:00
87aaadef73
Serialize tensors using int8 views ( #16866 )
...
Signed-off-by: Staszek Pasko <staszek@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-19 10:28:34 -07:00
682e0b6d2f
Log how much time loading a compiled artifact takes ( #16848 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-19 16:50:46 +00:00
d6195a748b
[doc] update hyperlink ( #16877 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-19 16:40:38 +00:00
205d84aaa9
[VLM] Clean up models ( #16873 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-19 12:13:06 +00:00
5124f5bf51
[Model] Qwen2.5-Omni Cleanup ( #16872 )
2025-04-19 09:37:02 +00:00
83f3c3bd91
[Model] Refactor Phi-4-multimodal to use merged processor and support V1 ( #15477 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-19 02:26:11 -07:00
d9737ca1c6
[V1][Misc] stop update prefix cache stats when logs_stats is disabled ( #16460 )
...
Signed-off-by: vie-serendipity <2733147505@qq.com >
2025-04-19 02:25:19 -07:00
9d4ca19d50
[Misc] Benchmarks for audio models ( #16505 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-19 02:24:14 -07:00
2ef0dc53b8
[Frontend] Add sampling params to v1/audio/transcriptions endpoint ( #16591 )
...
Signed-off-by: Jannis Schönleber <joennlae@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Jannis Schönleber <joennlae@gmail.com >
2025-04-19 07:03:54 +00:00
1d4680fad2
[rocm][MI300] llama4 maverick fp8 moe config tp8 ( #16847 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-04-19 06:21:43 +00:00
2c1bd848a6
[Model][VLM] Add Qwen2.5-Omni model support (thinker only) ( #15130 )
...
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Xiong Wang <wangxiongts@163.com >
2025-04-18 23:14:36 -07:00
5c9121203c
[release] Publish neuron docker image ( #16733 )
...
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com >
2025-04-18 17:11:25 -07:00
490b1698a5
[Doc] Updated Llama section in tool calling docs to have llama 3.2 config info ( #16857 )
...
Signed-off-by: jmho <jaylenho734@gmail.com >
2025-04-18 23:28:53 +00:00
5a5e29de88
[Misc] refactor examples series - Chat Completion Client With Tools ( #16829 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-18 23:24:42 +00:00
3d3ab3689f
[New Model]: Snowflake Arctic Embed (Family) ( #16649 )
2025-04-18 08:11:57 -07:00
686623c5e7
Fix nullable_kvs fallback ( #16837 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-18 05:58:39 -07:00
aadb656562
[Misc] Clean up Kimi-VL ( #16833 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-18 05:15:09 -07:00
87e067de41
[Model] use AutoWeightsLoader for BigCode, GPT-J ( #16823 )
...
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com >
2025-04-18 10:42:41 +00:00
26507f8973
[Docs] Fix a link and grammar issue in production-stack.md ( #16809 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-04-18 06:42:58 +00:00
9c1d5b456d
[Doc] add podman setup instructions for official image ( #16796 )
...
Signed-off-by: Nathan Weinberg <nweinber@redhat.com >
2025-04-18 06:10:49 +00:00
e31045f95c
[Bugfix] fix pp for llama4 ( #16746 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-04-18 13:51:30 +08:00
aaec845f8e
[ROCm] [Attention] Cleanup ROCm output passing ( #16431 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2025-04-18 05:46:45 +00:00
7bdfd29a35
[Misc] add collect_env to cli and docker image ( #16759 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-17 22:13:35 -07:00
e78587a64c
Improve-mm-and-pooler-and-decoding-configs ( #16789 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-17 22:13:32 -07:00
7eb4255628
[BugFix] Accuracy fix for llama4 int4 - improperly casted scales ( #16801 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-17 22:13:29 -07:00
6a0f547561
Add hardware print to TPU V1 test ( #16792 )
2025-04-17 22:13:26 -07:00
30ed81b7ca
[V1][Structured Output] Minor modification to _validate_structured_output() ( #16748 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-04-18 13:12:54 +08:00
7a4a5de729
[Misc] Update outdated note: LMCache now supports chunked prefill ( #16697 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-18 05:12:42 +00:00
c16fb5dae8
[Doc] Improve help examples for --compilation-config ( #16729 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-17 21:22:34 -07:00
e37073efd7
Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema ( #16721 )
...
Signed-off-by: Tarun Kumar <takumar@redhat.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-17 21:08:27 -07:00
183dad7a85
[Attention] Update to latest FA3 code ( #13111 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-17 15:14:07 -07:00
3408e47159
[P/D][V1] KV Connector API V1 ( #15960 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Signed-off-by: remi <remi@mistral.ai >
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
2025-04-17 13:22:40 -07:00
0377b8310b
[MLA] Simplification to batch P/D reordering ( #16673 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-17 16:12:09 -04:00
e4755f7fac
[V1][Metrics] Fix http metrics middleware ( #15894 )
2025-04-17 19:52:18 +00:00
92edf35826
[ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints ( #16674 )
2025-04-17 11:44:34 -07:00
eb5819b2d9
[V1][TPU] Enable Top K ( #15489 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com >
Co-authored-by: Hyesoo Yang <hyeygit@gmail.com >
2025-04-17 18:18:11 +00:00
5989f4684d
[TPU][V1] Fix padding recompilation when max-num-batched-tokens is not even ( #16726 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-17 18:09:57 +00:00
5125d72f02
[Model] use AutoWeightsLoader for olmoe,opt,orion,persimmon,phi3_small ( #16548 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-17 17:48:31 +00:00
a018e555fd
[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 ( #16753 )
...
Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com >
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com >
2025-04-18 00:01:30 +08:00
6211b92273
[Bugfix]Fix index out of range error in api server log ( #16787 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-04-17 09:01:07 -07:00
05fcd1b430
[V1][Perf] Faster incremental detokenization ( #15137 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-17 07:45:24 -07:00
7c02d6a137
[Doc] Changed explanation of generation_tokens_total and prompt_tokens_total counter type metrics to avoid confusion ( #16784 )
...
Signed-off-by: insukim1994 <insu.kim@moreh.io >
2025-04-17 14:10:08 +00:00
11c3b98491
[Doc] Document Matryoshka Representation Learning support ( #16770 )
2025-04-17 13:37:37 +00:00
dbe7f07001
[Doc] Make sure to update vLLM when installing latest code ( #16781 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-17 06:53:31 -06:00
c69bf4ee06
fix: hyperlink ( #16778 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-17 11:34:20 +00:00
d27ea94034
Improve configs - TokenizerPoolConfig + DeviceConfig ( #16603 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-17 11:19:42 +00:00
99ed526101
[Misc] refactor examples series - lmcache ( #16758 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-17 11:02:35 +00:00
207da28186
[Doc] Fix a 404 link in installation/cpu.md ( #16773 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-04-17 10:46:21 +00:00
5b1aca2ae3
[Bugfix] Fix GLM4 model ( #16618 )
...
Signed-off-by: intervitens <intervitens@tutanota.com >
2025-04-17 03:35:07 -07:00
d8e557b5e5
[doc] add open-webui example ( #16747 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-17 18:27:32 +08:00
61a44a0b22
[Doc] Add more tips to avoid OOM ( #16765 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-17 09:54:34 +00:00
a6481525b8
[misc] ignore marlin_moe_wna16 local gen codes ( #16760 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com >
2025-04-17 17:15:14 +08:00
8cac35ba43
[Ray] Improve documentation on batch inference ( #16609 )
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu >
2025-04-16 22:19:26 -07:00
9dbf7a2dc1
[V1] Remove log noise when idle ( #16735 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-16 21:34:08 -07:00
607029e515
[Bugfix] Revert max_prompt_len validation for decoder-only models. ( #16741 )
...
Signed-off-by: David Heineman <david@davidheineman.com >
2025-04-16 21:33:15 -07:00
cb072ce93b
[Bugfix] Update Florence-2 tokenizer to make grounding tasks work ( #16734 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-17 04:17:39 +00:00
95aca283b4
[rocm][V0] fix selection logic for custom PA in V0 ( #16426 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-04-16 19:52:11 -07:00
2b05b8ce69
[V1][Frontend] Improve Shutdown And Logs ( #11737 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com >
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-16 19:48:34 -07:00
3c776dcefb
Adding vllm buildkite job for IBM Power ( #16679 )
...
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com >
2025-04-17 10:47:47 +08:00
2cbd4d2999
[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification ( #16636 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com >
2025-04-16 19:47:26 -07:00
3092375e27
[V1][Performance] Implement custom serialization for MultiModalKwargs [Rebased] ( #16432 )
...
Signed-off-by: Staszek Pasko <staszek@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-16 19:28:32 -07:00
3cd91dc955
Help user create custom model for Transformers backend remote code models ( #16719 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-17 01:05:59 +00:00
8a7368e069
[Misc] Remove redundant comment ( #16703 )
...
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com >
2025-04-17 00:44:52 +00:00
93e561ec4d
Improve error for structured output backend selection ( #16717 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-17 00:35:35 +00:00
e1b004839a
[Hardware] Add processor inputs to platform validation ( #16680 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-04-16 09:28:42 -07:00
ee378f3d49
[Model] support modernbert ( #16648 )
...
Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com >
Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com >
2025-04-16 05:30:15 -07:00
e82ee40de3
[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel ( #16693 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com >
2025-04-16 03:31:39 -07:00
facbe2a114
[Doc] Improve OOM troubleshooting ( #16704 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-16 18:29:48 +08:00
7168920491
[Misc] refactor examples series ( #16708 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-16 10:16:36 +00:00
21378a2323
[CI] Cleanup additional_dependencies: [toml] for pre-commit yapf hook ( #16405 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-04-16 10:05:31 +00:00
976711d9db
[V1][Structured Output] Move xgrammar related utils to backend_xgrammar.py ( #16578 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-04-16 17:01:36 +08:00
44fa4d556c
[ROCM] Bind triton version to 3.2 in requirements-built.txt ( #16664 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-04-16 14:05:28 +08:00
3ac98edcb1
[Feature] add model aware kv ops helper ( #16020 )
...
Signed-off-by: billishyahao <bill.he@amd.com >
2025-04-15 23:00:43 -07:00
966c742ed2
Disable remote caching when calling compile_fx ( #16611 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-15 22:18:28 -07:00
0d7d05f4b6
[Misc] Modify LRUCache touch ( #16689 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-16 04:51:38 +00:00
96bb8aa68b
[Bugfix] fix gpu docker image mis benchmarks dir ( #16628 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-15 21:21:14 -07:00
3badb0213b
[Model] Add PLaMo2 ( #14323 )
...
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com >
Signed-off-by: shemmi <shemmi@preferred.jp >
Co-authored-by: Kento Nozawa <nzw0301@preferred.jp >
Co-authored-by: Hiroaki Mikami <mhiroaki@preferred.jp >
Co-authored-by: Calvin Metzger <metzger@preferred.jp >
2025-04-15 19:31:30 -07:00
fdcb850f14
[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server ( #10546 )
...
Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local >
Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local >
2025-04-15 22:31:38 +00:00
54a66e5fee
[Misc] Update compressed-tensors WNA16 to support zero-points ( #14211 )
2025-04-15 07:33:51 -06:00
280d62b8a2
[Kernel] Remove redundant Exp calculations ( #16123 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com >
2025-04-15 12:58:37 +00:00
1666e66443
Add "/server_info" endpoint in api_server to retrieve the vllm_config. ( #16572 )
...
Signed-off-by: Xihui Cang <xihuicang@gmail.com >
2025-04-15 11:50:38 +00:00
1575c1701a
[CI/Build] Fix LoRA OOM ( #16624 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-15 16:38:19 +08:00
6ae996a873
[Misc] refactor argument parsing in examples ( #16635 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-15 08:05:30 +00:00
b590adfdc1
Fix vLLM x torch.compile config caching ( #16491 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-14 23:11:11 -07:00
b4fe16c75b
Add vllm bench [latency, throughput] CLI commands ( #16508 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-14 23:10:35 -07:00
bc5dd4f669
[Bugfix] Fix broken GritLM model and tests (missing pooling_metadata) ( #16631 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io >
2025-04-14 23:09:58 -07:00
dbb036cf61
[Bugfix] Fix tests/kernels/test_mamba_ssm_ssd.py ( #16623 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-04-15 05:35:38 +00:00
70e7ed841d
[BugFix]: Update minimum pyzmq version ( #16549 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
Co-authored-by: mgoin <michael@neuralmagic.com >
2025-04-14 20:06:03 -07:00
d06ba4ed3f
[Kernel] moe wna16 marlin kernel ( #14447 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
Co-authored-by: Michael Goin <michael@neuralmagic.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-04-14 20:05:22 -07:00
6b40996ae8
[Core][Bugfix] Fix Offline MM Beam Search ( #16390 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-15 10:33:02 +08:00
d2020acac7
config check sleep mode support oot platforms ( #16562 )
2025-04-14 16:31:50 -07:00
1eb3c2ed48
[DOC][TPU] Add core idea about avoiding recompilation after warmup ( #16614 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-14 21:56:06 +00:00
c64ee87267
[Hardware][TPU] Add torchvision to tpu dependency file ( #16616 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-04-14 17:50:46 -04:00
b1308b84a3
[Model][VLM] Add Kimi-VL model support ( #16387 )
...
Signed-off-by: courage17340 <courage17340@163.com >
2025-04-14 21:41:48 +00:00
7b5ecf79bd
s390x: Fix PyArrow build and add CPU test script for Buildkite CI ( #16036 )
...
Signed-off-by: Nishan Acharya <Nishan.Acharya@ibm.com >
2025-04-14 10:55:32 -07:00
9883a18859
Fix triton install condition on CPU ( #16600 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-14 17:06:01 +00:00
b3f2fddd17
[TPU][V1] Fix exponential padding when max-num-batched-tokens is not a power of 2 ( #16596 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-14 17:01:05 +00:00
aa29841ede
[Bugfix] Multi-modal caches not acting like LRU caches ( #16593 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-14 09:24:16 -07:00
6bf27affb6
[fix]: Dockerfile.ppc64le fixes for opencv-python and hf-xet ( #16048 )
...
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com >
2025-04-14 17:08:39 +01:00
1dd23386ec
[Misc] Update usage with mooncake lib for kv transfer ( #16523 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-04-14 11:31:37 +00:00
7cbfc10943
[Misc] refactor examples ( #16563 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-14 09:59:15 +00:00
ce4ddd2d1a
[Misc] remove warning if triton>=3.2.0 ( #16553 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com >
2025-04-14 02:39:47 -07:00
e51929ebca
Improve configs - SchedulerConfig ( #16533 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-14 17:24:16 +08:00
dc1b4a6f13
[Core][V0] Enable regex support with xgrammar ( #13228 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-14 10:13:38 +08:00
63d2705edb
[Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py ( #16556 )
2025-04-13 17:20:26 -07:00
d085a44082
Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) ( #16537 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-13 14:55:18 +00:00
f49e5aff11
[V1][Spec Decode] KV cache slots for eagle heads ( #16370 )
...
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com >
2025-04-12 19:42:51 -07:00
6c11ecf8d3
[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine ( #16529 )
...
Signed-off-by: Ryan McConville <ryan@ryanmcconville.com >
2025-04-12 20:19:19 +00:00
93e5f3c5fb
[Perf] Optimize Preparing Inputs for GPU Model Runner ( #16484 )
...
Signed-off-by: snowcharm <snowcharmqq@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-12 22:54:37 +08:00
70363bccfa
Fix syntaxWarning: invalid escape sequence '\s' ( #16532 )
...
Signed-off-by: Jie Fu <jiefu@tencent.com >
2025-04-12 14:39:42 +00:00
3cdc57669f
[Misc] Delete redundant code ( #16530 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-04-12 11:21:37 +00:00
68bb122eb4
[MISC] Make GroupCoordinator compatible with out-of-tree devices ( #16464 )
...
Signed-off-by: hzji210@gmail.com <hzji210@gmail.com >
2025-04-12 09:20:25 +00:00
d9fc8cd9da
[V1] Enable multi-input by default ( #15799 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-12 08:52:39 +00:00
f069f3ea74
[Misc] Openai transcription client example use same Whisper model ( #16487 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-12 07:27:03 +00:00
c5bc0e7fcc
[Misc] Update chat utils tests ( #16520 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-12 06:48:43 +00:00
4a3a518722
fix: spelling ( #16466 )
...
Signed-off-by: Tianer Zhou <ezhoureal@gmail.com >
2025-04-11 23:24:22 -07:00
fbf722c6e6
[Frontend] support matryoshka representation / support embedding API dimensions ( #16331 )
2025-04-11 23:23:10 -07:00
e92d7085bf
[Feature][V1] Add xgrammar to support minLength, maxLength with test ( #16516 )
...
Signed-off-by: Leon Seidel <leon.seidel@fau.de >
2025-04-11 23:22:07 -07:00
bd6028d6b0
Optimized topk for topk=1 (Llama-4) ( #16512 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-12 14:21:08 +08:00
802329dee9
[Doc] Update Llama4 Model Names in Supported Models ( #16509 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-04-12 02:53:10 +00:00
41cc883c29
[BugFix] Handle non-contiguous tensors properly when serializing ( #16492 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-11 17:54:06 -07:00
57504a4bcf
[CI][Bugfix] Add mistral_tool_use to Ci ( #16517 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-11 17:52:38 -07:00
ed4792c990
[Doc] Fix link to vLLM blog ( #16519 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com >
2025-04-11 17:39:23 -07:00
87b836ba77
Bugfix for PixtralHF models without spatial_merge_size ( #16513 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-11 23:32:22 +00:00
56c76c2e0e
[Bugfix] clean up duplicated code ( #16485 )
...
Signed-off-by: Gogs <gogs@fake.local >
Co-authored-by: Gogs <gogs@fake.local >
2025-04-11 23:19:40 +00:00
c09632a66c
Update openai_compatible_server.md ( #16507 )
...
Signed-off-by: Christian Sears <csears@redhat.com >
2025-04-11 22:54:58 +00:00
a3bf8d4a2b
[Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 ( #16488 )
2025-04-12 06:26:55 +08:00
16eda8c43a
[Frontend] Added chat templates for LLaMa4 pythonic tool calling ( #16463 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Kai Wu <kaiwu@meta.com >
2025-04-12 06:26:17 +08:00
cd77382ac1
Improve configs - LoadConfig ( #16422 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-11 20:27:27 +00:00
71b9cde010
[Bugfix] handle alignment of encoder_seq_lens in mllama.py ( #14784 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
2025-04-11 19:59:50 +00:00
5285589f37
[Doc] Document InternVL3 support ( #16495 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-11 19:41:09 +00:00
f41647ee6b
[Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel ( #16366 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-11 17:54:08 +00:00
4d022cbc75
[TPU][V1] Make --disable_chunked_mm_input mandatory for serving MM models ( #16483 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-11 17:06:14 +00:00
70de35a881
Fix erroneous "model doesn't support compile" warning ( #16486 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-11 16:24:36 +00:00
34b2cf3b33
[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU ( #12779 )
...
Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com >
2025-04-11 07:38:36 -07:00
9e90c9f73f
[Bugfix] Fix bugs of running Quark quantized models ( #16236 )
...
Signed-off-by: chaow <chaow@amd.com >
2025-04-11 10:18:32 -04:00
e9528f6dc6
[Kernel] support merge_attn_states CUDA kernel, 3x speedup ( #16173 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com >
2025-04-11 06:50:50 -06:00
51baa9c333
Don't install triton on ppc64le platform ( #16470 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-11 10:11:00 +00:00
35e076b3a8
[Misc] update api_client example ( #16459 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-11 10:05:40 +00:00
a26f59ccbc
[Misc] Raise error for V1 not supporting Long LoRA. ( #16415 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-11 01:51:20 -07:00
aa3b3d76e0
Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True ( #16447 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-11 08:09:52 +00:00
f7030df3be
[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner ( #15990 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-11 15:32:37 +08:00
905e91e9ac
Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" ( #16453 )
2025-04-11 06:44:22 +00:00
f8f9c0ba62
[Bugfix] Don't set an upper bound on repetition penalty ( #16403 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-11 14:19:40 +08:00
dda811021a
[CPU][Bugfix] Fix CPU docker issues ( #16454 )
...
Signed-off-by: jiang.li <jiang1.li@intel.com >
2025-04-11 14:19:07 +08:00
93195146ea
[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test ( #16424 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-11 04:57:16 +00:00
ed37599544
Update supported_hardware.md for TPU INT8 ( #16437 )
2025-04-11 12:28:07 +08:00
99ef59cf7f
[Llama4] Enable attention temperature tuning by default for long context (>32k) ( #16439 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-04-10 21:26:07 -07:00
d544d141ec
update benchmark_serving_structured_output to include auto backend ( #16438 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-04-11 12:25:52 +08:00
3e397a9484
check input length of sonnet samples ( #16423 )
...
Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com >
2025-04-11 10:15:06 +08:00
268c325078
Fix range_ratio Bug in RandomDataset ( #16126 )
...
Signed-off-by: jadewang21 <jadewangcn@outlook.com >
2025-04-10 15:31:17 -07:00
3cc9af88ff
[TPU][V1] Disable per-request seed/Generator ( #16172 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-10 17:05:44 -04:00
7cd0bd7212
[Bugfix] Fix output token length check logic ( #16419 )
...
Signed-off-by: look <eeslook@163.com >
2025-04-10 20:16:48 +00:00
56d4aefa33
[VLM] Avoid unnecessary dummy multimodal data during processing ( #16416 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-10 19:32:14 +00:00
dd143ef541
[V1] Zero-copy tensor/ndarray serialization/transmission ( #13790 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-10 19:23:14 +00:00
daefed052c
[Model] Reduce redundant computations in mamba2 blocks for Bamba-9B ( #15423 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com >
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com >
2025-04-10 19:07:07 +00:00
5fbab20e02
[Bugfix] Fix bug when dataset is json ( #15899 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-04-10 18:35:41 +00:00
e8224f3dca
[V1][Spec Decode] Eagle Model loading ( #16035 )
...
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com >
2025-04-10 11:21:48 -07:00
9665313c39
[V1] Set structured output backend to auto by default ( #15724 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-10 17:53:26 +00:00
0c54fc7273
Improve configs - ParallelConfig ( #16332 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-10 17:34:37 +00:00
c1b57855ec
[TPU][V1] Use language_model interface for getting text backbone in MM ( #16410 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-10 17:32:04 +00:00
83b824c8b4
[VLM] Remove BaseProcessingInfo.get_mm_max_tokens_per_item ( #16408 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-10 09:06:58 -07:00
7678fcd5b6
Fix the torch version parsing logic ( #15857 )
2025-04-10 07:37:47 -07:00
8661c0241d
[CI] Add auto update workflow for Dockerfile graph ( #11879 )
...
Signed-off-by: wineandchord <guoqizhou19@gmail.com >
2025-04-10 13:43:05 +00:00
ce8d6b75fc
[doc] update the wrong link ( #16401 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-10 21:02:37 +08:00
61de3ef74b
[Model] Remove image mm limit for LLaMa4 ( #16365 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-04-10 09:36:27 +00:00
ec1f9c8c91
Update Numba to 0.61.2 ( #16376 )
...
Signed-off-by: cyy <cyyever@outlook.com >
2025-04-10 07:59:37 +00:00
65e09094c4
[doc] add download model tips ( #16389 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-10 07:45:26 +00:00
c70cf0fe06
[Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models ( #16038 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-10 15:08:47 +08:00
a5d11a54dc
[Bugfix] Fix validation error for text-only Mllama 3.2 ( #16377 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-10 14:19:42 +08:00
3d4c87758e
[Misc] Update transformers version limits of multi-modal tests ( #16381 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-09 23:03:33 -07:00
a9bd832fc5
[Model] use AutoWeightsLoader for deepseek_v2, internlm2 ( #16383 )
...
Signed-off-by: Aaron Ang <aaron.angyd@gmail.com >
2025-04-09 23:01:00 -07:00
417bcefbae
fix sonnet dataset sample when prefix len is very small ( #16379 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-04-10 05:35:07 +00:00
baada0e737
[Bugfix][TPU] Fix TPU validate_request ( #16369 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-04-10 12:55:12 +08:00
82eb61dd4c
[misc] use tqdm.auto where appropriate ( #16290 )
...
Signed-off-by: Benjamin Kitor <bkitor@gigaio.com >
2025-04-09 21:54:54 -07:00
0d4d06fe2f
[CI][Bugfix] Pin triton version for CPU ( #16384 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-04-10 04:35:00 +00:00
4aed0ca6a2
[bugfix] Avoid the time consumption caused by creating dummy videos. ( #16371 )
2025-04-10 04:30:05 +00:00
1621b25288
[TPU] Fix dummy loading OOM ( #16372 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-10 04:06:16 +00:00
a564797151
[Model] use AutoWeightsLoader for granite, granitemoe, granitemoeshared, grok1, mixtral ( #16325 )
...
Signed-off-by: Aaron Ang <aaron.angyd@gmail.com >
2025-04-09 20:07:40 -07:00
1da6a09274
[Bugfix]: do not shutdown server if skip_special_use=False for MistralTokenizer ( #14094 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-04-09 19:43:09 -07:00
1e44ffc3ff
Add GLM-4-0414 support ( #16338 )
...
Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com >
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
Co-authored-by: Accelerator1996 <lvfei.lv@alibaba-inc.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Michael Goin <michael@neuralmagic.com >
Co-authored-by: yihong <zouzou0208@gmail.com >
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Co-authored-by: ajayvohra2005 <ajayvohr@amazon.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-04-10 09:19:42 +08:00
a454748544
[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues ( #16275 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-09 18:51:51 -06:00
1bff42c4b7
[Misc] refactor Structured Outputs example ( #16322 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-09 23:32:42 +00:00
cb391d85dc
[Hardware] add platform-specific request validation api ( #16291 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-04-09 12:50:01 -07:00
fee5b8d37f
[Build/CI] Add tracing deps to vllm container image ( #15224 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-09 19:14:06 +00:00
b2ce859bd2
Fix benchmark_throughput.py --backend=hf ( #16352 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-09 19:09:28 +00:00
566f10a929
[CI]Fix hpu docker and numpy version for CI ( #16355 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-04-09 17:52:26 +00:00
c3b5189137
[Bugfix] catch AssertionError in MistralTokenizer as ValueError ( #16344 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-04-09 17:33:24 +00:00
a25866ac8d
[Bugfix] Fix profiling.py ( #16202 )
...
Signed-off-by: zh Wang <rekind133@outlook.com >
2025-04-09 17:03:34 +00:00
098900d7c2
Revert "Update label-tpu mergify and remove removal bot" ( #16350 )
2025-04-09 07:59:36 -07:00
98d01d3ce2
[Bugfix][Frontend] respect provided default guided decoding backend ( #15476 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-04-09 05:11:10 -07:00
d55244df31
[Model] Add SupportsMultiModal.get_language_model interface ( #16007 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-09 04:12:54 -07:00
04149cce27
[BugFix] fix some typos found by typos. ( #16314 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-09 03:43:59 -07:00
24834f4894
update neuron config ( #16289 )
...
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com >
2025-04-09 03:43:22 -07:00
ec7da6fcf3
[BugFix] llama4 qknorm should be not shared across head ( #16311 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-04-09 00:59:14 -07:00
819d548e8a
[BugFix] logger is not callable ( #16312 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-09 00:59:02 -07:00
477d2a8aa2
Update label-tpu mergify and remove removal bot ( #16298 )
2025-04-09 07:56:25 +00:00
e484e02857
[Bugfix] Avoid transferring cached multi-modal items from P0 to P1 ( #16273 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-09 00:51:27 -07:00
24f6b9a713
[Misc] Fix test_sharded_state_loader.py( #16004 ) ( #16005 )
...
Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com >
2025-04-09 14:47:30 +08:00
9cdde47289
[BugFix] Fix fusion test and add them to CI ( #16287 )
...
Signed-off-by: luka <luka@neuralmagic.com >
2025-04-08 23:46:45 -07:00
b1eb4ca152
[TPU] Update PyTorch/XLA ( #16288 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-09 14:46:32 +08:00
87b4ac56c2
[CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding ( #16221 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-09 04:14:46 +00:00
cb84e45ac7
[Core] Upgrade to xgrammar 0.1.18, add cache size limit ( #16283 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-08 19:13:22 -07:00
4716377fbc
[Feature] Estimate max-model-len use available KV cache memory ( #16168 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-08 19:12:51 -07:00
4e9cf8c1dd
[Bugfix] fix gettid method is not define ( #16084 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-08 19:12:44 -07:00
2976dc27e9
[Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs ( #16198 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com >
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com >
2025-04-08 19:12:34 -07:00
102bf967f0
[Model] Add smolvlm support ( #16017 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-08 19:12:17 -07:00
1f4b09b525
Add support to modelopt quantization of Mixtral model ( #15961 )
...
Signed-off-by: Yue <yueshen@nvidia.com >
2025-04-09 01:53:31 +00:00
86c3369eb8
[CI/Build] Fix CI LoRA failure ( #16270 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-09 09:13:56 +08:00
2755c34a8f
[V1] Update structured output offline inference example ( #15721 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-08 22:34:09 +00:00
db10422184
[Bugfix] fix deepseek fp16 scale bug ( #14809 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-04-08 16:56:09 -04:00
e1a2c699dd
[BugFix] Fix Llama4 - Index Error When Single Request Near Max Context ( #16209 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-08 18:56:51 +00:00
0115ccd5c0
Add warning that content below line in template will be removed ( #16276 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-08 18:18:40 +00:00
40b4284fe3
[Bugfix] Handle process_weights_after_loading for QKVCrossParallelLinear ( #15328 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-08 10:02:23 -07:00
4ebc0b9640
[Bugfix] Proper input validation for multi-modal encoder-decoder models ( #16156 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-08 09:45:21 -07:00
dc96fd54c6
[Misc] Avoid stripping meaningful whitespace from nvidia-smi topo -m output in collect_env.py ( #16272 )
...
Signed-off-by: imkero <kerorek@outlook.com >
2025-04-08 16:08:09 +00:00
1f5d13ab9f
[New Model]: jinaai/jina-embeddings-v3 ( #16120 )
2025-04-08 08:39:12 -07:00
90cb44eb02
Update to transformers==4.51.1 ( #16257 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-08 06:53:39 -07:00
e11880deea
[Bugfix] Remove triton do_bench fast_flush arg ( #16256 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2025-04-08 13:51:06 +00:00
9351f91be9
[BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm ( #16247 )
...
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
2025-04-08 05:10:26 -07:00
5a1e1c8353
[Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe ( #16203 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-08 04:05:47 -07:00
69ecaa7c79
[Misc] Add warning for multimodal data in LLM.beam_search ( #16241 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-04-08 04:05:27 -07:00
7f00899ff7
[Misc] format and refactor some examples ( #16252 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-08 10:42:32 +00:00
995e3d1f41
[Docs] Add Slides from Singapore Meetup ( #16213 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-04-08 07:20:22 +00:00
b4ac449a83
[Misc] Merge the logs of pp layers partitions ( #16225 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2025-04-08 00:18:15 -07:00
8e5314a468
[V1] Add disable_chunked_mm_input arg to disable partial mm input prefill ( #15837 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-07 23:24:07 -07:00
87918e40c4
[torch.compile][TPU] Make @support_torch_compile work for XLA backend ( #15782 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-04-08 14:23:53 +08:00
f6b32efb7f
[Bugfix] Fix and reorganize broken GGUF tests and bump gguf version ( #16194 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-08 13:38:13 +08:00
b99733d092
[Bugfix] Do not skip "empty" parts of chats that are parsable ( #16219 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-08 05:14:15 +00:00
05a015d6a5
Add warning for Attention backends that do not support irope yet ( #16212 )
2025-04-08 03:59:26 +00:00
ad971af8c7
[Bugfix] fix use-ep bug to enable ep by dp/tp size > 1 ( #16161 )
2025-04-07 20:48:47 -07:00
f2ebb6f541
[V1] Scatter and gather placeholders in the model runner ( #16076 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Jennifer Zhao <ai.jenniferzhao@gmail.com >
2025-04-08 10:43:41 +08:00
1d01211264
Update BASE_IMAGE to 2.22 release of Neuron ( #16218 )
2025-04-07 19:11:18 -07:00
f94ab12f79
[Misc] Update compressed-tensors to version 0.9.3 ( #16196 )
...
Signed-off-by: Miles Williams <42222518+mlsw@users.noreply.github.com >
2025-04-07 19:09:06 -07:00
a865bc1ca6
[core] do not send error across process ( #16174 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-04-07 19:09:03 -07:00
21802c4b6d
[ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping ( #16031 )
...
Signed-off-by: mgoin <michael@neuralmagic.com >
2025-04-07 21:28:14 -04:00
652907b354
Torchao ( #14231 )
...
Signed-off-by: drisspg <drisspguessous@gmail.com >
2025-04-07 19:39:28 -04:00
24f1c01e0f
[Bugfix][V0] XGrammar structured output supports Enum ( #15878 )
...
Signed-off-by: Leon Seidel <leon.seidel@fau.de >
2025-04-07 22:38:25 +00:00
fad6e2538e
[Misc] add description attribute in CLI ( #15921 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-07 22:30:35 +00:00
7f6d47c1a2
[V1][BugFix] Exit properly if engine core fails during startup ( #16137 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-07 15:30:15 -07:00
3147586ebd
[Bugfix] Fix guidance backend for Qwen models ( #16210 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
2025-04-07 22:15:43 +00:00
ed636d99ca
[Misc] Move Llama 4 projector call into encoder execution ( #16201 )
2025-04-07 14:02:05 -07:00
090c856d76
[Misc] Human-readable max-model-len cli arg ( #16181 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-04-07 14:40:58 -04:00
ad434d4cfe
Print the warning only once ( #16193 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-07 18:30:06 +00:00
66d433b94f
[V1] Revert the default max_num_seqs to V0 values for most hardware ( #16158 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-07 13:54:36 -04:00
027b204ff1
[Bugfix] Re-enable support for ChatGLMForConditionalGeneration ( #16187 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-07 23:15:58 +08:00
55dcce91df
Upstream Llama4 Support to Main ( #16113 )
...
Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com >
Signed-off-by: Chris Thi <chris.c.thi@gmail.com >
Signed-off-by: drisspg <drisspguessous@gmail.com >
Signed-off-by: Jon Swenson <jmswen@gmail.com >
Signed-off-by: Keyun Tong <tongkeyun@gmail.com >
Signed-off-by: Lu Fang <fanglu@meta.com >
Signed-off-by: Xiaodong Wang <xdwang@meta.com >
Signed-off-by: Yang Chen <yangche@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
Signed-off-by: Lu Fang <lufang@fb.com >
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Lucia Fang <fanglu@fb.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-07 08:06:27 -07:00
8017c8db7f
[Doc]Update image to latest version ( #16186 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-04-07 14:17:39 +00:00
dc3529dbf6
[Misc] improve example mlpspeculator and llm_engine_example ( #16175 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-07 11:53:52 +00:00
7699258ef0
[Model] Add Qwen3 and Qwen3MoE ( #15289 )
...
Signed-off-by: YamPengLi <yampayne.lyp@alibaba-inc.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-07 04:06:41 -07:00
e9ba99f296
[V1][Structured Output] Add supports_structured_output() method to Platform ( #16148 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-04-07 11:06:24 +00:00
7c80368710
[VLM] Florence-2 supports online serving ( #16164 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-07 04:04:02 -07:00
95d63f38c0
doc: fix some typos in doc ( #16154 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-07 05:32:06 +00:00
bb8dab821e
[CI] Set max transformers version for Ultravox model test ( #16149 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-04-07 04:37:58 +00:00
fc0f87768a
[Bugfix] Make dummy encoder prompt padding alternative and add missing warnings ( #16129 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-07 04:07:15 +00:00
0a57386721
[Misc] Update Mistral-3.1 example ( #16147 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-07 03:57:37 +00:00
3749e28774
[V1][Minor] Minor simplification for get_computed_blocks ( #16139 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-06 20:38:12 -07:00
86fc2321ff
[Metrics] Add bucket for request_latency, time_to_first_token and time_per_output_token ( #15202 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-04-06 20:34:51 -07:00
2549c0dfef
Fix requires-python ( #16132 )
2025-04-06 19:22:25 -07:00
b10e519895
[V1][Minor] Optimize get_cached_block ( #16135 )
2025-04-06 20:48:14 +00:00
9bde5ba127
[TPU] Update PyTorch/XLA ( #16130 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-06 18:25:55 +00:00
72c8f1ad04
[Misc] update requires-python in pyproject.toml ( #16116 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-06 14:56:34 +00:00
da224daaa9
[Bugfix] add hf_token to EngineArgs ( #16093 )
...
Signed-off-by: paolovic <paul-philipp.luley@uzh.ch >
Co-authored-by: paolovic <paul-philipp.luley@uzh.ch >
2025-04-06 14:47:33 +00:00
3a100b9278
[Bugfix] LoRA : Fix the order in which the kernels process LoRAs ( #16040 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-04-06 14:04:50 +00:00
242a637aea
[Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 ( #16103 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-06 05:52:01 -07:00
c2a9671510
[Misc] Improve model redirect to accept json dictionary ( #16119 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-06 05:51:45 -07:00
d5ae4f7f42
[Doc][Bugfix] Add missing EOF in k8s deploy doc ( #16025 )
2025-04-06 12:10:57 +00:00
b6c502a150
[Misc] refactor example eagle ( #16100 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-06 09:42:48 +00:00
9ca710e525
[CI][V1] Fix passing tokenizer as kwarg to validate_guidance_grammar ( #16117 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-04-06 16:18:00 +08:00
eb07c8cb5b
[Frontend] Fix typo in tool chat templates for llama3.2 and toolace ( #14501 )
...
Signed-off-by: Ben Jackson <ben@ben.com >
2025-04-06 07:44:36 +00:00
ba10801961
[Benchmark] Add sampling parameters to benchmark_serving. ( #16022 )
...
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com >
2025-04-06 12:30:35 +08:00
620fc2d09e
[Model] fix model testing for TeleChat2ForCausalLM and V0 llama4 ( #16112 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-04-05 21:23:40 -07:00
29283eaa7e
[Model] use AutoWeightsLoader for phi, gemma, deepseek ( #16088 )
...
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com >
2025-04-05 20:34:38 -07:00
2fa66ef713
[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine ( #15946 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
2025-04-05 20:04:22 -07:00
13affc432d
[Misc] Remove redundant code ( #16098 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-05 20:03:50 -07:00
d8f094a92a
[Misc] format output for encoder_decoder.py ( #16095 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-05 19:57:18 -07:00
97ae6d777f
Fix some capitalisations in generated examples doc titles ( #16094 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-05 13:44:03 +00:00
6baeee70d1
Revert "doc: add info for macos clang errors ( #16049 )" ( #16091 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-05 11:51:51 +00:00
d2517a4939
[doc] fix 404 ( #16082 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-05 11:39:18 +00:00
6342adc438
fix: support clang17 for macos and fix the real libomp ( #16086 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-05 11:00:12 +00:00
0adba91547
[CI] Fix benchmark script level ( #16089 )
2025-04-05 03:36:01 -07:00
4285e423a6
[Misc] Auto detect bitsandbytes pre-quantized models ( #16027 )
...
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com >
2025-04-04 23:30:45 -07:00
63375f0cdb
[V1][Spec Decode] Update N-gram Proposer Interface ( #15750 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-04 16:32:54 -07:00
70ad3f9e98
[Bugfix][TPU] Fix V1 TPU worker for sliding window ( #16059 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-04-04 23:31:19 +00:00
d6fc629f4d
[Kernel][Minor] Re-fuse triton moe weight application ( #16071 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-04-04 23:27:34 +00:00
af51d80fa1
Revert "[V1] Scatter and gather placeholders in the model runner" ( #16075 )
2025-04-04 14:50:57 -07:00
f5722a5052
[V1] Scatter and gather placeholders in the model runner ( #15712 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-04-04 21:26:44 +00:00
651cf0fec1
[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue ( #15906 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-04 12:56:43 -07:00
4dc52e1c53
[CI] Reorganize .buildkite directory ( #16001 )
...
Signed-off-by: kevin <kevin@anyscale.com >
2025-04-04 12:16:20 -07:00
4708f13a9c
[Bugfix] Fix default behavior/fallback for pp in v1 ( #16057 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-04 17:58:08 +00:00
a6d042df0a
[ROCm][Bugfix] Bring back fallback to eager mode removed in #14917, but for ROCm only ( #15413 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-04 09:40:37 -07:00
40a36ccfeb
[ROCm][Bugfix] Use platform specific FP8 dtype ( #15717 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-04 09:40:20 -07:00
ef608c37a7
[Distributed] [ROCM] Fix custom allreduce enable checks ( #16010 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com >
Co-authored-by: ilmarkov <imarkov@redhat.com >
2025-04-04 09:39:08 -07:00
2386803f2a
[CPU] Change default block_size for CPU backend ( #16002 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-04-04 09:39:05 -07:00
95862f7b4d
[Benchmark][Doc] Update throughput benchmark and README ( #15998 )
...
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-04-04 09:39:02 -07:00
230b131b54
[Bugfix][kernels] Fix half2float conversion in gguf kernels ( #15995 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-04 09:38:58 -07:00
0812d8dd41
[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe ( #15945 )
...
Signed-off-by: zhenwei <zhenweiliu@habana.ai >
2025-04-04 09:38:55 -07:00
bf7e3c51ae
[Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt ( #15939 )
...
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com >
2025-04-04 09:38:52 -07:00
a35a8a8392
[V1][Spec Decode] Avoid logging useless nan metrics ( #16023 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-04-04 08:52:41 -07:00
4ef0bb1fcf
doc: add info for macos clang errors ( #16049 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-04 14:58:16 +00:00
fadc59c0e6
[TPU][V1] Remove ragged attention kernel parameter hard coding ( #16041 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-04 07:48:50 -04:00
86cbd2eee9
[Misc] improve gguf check ( #15974 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-04 01:33:36 +00:00
092475f738
[ROCm] Tweak the benchmark script to run on ROCm ( #14252 )
2025-04-03 17:12:48 -07:00
dcc56d62da
[Bugfix] Fix function names in test_block_fp8.py ( #16033 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-04-03 23:01:34 +00:00
f15e70d906
[TPU] Switch Test to Non-Sliding Window ( #15981 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-04-03 14:28:45 -07:00
b6be6f8d1e
[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. ( #15732 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2025-04-03 14:23:28 -07:00
03a70eacaf
Re-enable the AMD Testing for the passing tests. ( #15586 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-04-03 11:05:17 -07:00
45b1ff7a25
[Misc][Performance] Advance tpu.txt to the most recent nightly torch … ( #16024 )
2025-04-03 17:32:54 +00:00
15ba07ef25
[Minor] Fused experts refactor ( #15914 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-04-03 10:19:38 -07:00
d2b58ca203
[Neuron][kernel] Fuse kv cache into a single tensor ( #15911 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
2025-04-03 09:51:32 -07:00
82e7e19a6e
[SupportsQuant] Chameleon, Chatglm, Commandr ( #15952 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2025-04-03 08:25:22 -07:00
421c462948
[SupportsQuant] Bert, Blip, Blip2, Bloom ( #15573 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2025-04-03 08:23:19 -07:00
84884cd9ac
fix: tiny fix make format.sh executable ( #16015 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-03 15:18:05 +00:00
a43aa183dc
[doc] update contribution link ( #15922 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-03 10:47:31 +00:00
463bbb1835
[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process ( #15367 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-04-03 07:32:10 +00:00
5e125e74d1
[misc] improve error message for "Failed to infer device type" ( #15994 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-04-03 14:45:03 +08:00
06f21ce7a5
[Benchmark] Add AIMO Dataset to Benchmark ( #15955 )
...
Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com >
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com >
2025-04-03 06:09:18 +00:00
57a810db9c
[ROCM][V0] PA kernel selection when no sliding window provided ( #15982 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-04-03 05:28:44 +00:00
8b664706aa
[bugfix] add seed in torchrun_example.py ( #15980 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-04-03 12:25:01 +08:00
37bfee92bf
fix: better error message for get_config close #13889 ( #15943 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-03 03:53:19 +00:00
e73ff24e31
[ROCM][KERNEL] Paged attention for V1 ( #15720 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com >
2025-04-02 19:48:00 -07:00
bd7599d34a
[V1][TPU] Do not compile sampling more than needed ( #15883 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-03 01:36:01 +00:00
01b6113659
[TPU] optimize the all-reduce performance ( #15903 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-03 00:25:14 +00:00
1b84eff03a
[V1][TPU] TPU-optimized top-p implementation (avoids scattering). ( #15736 )
...
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com >
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>
2025-04-02 17:18:08 -07:00
55acf86bf8
Fix huggingface-cli[hf-xet] -> huggingface-cli[hf_xet] ( #15969 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-02 23:37:30 +00:00
f021b97993
[V1] Support Mistral3 in V1 ( #15950 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-02 15:36:24 -07:00
1cab43c2d2
[misc] instruct pytorch to use nvml-based cuda check ( #15951 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-04-03 01:02:58 +08:00
8bd651b318
Restricted cmake to be less than version 4 as 4.x breaks the build of… ( #15859 )
...
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com >
2025-04-02 16:19:39 +00:00
58e234a754
[Misc] V1 LoRA support CPU offload ( #15843 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-02 23:04:43 +08:00
e86c414d6a
[Model] use AutoWeightsLoader in model load_weights ( #15770 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-02 07:47:31 -07:00
550b2801ad
[CPU][Bugfix] Using custom allreduce for CPU backend ( #15934 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-04-02 07:46:47 -07:00
cefb9e5a28
[Frontend] Implement Tool Calling with tool_choice='required' ( #13483 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at >
Co-authored-by: Liangfu Chen <liangfc@amazon.com >
Co-authored-by: mgoin <michael@neuralmagic.com >
2025-04-02 07:45:45 -07:00
98d7367b61
[Metrics] Hide deprecated metrics ( #15458 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-04-02 07:37:19 -07:00
594a8b9030
[Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. ( #15938 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-02 06:33:52 -07:00
44f990515b
[CI] Remove duplicate entrypoints-test ( #15940 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-04-02 02:44:01 -07:00
252937806c
[Bugfix][Benchmarks] Ensure async_request_deepspeed_mii uses the OpenAI choices key ( #15926 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-04-02 02:19:35 -07:00
51826d51fa
Add minimum version for huggingface_hub to enable Xet downloads ( #15873 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-02 02:03:36 -07:00
14e53ed11f
[V1] Fix json_object support with xgrammar ( #15488 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-02 02:00:08 -07:00
ddb94c2605
[core] Add tags parameter to wake_up() ( #15500 )
...
Signed-off-by: Eric <erictang000@gmail.com >
2025-04-02 01:59:27 -07:00
90969fb39a
[Kernel] Add more dtype support for GGUF dequantization ( #15879 )
...
Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com >
2025-04-02 01:58:48 -07:00
101f1481f9
[Build/CI] Update lm-eval to 0.4.8 ( #15912 )
...
Signed-off-by: Chris Thi <chris.c.thi@gmail.com >
2025-04-02 01:47:57 -07:00
2edc87b161
[Bugfix] Fix cache block size calculation for CPU MLA ( #15848 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg >
2025-04-02 01:45:02 -07:00
4203926f10
[CI/Build] Further clean up LoRA tests ( #15920 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-02 01:39:09 -07:00
cdb57015a7
[Misc] Replace print with logger ( #15923 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-02 01:37:38 -07:00
aa557e6422
[Benchmark]Fix error message ( #15866 )
...
Signed-off-by: wangli <wangli858794774@gmail.com >
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com >
2025-04-02 01:32:24 -07:00
0e00d40e4f
[V1][Bugfix] Fix typo in MoE TPU checking ( #15927 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-04-01 23:46:42 -07:00
c920e01242
[Doc] Update rocm.inc.md ( #15917 )
...
Signed-off-by: chun37 <chun.jb.37@gmail.com >
2025-04-01 23:38:26 -07:00
274d8e8818
[V1][Minor] Enhance SpecDecoding Metrics Log in V1 ( #15902 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-01 23:38:02 -07:00
2039c6305b
[Bugfix] Fix imports for MoE on CPU ( #15841 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg >
2025-04-02 03:33:55 +00:00
6efb195a6e
[V1] Fix: make sure k_index is int64 for apply_top_k_only ( #15907 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-04-01 19:06:44 -07:00
24b7fb455a
[Spec Decode] Fix input triton kernel for eagle ( #15909 )
2025-04-01 18:15:14 -07:00
58f5a59769
[Docs] Add Intel as Sponsor ( #15913 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-04-01 17:16:55 -07:00
db9dfcfa6a
[Docs] Add Ollama meetup slides ( #15905 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-04-01 13:58:59 -07:00
9ef98d527e
[Model][MiniMaxText01] Support MiniMaxText01 model inference ( #13454 )
...
Signed-off-by: qscqesze <475517977@qq.com >
Co-authored-by: qingjun <qingjun@minimaxi.com >
Co-authored-by: qscqesze <475517977@qq.com >
2025-04-01 16:23:55 -04:00
93491aefc7
[BugFix] make sure socket close ( #15875 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-01 13:10:24 -07:00
7acd539cd7
[Docs] update usage stats language ( #15898 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-04-01 12:54:13 -07:00
e75a6301bd
[V1][Spec Decode] Implement Eagle Proposer [1/N] ( #15729 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-01 12:33:16 -07:00
a79cc68b3a
[V1][Metrics] Initial speculative decoding metrics ( #15151 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-04-01 10:45:04 -07:00
7e3f7a4ee7
[CI] Disable flaky structure decoding test temporarily. ( #15892 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-04-01 17:42:34 +00:00
9ec8257914
[Model] Add module name prefixes to gemma3 ( #15889 )
...
Signed-off-by: Bartholomew Sabat <bartek@recursal.ai >
Co-authored-by: Bartholomew Sabat <bartek@recursal.ai >
2025-04-01 10:13:40 -07:00
38327cf454
[Model] Aya Vision ( #15441 )
...
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-04-01 16:30:43 +00:00
dfa82e2a3d
[CI/Build] Clean up LoRA tests ( #15867 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-01 16:28:50 +00:00
e59ca942f5
Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. ( #13932 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-04-01 12:07:43 -04:00
a57a3044aa
[ROCm][Build][Bugfix] Bring the base dockerfile in sync with the ROCm fork ( #15820 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-01 08:56:39 -07:00
4e5a0f6ae2
[Misc] Allow using OpenCV as video IO fallback ( #15055 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-01 15:55:13 +00:00
b63bd14999
Reinstate format.sh and make pre-commit installation simpler ( #15890 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-01 15:41:30 +00:00
2041c0e360
[Doc] Quark quantization documentation ( #15861 )
...
Signed-off-by: chaow <chaow@amd.com >
2025-04-01 08:32:45 -07:00
085cbc4f9f
[New Model]: jinaai/jina-reranker-v2-base-multilingual ( #15876 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-01 08:32:26 -07:00
2b93162fb0
Remove format.sh as it's been unsupported >70 days ( #15884 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-01 22:27:46 +08:00
2e45bd29fe
[Misc] remove unused script ( #15746 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-01 13:58:05 +00:00
51d7c6a2b2
[Model] Support Mistral3 in the HF Transformers format ( #15505 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-01 06:10:05 -07:00
f3aca1ee30
setup correct nvcc version with CUDA_HOME ( #15725 )
...
Signed-off-by: Yang Chen <yangche@fb.com >
2025-04-01 06:09:40 -07:00
8dd41d6bcc
[Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE ( #15831 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-01 06:07:53 -07:00
0a298ea418
[Bugfix] Fix no video/image profiling edge case for MultiModalDataParser ( #15828 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-01 18:17:11 +08:00
d330558bab
[Docs] Fix small error in link text ( #15868 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-01 10:05:14 +00:00
656fd72976
[Misc] Fix speculative config repr string ( #15860 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-04-01 02:26:22 -07:00
79455cf421
[Misc] Enable V1 LoRA by default ( #15320 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-04-01 16:53:56 +08:00
30d6a015e0
[Feature] specify model in config.yaml ( #15798 )
...
Signed-off-by: weizeng <weizeng@roblox.com >
2025-04-01 01:20:06 -07:00
8af5a5c4e5
fix: can not use uv run collect_env close #13888 ( #15792 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-01 07:45:49 +00:00
3a5f0afcd2
[V1] Implement sliding window attention in kv_cache_manager ( #14097 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-04-01 00:33:17 -07:00
c7e63aa4d8
[ROCm] Use device name in the warning ( #15838 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-01 00:10:48 -07:00
4a9ce1784c
[sleep mode] clear pytorch cache after sleep ( #15248 )
...
Signed-off-by: <villard@us.ibm.com >
2025-03-31 22:58:58 -07:00
7e4e709b43
[V1] TPU - Fix fused MOE ( #15834 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-31 22:58:07 -07:00
63d8eabed0
[Bugfix]: Fix is_embedding_layer condition in VocabParallelEmbedding ( #15824 )
...
Signed-off-by: alexwl <alexey.a.kiryushin@gmail.com >
2025-03-31 22:57:59 -07:00
e830b01383
[Bugfix] Fix extra comma ( #15851 )
...
Signed-off-by: haochengxia <xhc_1007@163.com >
2025-03-31 22:57:28 -07:00
ff6473980d
[Bugfix][Model] fix mllama multi-image ( #14883 )
...
Signed-off-by: yan ma <yan.ma@intel.com >
2025-03-31 22:53:37 -07:00
a164aea35d
[Frontend] Add Phi-4-mini function calling support ( #14886 )
...
Signed-off-by: Kinfey <kinfeylo@microsoft.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-03-31 22:50:05 -07:00
a76f547e11
Rename fallback model and refactor supported models section ( #15829 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-31 22:49:41 -07:00
b7b7676d67
[Distributed] Add custom allreduce support for ROCM ( #14125 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com >
Co-authored-by: ilmarkov <imarkov@redhat.com >
2025-03-31 22:49:12 -07:00
e6e3c55ef2
Move dockerfiles into their own directory ( #14549 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-31 13:47:32 -07:00
f98a4920f9
[V1][Core] Remove unused speculative config from scheduler ( #15818 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-03-31 19:15:21 +00:00
d4bfc23ef0
Fix Transformers backend compatibility check ( #15290 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-31 10:27:07 -07:00
9a2160fa55
[V1] TPU CI - Add basic perf regression test ( #15414 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-31 13:25:20 -04:00
2de4118243
fix: change GB to GiB in logging close #14979 ( #15807 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-03-31 10:00:50 -07:00
239b7befdd
[V1][Spec Decode] Remove deprecated spec decode config params ( #15466 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-03-31 09:19:35 -07:00
09e974d483
[Bugfix] Check dimensions of multimodal embeddings in V1 ( #15816 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-31 09:01:35 -07:00
e5ef4fa99a
Upgrade transformers to v4.50.3 ( #13905 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-31 08:59:37 -07:00
037bcd942c
[Bugfix] Fix missing return value in load_weights method of adapters.py ( #15542 )
...
Signed-off-by: noc-turne <2270929247@qq.com >
2025-03-31 06:56:42 -07:00
c2e7507ad4
[Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats ( #15813 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-03-31 13:23:53 +00:00
3aa2b6a637
[Model] Update support for NemotronNAS models ( #15008 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com >
2025-03-31 20:35:14 +08:00
555aa21905
[V1] Fully Transparent Implementation of CPU Offloading ( #15354 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-31 20:22:34 +08:00
e7ae3bf3d6
fix: better install requirement for install in setup.py ( #15796 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-03-31 05:13:32 -07:00
b932c048ac
Recommend developing with Python 3.12 in developer guide ( #15811 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-03-31 11:54:49 +00:00
e85829450d
[Feature][ROCm]Enable fusion pass for torch.compile on ROCm ( #15050 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-03-31 04:42:18 -07:00
effc5d24fa
[Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup ( #15748 )
...
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com >
2025-03-31 15:38:58 +08:00
18ed3132d2
[Misc] update the comments ( #15780 )
...
Signed-off-by: chengyang liu <lcy4869@gmail.com >
Co-authored-by: chengyang liu <lcy4869@gmail.com >
2025-03-30 19:39:56 -07:00
9b459eca88
[V1][Scheduler] Avoid calling _try_schedule_encoder_inputs for every request ( #15778 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-30 14:10:42 -07:00
70fedd0f79
fix: Comments to English for better dev experience ( #15768 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-03-30 10:47:57 -07:00
bb103b29bf
[Bugfix] Added embed_is_patch mask for fuyu model ( #15731 )
...
Signed-off-by: Kyle Huang <kylhuang@nvidia.com >
2025-03-30 03:45:08 -07:00
248e76c4df
fix: lint fix a ruff checkout syntax error ( #15767 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-03-30 03:36:02 -07:00
803d5c35f3
[V1] Override mm_counts for dummy data creation ( #15703 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-30 03:20:42 -07:00
7fd8c0f85c
fix test_phi3v ( #15321 )
...
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com >
2025-03-30 02:01:34 -07:00
44c3a5abc3
[doc] update conda to usage link in installation ( #15761 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-03-30 08:12:13 +00:00
6909a76201
[Bugfix] Fix Mistral guided generation using xgrammar ( #15704 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-03-29 20:20:19 -07:00
045533716b
[CI] xgrammar structured output supports Enum. ( #15757 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-03-29 20:20:02 -07:00
3c0ff914ac
[Bugfix] Fix Mllama interleaved images input support ( #15564 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2025-03-29 18:11:15 +00:00
2bc4be4e32
[V1][Minor] Simplify rejection sampler's parse_output ( #15741 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-29 09:25:17 -07:00
c67abd614f
[V1] Support interleaved modality items ( #15605 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-29 06:30:09 -07:00
6fa7cd3dbc
[Feature][Disaggregated] Support XpYd disaggregated prefill with MooncakeStore ( #12957 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-03-29 04:01:46 -07:00
94744ba41a
[V1] [Feature] Collective RPC ( #15444 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-03-29 03:39:14 -07:00
4965ec42d2
[FEAT] [ROCm] Add AITER int8 scaled gemm kernel ( #15433 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-03-29 03:33:56 -07:00
73aa7041bf
[doc] update doc ( #15740 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-03-29 04:27:22 +00:00
7c1f760024
[Kernel][TPU][ragged-paged-attn] vLLM code change for PR#8896 ( #15659 )
...
Signed-off-by: Yarong Mu <ymu@google.com >
2025-03-28 21:13:15 -07:00
da461f3cbf
[TPU][V1][Bugfix] Fix w8a8 recompilation with GSM8K ( #15714 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-28 21:13:06 -07:00
5b800f0932
[Bugfix] set VLLM_WORKER_MULTIPROC_METHOD=spawn for vllm.entrypoints.openai.api_server ( #15700 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
2025-03-28 21:12:26 -07:00
8427f70493
Use numba 0.61 for python 3.10+ to support numpy>=2 ( #15692 )
...
Signed-off-by: cyy <cyyever@outlook.com >
2025-03-29 12:11:51 +08:00
7a7992085b
[CI] Speed up V1 structured output tests ( #15718 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-28 21:10:45 -07:00
1286211f57
[Bugfix] LoRA V1: add and fix entrypoints tests ( #15715 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-28 21:10:41 -07:00
6d531ad7b8
[Misc][V1] Misc code streamlining ( #15723 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-28 20:59:47 -07:00
762b424a52
[Docs] Document v0 engine support in reasoning outputs ( #15739 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai >
2025-03-29 03:46:57 +00:00
de1cb38769
[Model] Support Skywork-R1V ( #15397 )
...
Signed-off-by: jiacai.liu <932997367@qq.com >
Co-authored-by: jiacai.liu <932997367@qq.com >
2025-03-28 20:39:21 -07:00
c802f5430d
[ROCm][AMD][Build] Update AMD supported arch list ( #15632 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-03-28 20:39:18 -07:00
cff8991a50
[Docs][V1] Optimize diagrams in prefix caching design ( #15716 )
2025-03-29 03:33:58 +00:00
f3f8d8fff4
implement prometheus fast-api-instrumentor for http service metrics ( #15657 )
2025-03-29 00:12:02 +00:00
26df46ee59
[Misc] cli auto show default value ( #15582 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
2025-03-28 22:23:00 +00:00
c3f687ac22
[V1] TPU - Fix the chunked prompt bug ( #15713 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-28 20:19:04 +00:00
04437e313d
[Bugfix] [torch.compile] Add Dynamo metrics context during compilation ( #15639 )
...
Signed-off-by: luka <luka@neuralmagic.com >
2025-03-28 14:01:09 -06:00
038bededba
[TPU] [Perf] Improve Memory Usage Estimation ( #15671 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-03-28 17:37:52 +00:00
d03308be0c
[Misc] Remove stale func in KVTransferConfig ( #14746 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-03-28 17:33:32 +00:00
c6bc0034d0
[Misc] Remove unused utils and clean up imports ( #15708 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-28 09:41:16 -07:00
70e132244a
[Minor] Remove TGI launching script ( #15646 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-28 09:30:08 -07:00
47e9038d23
Fix cpu offload testing for gptq/awq/ct ( #15648 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-29 00:29:32 +08:00
432cf22a6a
[Bugfix] Fix regex compile display format ( #15368 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2025-03-28 08:58:44 -07:00
2914006fe0
[doc] add missing imports ( #15699 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-03-28 15:56:48 +00:00
7329ff5468
[V1] Support disable_any_whitespace for guidance backend ( #15584 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-28 23:46:45 +08:00
541d1df486
[Bugfix] embed_is_patch for Idefics3 ( #15696 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-28 08:27:52 -07:00
3b00ff9138
[Bugfix][v1] xgrammar structured output supports Enum. ( #15594 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-03-28 06:14:53 -07:00
91276c5721
[Model] Adding torch compile annotations to chatglm ( #15624 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-28 21:14:09 +08:00
0b4167526d
[Docs] Add "Generation quality changed" section to troubleshooting ( #15701 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-28 13:03:21 +00:00
fd5fd26902
[Frontend] update priority for --api-key and VLLM_API_KEY ( #15588 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-03-28 19:40:12 +08:00
3bbaacbe15
[Bugfix][Frontend] Eliminate regex based check in reasoning full generator ( #14821 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai >
2025-03-28 11:20:35 +00:00
a10314c6b3
[Misc] Fix test_sleep to use query parameters ( #14373 )
...
Signed-off-by: Lize Cai <lize.cai@sap.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-03-28 18:00:14 +08:00
70f2c2a709
[Bugfix] Fix 'InductorAdaptor' object has no attribute 'cache_dir' ( #15674 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-28 17:10:40 +08:00
280d074103
[CPU][CI] Improve CPU Dockerfile ( #15690 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-03-28 01:36:31 -07:00
32b14baf8a
[Refactor][Frontend] Keep all logic about reasoning into one class ( #14428 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai >
2025-03-28 00:23:30 -07:00
2d9045fce8
[TPU][CI] Fix TPUModelRunner Test ( #15667 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-03-28 00:01:26 -07:00
355f66348c
[V1] Remove legacy input registry ( #15673 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 23:34:34 -07:00
8693e47e6a
[Bugfix] Fix mm_hashes forgetting to be passed ( #15668 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-28 05:51:05 +00:00
cec8c7d7f8
Refactor error handling for multiple exceptions in preprocessing ( #15650 )
...
Signed-off-by: JasonZhu1313 <jasonchu13@outlook.com >
2025-03-28 03:27:20 +00:00
4d0ec37267
[Quantization][FP8] Adding support for fp8 gemm layer input in fp8 ( #14578 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-03-28 02:58:16 +00:00
e7f720ea56
[Misc]add coding benchmark for speculative decoding ( #15303 )
...
Signed-off-by: CXIAAAAA <cxia0209@gmail.com >
2025-03-28 10:47:05 +08:00
4ae17bf1e2
Revert "Use Cache Hinting for fused_moe kernel ( #15511 )" ( #15645 )
...
Signed-off-by: Wes Medford <wryanmedford@gmail.com >
2025-03-27 19:45:55 -07:00
8a49eea74b
[CI][TPU] Temporarily Disable Quant Test on TPU ( #15649 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-27 19:45:05 -07:00
b4245a48df
[Doc] Fix dead links in Job Board ( #15637 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-03-28 02:43:40 +00:00
4e0f6076be
[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. ( #14948 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-03-28 10:13:41 +08:00
726efc6a32
[Quantization][V1] BitsAndBytes support V1 ( #15611 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-28 10:12:47 +08:00
bd45912b99
[TPU] Lazy Import ( #15656 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-28 09:57:01 +08:00
15dac210f0
[V1] AsyncLLM data parallel ( #13923 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-27 16:14:41 -07:00
112b3e5b3b
[CI] Update rules for applying tpu label. ( #15634 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-27 22:15:26 +00:00
32d669275b
Correct PowerPC to modern IBM Power ( #15635 )
...
Signed-off-by: Christy Norman <christy@linux.vnet.ibm.com >
2025-03-27 15:04:32 -07:00
4098b72210
[Bugfix][TPU][V1] Fix recompilation ( #15553 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-27 19:15:06 +00:00
46450b8d33
Use absolute placement for Ask AI button ( #15628 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-27 18:52:18 +00:00
13ac9cab21
[Misc] Avoid direct access of global mm_registry in compute_encoder_budget ( #15621 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 17:52:00 +00:00
66aa4c0bf4
[Feature] Add middleware to log API Server responses ( #15593 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com >
2025-03-27 17:49:38 +00:00
247181536f
[Misc] Replace is_encoder_decoder_inputs with split_enc_dec_inputs ( #15620 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 17:36:32 +00:00
07bf813fb5
[Doc] Link to onboarding tasks ( #15629 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 16:30:53 +00:00
8958217ad5
[Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 ( #15211 )
...
Signed-off-by: h-sugi <h.sugi@ieee.org >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-27 22:29:29 +08:00
ac5bc615b0
[Model] MiniCPM-V/O supports V1 ( #15487 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 06:07:29 -07:00
8063dfc61a
[Doc] update --system for transformers installation in docker doc ( #15616 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-03-27 20:38:46 +08:00
6278bc829e
Fix incorrect filenames in vllm_compile_cache.py ( #15494 )
...
Signed-off-by: <zou3519@gmail.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-03-27 18:33:41 +08:00
3f532cb6a6
[Misc] Use model_redirect to redirect the model name to a local folder. ( #14116 )
2025-03-27 02:21:23 -07:00
e6c9053f9e
[Misc] Clean up scatter_patch_features ( #15559 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 07:45:00 +00:00
43ed4143c4
[Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM ( #15587 )
...
Signed-off-by: ElizaWszola <eliza@neuralmagic.com >
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Co-authored-by: ElizaWszola <eliza@neuralmagic.com >
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com >
Co-authored-by: ElizaWszola <ewszola@redhat.com >
2025-03-27 06:47:25 +00:00
f4c98b4d4c
[Misc] Consolidate LRUCache implementations ( #15481 )
...
Signed-off-by: Bella kira <2374035698@qq.com >
2025-03-27 06:43:43 +00:00
e1e0fd7543
[TPU] Avoid Triton Import ( #15589 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-27 06:43:02 +00:00
df8d3d1287
[Misc] Restrict ray version dependency and update PP feature warning in V1 ( #15556 )
2025-03-27 06:21:07 +00:00
619d3de8bd
[TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SEQS ( #15583 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-03-26 22:46:26 -07:00
ecff8309a3
[ROCm] Env variable to trigger custom PA ( #15557 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-03-26 22:46:12 -07:00
dcf2a590f5
Allow torchao quantization in SiglipMLP ( #15575 )
2025-03-26 22:45:51 -07:00
54aa619459
[V1] Refactor num_computed_tokens logic ( #15307 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-27 04:54:36 +00:00
fb22be5817
[moe][quant] add weight name case for offset ( #15515 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-03-27 04:50:29 +00:00
7f301dd8ef
[Doc] Update V1 user guide for fp8 kv cache support ( #15585 )
...
Signed-off-by: weizeng <weizeng@roblox.com >
2025-03-26 19:39:03 -07:00
8095341a01
[misc] LoRA: Remove unused long context test data ( #15558 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-27 10:04:51 +08:00
69db16a46a
add platform check back ( #15578 )
...
Signed-off-by: Chenyaaang <llccyy1212@gmail.com >
2025-03-27 01:50:27 +00:00
ce78f9af4e
Add automatic tpu label to mergify.yml ( #15560 )
2025-03-26 21:39:58 -04:00
9239bf718e
[Kernel] CUTLASS grouped gemm fp8 MoE kernel ( #13972 )
...
Signed-off-by: ElizaWszola <eliza@neuralmagic.com >
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com >
2025-03-27 00:54:44 +00:00
7a6d45bc8a
Support FIPS enabled machines with MD5 hashing ( #15299 )
...
Signed-off-by: Matthew Vine <32849887+MattTheCuber@users.noreply.github.com >
2025-03-26 20:19:46 -04:00
e74ff409e0
[TPU] support disabling xla compilation cache ( #15567 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-03-27 00:09:28 +00:00
7a888271f5
Use Cache Hinting for fused_moe kernel ( #15511 )
2025-03-26 23:21:34 +00:00
9d119a86ae
[V1] TPU CI - Fix test_compilation.py ( #15570 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-26 21:51:54 +00:00
b2e85e26f4
[V1] TPU - Revert to exponential padding by default ( #15565 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-26 21:35:05 +00:00
dd8a29da99
Applying some fixes for K8s agents in CI ( #15493 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-03-26 20:35:11 +00:00
27df5199d9
Support SHA256 as hash function in prefix caching ( #15297 )
...
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com >
2025-03-26 11:11:28 -07:00
35fad35a48
[V1][Sampler] Faster top-k only implementation ( #15478 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-26 10:56:47 -07:00
733e7c9e95
[Refactor] Remove unnecessary backend parameter in structured output interface ( #15317 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-03-26 17:51:56 +00:00
0af4d764d6
Fix weight loading for some models in Transformers backend ( #15544 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-26 10:17:53 -07:00
e64afa455c
multi-node offline DP+EP example ( #15484 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-26 23:54:24 +08:00
1711b929b6
[Model] Add Reasoning Parser for Granite Models ( #14202 )
...
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com >
Co-authored-by: Joe Runde <joe@joerun.de >
2025-03-26 14:28:07 +00:00
c091c0a588
Improve validation of TP in Transformers backend ( #15540 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-26 07:26:48 -07:00
1aa162e030
Apply torchfix ( #15532 )
...
Signed-off-by: cyy <cyyever@outlook.com >
2025-03-26 12:09:06 +00:00
cf5c8f1686
Separate base model from TransformersModel ( #15467 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-03-26 18:13:38 +08:00
4ec2cee000
[Misc] improve example script output ( #15528 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-03-26 10:12:47 +00:00
99f536f830
[Misc] Enhance warning information to user-defined chat template ( #15408 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-03-26 02:21:15 -07:00
5ebf66748b
[FEAT][ROCm] Integrate Fused MoE Kernels from AITER ( #14967 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-03-26 16:30:30 +08:00
781d056280
[Feature] Enhance EAGLE Architecture with Proper RMS Norms ( #14990 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-03-26 08:24:07 +00:00
5aefd6ac31
Fix raw_request extraction in load_aware_call decorator ( #15382 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com >
2025-03-25 22:29:54 -07:00
6c663dfd5e
[misc] LoRA - Skip LoRA kernels when not required ( #15152 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-26 11:33:45 +08:00
33437bc6e7
[BugFix] Fix nightly MLA failure (FA2 + MLA chunked prefill, i.e. V1, producing bad results) ( #15492 )
...
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com >
2025-03-25 20:33:22 -07:00
23114d3364
[Misc] Warn about v0 in benchmark_paged_attn.py ( #15495 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-03-25 20:31:04 -07:00
997c8811d6
[Model] Support multi-image for Molmo ( #15438 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-26 11:26:33 +08:00
e42389f9d7
Transformers backend already supports V1 ( #15463 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-25 20:26:16 -07:00
ff38f0a32c
[CI/Build] LoRA: Delete long context tests ( #15503 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-25 17:18:34 -07:00
a5cfbab3c8
[Core] LoRA: V1 Scheduler optimization ( #15422 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-25 22:50:09 +00:00
ac3cd6e83c
[core] add bucket padding to tpu_model_runner ( #14995 )
...
Signed-off-by: Chenyaaang <llccyy1212@gmail.com >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-25 17:27:22 -04:00
082ab86f5f
[V1] Support long_prefill_token_threshold in v1 scheduler ( #15419 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-03-25 14:22:26 -07:00
6aa196c8dc
[V1][Minor] Use SchedulerInterface type for engine scheduler field ( #15499 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-25 14:21:36 -07:00
a0dd7dcd49
[TPU][V1] Fix Sampler recompilation ( #15309 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-25 16:43:54 -04:00
e977c11111
Add workaround for shared field_names in pydantic model class ( #13925 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
2025-03-25 20:31:08 +00:00
5f063a80bd
[bugfix] add supports_v1 platform interface ( #15417 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-03-25 15:00:32 -04:00
5d8e1c9279
[Bugfix] Support triton==3.3.0+git95326d9f for RTX 5090 (Unsloth + vLLM compatibility) ( #15471 )
...
Co-authored-by: ServerAI <ai@exc-mad-ai.com >
2025-03-25 17:59:25 +00:00
0a049c7d86
[CI/Build] Add tests for the V1 tpu_model_runner. ( #14843 )
...
Signed-off-by: Yarong Mu <ymu@google.com >
2025-03-25 12:27:16 -04:00
d0cfec7ab9
[bugfix] fix inductor cache on max_position_embeddings ( #15436 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-25 07:05:39 -07:00
a608160027
[Kernel] Fix conflicting macro names for gguf kernels ( #15456 )
...
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com >
2025-03-25 13:50:49 +00:00
3f04a7fbf2
[Doc] Update V1 user guide for multi-modality ( #15460 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-25 11:01:58 +00:00
5994430b84
[Misc] Remove redundant num_embeds ( #15443 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-25 18:27:57 +08:00
a9e879b316
[Misc] Clean up MiniCPM-V/O code ( #15337 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-25 10:22:52 +00:00
3e2f37a69a
Dockerfile.ppc64le changes to move to UBI ( #15402 )
...
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com >
2025-03-25 10:15:14 +00:00
4f044b1d67
[Kernel][CPU] CPU MLA ( #14744 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg >
2025-03-25 09:34:59 +00:00
4157f563b4
[Hardware][TPU][Bugfix] Fix v1 mp profiler ( #15409 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-03-25 01:43:00 -07:00
051da7efe3
Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 ( #15160 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
Co-authored-by: Richard Barnes <rbarnes@meta.com >
2025-03-25 15:36:45 +08:00
25f560a62c
[V1][Spec Decode] Update target_logits in place for rejection sampling ( #15427 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-24 21:04:41 -07:00
a09ad90a72
[V1] guidance backend for structured output + auto fallback mode ( #14779 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com >
Co-authored-by: Michal Moskal <michal@moskal.me >
2025-03-24 21:02:33 -07:00
10b34e36b9
[Bugfix] Fixed the issue of not being able to input video and image simultaneously ( #15387 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-03-25 03:48:08 +00:00
b5269db959
Revert "Fix non-contiguous input passed to Marlin kernel ( #15319 )" ( #15398 )
2025-03-24 20:43:51 -07:00
6db94571d7
[Misc] Remove LoRA log ( #15388 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-24 20:43:48 -07:00
97cfa65df7
Add pipeline parallel support to TransformersModel ( #12832 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-03-25 10:41:45 +08:00
911c8eb000
[Minor][Spec Decode] Remove compiled_softmax ( #15416 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-24 19:09:04 -07:00
ebcebeeb6b
[V1][Spec Decode] Enable spec decode for top-p & top-k sampling ( #15063 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-24 17:16:46 -07:00
f533b5837f
[ROCm][Kernel] MoE weights padding ( #14454 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: charlifu <charlifu@amd.com >
Co-authored-by: charlifu <charlifu@amd.com >
2025-03-24 23:45:30 +00:00
8279201ce6
[Build] Cython compilation support fix ( #14296 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-03-24 23:37:54 +00:00
23fdab00a8
[Hardware][TPU] Skip failed compilation test ( #15421 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-03-24 23:28:57 +00:00
623e2ed29f
[BugFix][V1] Quick fix for min_tokens with multiple EOS ( #15407 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-24 15:58:59 -07:00
9d72daf4ce
[V1][Perf] Simpler request output queues ( #15156 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-24 22:44:08 +00:00
6dd55af6c9
[Doc] Update docs on handling OOM ( #15357 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-03-24 14:29:34 -07:00
3eb08ed9b1
[DOC] Add Kubernetes deployment guide with CPUs ( #14865 )
2025-03-24 10:48:43 -07:00
5eeadc2642
[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral ( #12303 )
...
Signed-off-by: zhenwei <zhenweiliu@habana.ai >
2025-03-24 09:48:40 -07:00
3aee6573dc
[V1] Aggregate chunked prompt logprobs in model runner ( #14875 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-24 12:27:57 -04:00
9cc645141d
[MISC] Refine no available block debug msg ( #15076 )
...
Signed-off-by: Yi Liu <yiliu4@habana.ai >
Signed-off-by: yiliu30 <yi4.liu@intel.com >
Co-authored-by: Yi Liu <yiliu4@habana.ai >
2025-03-25 00:01:10 +08:00
0893567db9
[V1][Minor] fix comments ( #15392 )
...
Signed-off-by: chenjincong <chenjincong@baidu.com >
Signed-off-by: Chen-0210 <chenjincong11@gmail.com >
Co-authored-by: chenjincong <chenjincong@baidu.com >
2025-03-24 08:45:32 -07:00
8abe69b499
[Core] Don't force uppercase for VLLM_LOGGING_LEVEL ( #15306 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-24 08:27:30 -07:00
761702fd19
[Core] Integrate fastsafetensors loader for loading model weights ( #10647 )
...
Signed-off-by: Manish Sethi <Manish.sethi1@ibm.com >
2025-03-24 08:08:02 -07:00
9606d572ed
[distributed] fix dp group ( #15355 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-24 14:54:27 +00:00
cbcdf2c609
[Bugfix] Fix chat template loading ( #15143 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-03-24 13:50:09 +00:00
038de04d7b
Fix zmq IPv6 URL format error ( #15341 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-24 09:30:41 -04:00
6b3cc75be0
[Kernel] allow non-contiguous input for marlin kernel ( #14658 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
2025-03-24 09:21:33 -04:00
7ffcccfa5c
Revert "[CI/Build] Use uv python for docker rather than ppa:deadsnakess/ppa ( #13569 )" ( #15377 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-03-24 05:53:10 -07:00
cc8accfd53
[Misc] Update guided decoding logs to debug ( #15310 )
...
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com >
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com >
2025-03-24 04:25:20 -07:00
948ab03e7e
[Bugfix][V1] Avoid importing PreTrainedModel ( #15366 )
...
Signed-off-by: Hollow Man <hollowman@opensuse.org >
2025-03-24 10:33:12 +00:00
5797fb97e9
[Misc] Remove ignore_reinit_error for ray.init() ( #15373 )
2025-03-24 07:41:53 +00:00
3892e58ad7
[Misc] Upgrade BNB version ( #15183 )
2025-03-24 05:51:42 +00:00
d20e261199
Fix non-contiguous input passed to Marlin kernel ( #15319 )
2025-03-24 03:09:44 +00:00
f622dbcf39
[Fix] [torch.compile] Improve UUID system for custom passes ( #15249 )
...
Signed-off-by: luka <luka@neuralmagic.com >
2025-03-24 01:54:07 +00:00
dccf535f8e
[V1] Enable V1 Fp8 cache for FA3 in the oracle ( #15191 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-03-23 15:07:04 -07:00
9c5c81b0da
[Misc][Doc] Add note regarding loading generation_config by default ( #15281 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-23 14:00:55 -07:00
d6cd59f122
[Frontend] Support tool calling and reasoning parser ( #14511 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-03-23 14:00:07 -07:00
bc8ed3c4ba
[V1][Spec Decode] Use better defaults for N-gram ( #15358 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-23 10:52:30 -07:00
b9bd76ca14
[V1][Spec Decode] Respect prompt_lookup_max ( #15348 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-23 10:41:44 -07:00
6ebaf9ac71
[Bugfix] consider related env vars for torch.compiled cache hash ( #14953 )
...
Signed-off-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2025-03-23 15:53:09 +00:00
f90d34b498
[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 ( #15322 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com >
2025-03-23 01:10:10 -07:00
f68cce8e64
[ci/build] fix broken tests in LLM.collective_rpc ( #15350 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-23 14:49:48 +08:00
09b6a95551
[ci/build] update torch nightly version for GH200 ( #15135 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-23 14:04:13 +08:00
50c9636d87
[V1][Usage] Refactor speculative decoding configuration and tests ( #14434 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-03-22 19:28:10 -10:00
0661cfef7a
Fix v1 supported oracle for worker-cls and worker-extension-cls ( #15324 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-03-23 10:23:35 +08:00
a827aa815d
[doc] Add back previous news ( #15331 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-03-22 17:38:33 -07:00
b877031d80
Remove openvino support in favor of external plugin ( #15339 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-22 14:06:39 -07:00
dd861b992f
[BugFix][Typing] Fix Imprecise Type Annotations ( #15208 )
...
Signed-off-by: Wang Ran (汪然) <wrran@outlook.com >
2025-03-22 09:05:03 -07:00
eb63ea1e18
[V1] Add disable-any-whitespace option support for xgrammar ( #15316 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-22 15:56:17 +00:00
2f4bd358f1
[Model] Support Tele-FLM Model ( #15023 )
...
Signed-off-by: Naitong Yu <ntyu@baai.ac.cn >
Signed-off-by: jiangxin <horizon94@outlook.com >
Co-authored-by: Jason Fang <jasonfang3900@gmail.com >
Co-authored-by: jiangxin <horizon94@outlook.com >
2025-03-22 02:04:44 -07:00
8a8b30eac1
[Bugfix] LoRA V0 - Fix case where max_num_seqs is between cudagraph capture sizes ( #15308 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-22 02:03:32 -07:00
2fa0e1396b
[Bugfix] Fix torch.compile raise FileNotFoundError ( #15278 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-22 13:49:34 +08:00
1c2bec0f82
[Doc] add load_format items in docs ( #14804 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-03-21 22:36:43 -07:00
ec870fba9a
[FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature ( #14959 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-03-21 22:36:14 -07:00
df1430265c
[Bugfix][V0] Multi-sequence logprobs streaming edge case ( #15259 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2025-03-21 22:35:37 -07:00
4c69e228b3
[Misc] Increase RayDistributedExecutor RAY_CGRAPH_get_timeout ( #15301 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-03-21 22:25:43 -07:00
790b79750b
[Build/CI] Fix env var typo ( #15305 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-21 22:28:46 +00:00
cfbb8c930f
[TPU][V1] MHA Pallas backend ( #15288 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-21 08:50:39 -07:00
baec0d4de9
Revert "[Feature] specify model in config.yaml ( #14855 )" ( #15293 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-21 08:30:23 -07:00
c21b99b912
[Bugfix][VLM] fix llava processor ( #15285 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-03-21 05:14:36 -07:00
93a00d7dde
[v1] Refactor KVCacheConfig ( #14079 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-03-21 04:56:27 -07:00
61e8c18350
[Misc] Add cProfile helpers ( #15074 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-21 04:56:09 -07:00
8afcd0f633
[Bugfix] Fix broken kernel test due to missing rename for v1 Triton backend ( #15282 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-21 11:42:06 +00:00
91ca929dc7
[V1] Fix wrong import path of get_flash_attn_version ( #15280 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com >
2025-03-21 03:54:11 -07:00
84e00adc8a
[Bugfix] Fix incorrect resolving order for transformers fallback ( #15279 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-21 03:54:08 -07:00
47c7126213
[Misc] Add attention mask pre-computation optimization back to Qwen2.5-VL ( #15273 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-21 10:32:33 +00:00
a989ca2bf6
[Bugfix] Add int8 torch dtype for KVCache ( #15260 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-03-21 08:58:28 +00:00
0fa3970deb
[Feature] specify model in config.yaml ( #14855 )
...
Signed-off-by: weizeng <weizeng@roblox.com >
2025-03-21 00:26:03 -07:00
da6ea29f7a
[V1] Avoid redundant input processing in n>1 case ( #14985 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-20 22:24:10 -07:00
7297941b38
[Doc] Update LWS docs ( #15163 )
...
Signed-off-by: Edwinhr716 <Edandres249@gmail.com >
2025-03-20 21:18:47 -07:00
f8a08cb90d
[V1] Enable Triton(ROCm) Attention backend for Nvidia GPUs ( #14071 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-21 03:14:19 +00:00
b15fd2be2a
[Hardware][TPU] Add check for no additional graph compilation during runtime ( #14710 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-03-21 03:05:28 +00:00
e588ac237c
Add an example for reproducibility ( #15262 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-20 19:55:47 -07:00
5df2da5b97
[Misc] Better RayExecutor and multiprocessing compatibility ( #14705 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
2025-03-20 19:27:46 -07:00
11b986b3fb
[Docs] Trim the latest news in README ( #15261 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-20 19:24:21 -07:00
296f927f24
[Model] RE: Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies ( #14857 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com >
2025-03-20 19:21:08 -07:00
0032903a5b
[Bugfix] detect alibi and revert to FA2 ( #15231 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
2025-03-20 19:20:16 -07:00
47195057e9
[V1][TPU] Speed up top-k on TPU by using torch.topk ( #15242 )
...
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com >
2025-03-20 19:19:40 -07:00
6edbfa924d
Mention extra_body as a way to pass vLLM only parameters using the OpenAI client ( #15240 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-20 19:18:36 -07:00
1e508343e1
[Bugfix] Fix incorrect qwen2.5-vl attention mask pre-computation ( #15200 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-20 19:18:04 -07:00
2e0b4cfde0
[ROCM] Upgrade torch to 2.6 ( #15244 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-03-20 19:17:33 -07:00
10f55fe6c5
[Misc] Clean up the BitsAndBytes arguments ( #15140 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-20 19:17:12 -07:00
d3ccbd6350
Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 ( #15159 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
Co-authored-by: Richard Barnes <rbarnes@meta.com >
2025-03-21 10:01:11 +08:00
0cfe7d386d
[CI/Build] LoRA : make add_lora_test safer ( #15181 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-21 09:28:53 +08:00
0c6f5023c3
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface ( #15250 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-03-20 17:50:43 -07:00
06dd08256f
Enforce that TP > 1 is not supported for Mamba2 if Quantization is Enabled. ( #14617 )
...
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com >
2025-03-21 00:44:37 +00:00
2b22290ce0
[V1] Add flag to disable cascade attention ( #15243 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-20 15:24:16 -07:00
d8e82bc06d
[Bugfix] fix V1 Engine crash while handling requests with duplicate request id ( #15043 )
...
Signed-off-by: Jiahui Sun <jhsun2020@gmail.com >
2025-03-20 10:01:02 -07:00
086b56824c
[ci] feat: make the test_torchrun_example run with tp=2, external_dp=2 ( #15172 )
...
Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-03-21 00:30:04 +08:00
5a0905ba2a
Replace misc issues with link to forum ( #15226 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-20 23:18:20 +08:00
a8f12a63fd
Fix env vars for running Ray distributed backend on GKE ( #15166 )
...
Signed-off-by: Richard Liu <ricliu@google.com >
2025-03-20 14:59:33 +00:00
69ae2380c6
Add user forum to README ( #15220 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-20 22:39:51 +08:00
27261e40a6
[Bugfix] Multi-video inference on LLaVA-Onevision ( #15082 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-03-20 14:10:45 +00:00
e3f813c33b
[macOS] Upgrade pytorch to 2.6.0 ( #15129 )
2025-03-20 01:22:40 -07:00
c607a2652b
Fixing Imprecise Type Annotations ( #15192 )
2025-03-20 01:19:55 -07:00
3d45e3d749
[release] Tag vllm-cpu with latest upon new version released ( #15193 )
2025-03-20 01:19:10 -07:00
742369d35a
[Frontend][Bugfix] support prefill decode disaggregation on deepseek ( #14824 )
...
Signed-off-by: billishyahao <bill.he@amd.com >
Co-authored-by: Zhai Feiyue <80079571+ZhaiFeiyue@users.noreply.github.com >
2025-03-20 00:00:33 -07:00
bfe2fe0af4
typo: Update config.py ( #15189 )
2025-03-19 23:31:21 -07:00
a8652f4f0f
Enable CUDA graph support for llama 3.2 vision ( #14917 )
...
Signed-off-by: Matt Ritter <100659061+mritterfigma@users.noreply.github.com >
2025-03-19 23:29:16 -07:00
2f726b241e
[Doc] Update README.md ( #15187 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-20 13:25:58 +08:00
a597a57595
[Attention] Flash Attention 3 - fp8 ( #14570 )
...
Signed-off-by: Mickael Seznec <mickael@mistral.ai >
2025-03-20 01:14:20 -04:00
ae65f3e237
[Misc]fixed disable these http request logs ( #14754 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-03-19 21:53:40 -07:00
34868b106a
[Doc] Update Mistral Small 3.1/Pixtral example ( #15184 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-20 04:46:06 +00:00
1f16b7fe74
[Core][V0] Add guidance backend for structured output ( #14589 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Loc Huynh <lohuynh@microsoft.com >
Co-authored-by: Michal Moskal <michal@moskal.me >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
2025-03-19 21:33:51 -07:00
b88be22165
[Benchmark] Allow oversample request in benchmark dataset ( #15170 )
...
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com >
2025-03-20 12:32:58 +08:00
d8c6d7d6b5
[V1][TPU] Support V1 Sampler for ragged attention ( #14227 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-19 21:00:39 -07:00
40828ce5fe
fix "Total generated tokens:" is 0 if using --backend tgi and --endpo… ( #14673 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com >
2025-03-19 20:56:16 -07:00
ffa443afed
[Bugfix] Fix embedding assignment for InternVL-based models ( #15086 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-20 03:40:13 +00:00
70e500cad9
Fix broken tests ( #14713 )
...
Signed-off-by: JovanSardinha <jovan.sardinha@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-03-20 02:06:49 +00:00
4cb1c05c9e
[Doc] Clarify run vllm only on one node in distributed inference ( #15148 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-03-20 09:55:59 +08:00
c47aafa37c
[BugFix] Lazily import XgrammarBackend to avoid early cuda init ( #15171 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-20 01:30:43 +00:00
cfbca8a2f2
[V1] TPU - Tensor parallel MP support ( #15059 )
2025-03-20 00:55:18 +00:00
0fe5609874
[Docs] Announce Ollama and Singapore Meetups ( #15161 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-03-19 16:18:04 -07:00
22d33baca2
[FrontEnd][Perf] merge_async_iterators fast-path for single-prompt requests ( #15150 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-19 21:04:41 +00:00
b0e96aaebb
[V1][TPU] Change kv cache shape. ( #15145 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2025-03-19 12:16:42 -07:00
8310e0b59b
simple bugfix: Update stats.py ( #15139 )
2025-03-19 18:26:27 +00:00
26dd972adb
[FEAT]Support reset prefix cache by specified device ( #15003 )
2025-03-19 10:54:41 -07:00
61c7a1b856
[V1] Minor V1 async engine test refactor ( #15075 )
...
Signed-off-by: andoorve <murali.andoorveedu@mail.utoronto.ca >
Co-authored-by: andoorve <murali.andoorveedu@mail.utoronto.ca >
2025-03-19 10:37:17 -07:00
374ee287d8
[Frontend] Remove custom_cache_manager ( #13791 )
...
Signed-off-by: fulvius31 <asangior@redhat.com >
2025-03-20 00:13:50 +08:00
a4d83661d7
[Misc] Update the "the first vLLM China Meetup" slides link to point to the first page ( #15134 )
...
Signed-off-by: imkero <kerorek@outlook.com >
2025-03-19 15:07:39 +00:00
8363cd093d
[Bugfix] Adjust mllama to regional compilation ( #15112 )
...
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai >
2025-03-19 07:57:25 -07:00
6c5a3195db
[Misc][Benchmark] Add support for different tokenizer_mode ( #15040 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-03-19 14:56:50 +00:00
073d1ed354
[Doc] Update tip info on using latest transformers when creating a custom Dockerfile ( #15070 )
2025-03-19 13:33:40 +00:00
3d446433ec
[Bugfix] Fix size calculation of processing cache ( #15114 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-19 05:53:19 -07:00
1fe0fd12d3
[Misc] Avoid unnecessary HF do_rescale warning when passing dummy data ( #15107 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-19 03:42:31 -07:00
dafb4e504a
[V1][Bugfix] Fix oracle for device checking ( #15104 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-19 18:35:32 +08:00
68cf1601d3
[CI][Intel GPU] update XPU dockerfile and CI script ( #15109 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-03-19 01:29:25 -07:00
61f412187d
[Bugfix] Re-enable Gemma3 for V1 ( #14980 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-18 23:58:22 -07:00
05ccd0aa35
[V1] Ensure using int64 for sampled token ids ( #15065 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-18 23:52:19 -07:00
f690372b68
[Core] Update dtype detection and defaults ( #14858 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-19 13:49:33 +08:00
8b3e94a357
[Model] Remove duplicated message check in Mistral chat completion request ( #15069 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-03-19 05:09:32 +00:00
437f9162d0
[Model] Pixtral: Remove layer instantiation duplication ( #15053 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-03-19 10:34:03 +08:00
4f065f12f5
[Misc][V1] Skip device checking if not available ( #15061 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
2025-03-18 19:33:43 -07:00
228b768db6
[Doc] Minor v1_user_guide update ( #15064 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com >
2025-03-18 16:10:45 -07:00
027827cc1d
fix long dtype in topk sampling ( #15049 )
2025-03-18 15:57:31 -07:00
72a8639b68
[V1] TPU - CI/CD use smaller model ( #15054 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-18 21:39:21 +00:00
99abb8b650
[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels ( #14930 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-18 14:31:54 -07:00
3a1e648158
[V1] Refactor Structured Output for multiple backends ( #14694 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-18 19:49:15 +00:00
46c759c165
[Bugfix] Fix LoRA extra vocab size ( #15047 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-18 09:40:29 -07:00
179a619c21
[Bugfix] Fix broken CPU quantization due to triton import ( #15038 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-18 08:57:39 -07:00
452e8fd968
[MODEL] Add support for Zamba2 models ( #13185 )
...
Signed-off-by: Yury Tokpanov <yury@zyphra.com >
Signed-off-by: Quentin Anthony <qganthony@yahoo.com >
Co-authored-by: Quentin Anthony <qganthony@yahoo.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-03-18 08:56:21 -07:00
8b793f7ec6
MI325 configs, fused_moe_kernel bugfix ( #14987 )
...
Signed-off-by: Eugene Kuznetsov <eugene.kuznetsov@amd.com >
2025-03-18 08:05:18 -07:00
af35d3a3cc
[TPU][V1][Bugfix] Fix chunked prefill with padding ( #15037 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-18 07:34:45 -07:00
3b457143d2
[Bugfix] Register serializers for V0 MQ Engine ( #15009 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-03-18 09:14:47 -04:00
ab656f2c2f
[Bugfix] Loosen type check to avoid errors in V1 ( #15021 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-18 12:54:40 +00:00
64fc2193dc
[Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros ( #14347 )
2025-03-18 05:50:19 -07:00
dd732028f5
[Bugfix][Frontend] Fix validation of logprobs in ChatCompletionRequest ( #14352 )
...
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com >
2025-03-18 05:50:05 -07:00
414919138b
[Bugfix] torchrun compatibility ( #14899 )
...
Signed-off-by: hiyouga <hiyouga@buaa.edu.cn >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-03-18 05:49:27 -07:00
db7c8ca910
[Misc] Embedding model support LoRA ( #14935 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-18 12:07:00 +00:00
f863ffc965
[Mistral-Small 3.1] Update docs and tests ( #14977 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-03-18 03:29:42 -07:00
400d483e87
[Kernels] LoRA - Retire SGMV and BGMV Kernels ( #14685 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-18 09:47:53 +00:00
d1695758b2
[Doc][V1] Fix V1 APC doc ( #14920 )
2025-03-18 08:15:46 +00:00
53a0cf8b95
[Neuron] trim attention kernel tests to fit trn1.2x instance ( #14988 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
2025-03-18 15:05:52 +08:00
5eeabc2a44
[Bugfix] Fix bnb quantization for models with both HF-format and Mistral-format weights ( #14950 )
2025-03-17 23:27:26 +00:00
18551e820c
[V1] TPU - Fix CI/CD runner ( #14974 )
2025-03-17 21:07:07 +00:00
e41e160263
[V1] Guard Against Main Thread Usage ( #14972 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-17 13:23:02 -07:00
b89fb2a4a1
[CI/Build] Use AutoModelForImageTextToText to load VLMs in tests ( #14945 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-17 18:35:17 +00:00
5340b0e221
[Bugfix] Fix interface for Olmo2 on V1 ( #14976 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-17 11:26:38 -07:00
37e3806132
[Bugfix] Make Gemma3 MM V0 only for now ( #14971 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-17 10:04:21 -07:00
c0efdd655b
[Fix][Structured Output] using vocab_size to construct matcher ( #14868 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-03-17 11:42:45 -04:00
aaaec52ad9
[Bugfix][Model] Mixtral: use unused head_dim config argument ( #14961 )
...
Signed-off-by: Quentin Torroba <quentin.torroba@mistral.ai >
2025-03-17 07:44:18 -07:00
e1eb45d397
[Bugfix] Fix precommit - line too long in pixtral.py ( #14960 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-17 07:18:50 -07:00
89fca671fb
[V1] Default MLA to V1 ( #14921 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-03-17 06:54:40 -07:00
d20b0c139c
Add patch merger ( #14957 )
2025-03-17 06:47:50 -07:00
166a168b0f
[Doc] Fix misleading log during multi-modal profiling ( #14955 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-17 06:14:32 -07:00
2bb0e1a799
[Bugfix][ROCm] running new process using spawn method for rocm in tests. ( #14810 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-03-17 11:33:35 +00:00
6eaf1e5c52
[Misc] Add --seed option to offline multi-modal examples ( #14934 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-17 03:00:17 -07:00
868a8c5b2c
[Bugfix] Fix Ultravox on V1 ( #14929 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-17 17:15:20 +08:00
b4ad56c1bd
[V1][TPU] Apply the ragged paged attention kernel fix and remove the padding. ( #14846 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2025-03-17 01:48:28 -07:00
69698f257e
fix minor miscalled method ( #14327 )
2025-03-17 01:47:58 -07:00
cd0cd85102
[MISC] More AMD unused var clean up ( #14926 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-03-17 16:40:41 +08:00
0a74bfce9c
setup.py: drop assumption about local main branch ( #14692 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-17 01:37:42 -07:00
dd3b865854
[Doc] Add vLLM Beijing meetup slide ( #14938 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-03-17 16:29:36 +08:00
9b87a579aa
[Misc][XPU] Use None as device capacity for XPU ( #14932 )
...
Signed-off-by: yan ma <yan.ma@intel.com >
2025-03-17 01:22:14 -07:00
b539222d4e
[V1] Remove input cache client ( #14864 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-03-16 23:42:06 -07:00