|
14b4326b94
|
v1: Support KV events from connectors (#19737)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-01 01:13:21 +00:00 |
|
|
752d2e1c36
|
[Minor] Fix some random typos in comments (#24009)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-31 16:42:17 -07:00 |
|
|
81eea3d348
|
vllm fix check on max vocab size (#22471)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-08-31 20:57:05 +08:00 |
|
|
9701352e4b
|
[Doc]: fix typos in Python comments (#24001)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-08-31 08:21:59 +00:00 |
|
|
749be00a98
|
[Core][Multimodal] Allow passing multi_modal_uuids as multimodal identifiers. (#23394)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-08-30 18:01:22 -07:00 |
|
|
5b8077b8ac
|
Fix wrong truncate_prompt_tokens type hint (#22761)
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
Signed-off-by: Gabriel Marinho <104592062+gmarinho2@users.noreply.github.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-08-30 20:39:38 +00:00 |
|
|
038e9be4eb
|
[LoRA] Much faster startup when LoRA is enabled (#23777)
Signed-off-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-30 15:37:39 +00:00 |
|
|
68a349114f
|
[Misc] enhance type hint for rearrange return value (#23519)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-30 06:43:33 -07:00 |
|
|
e80bca309e
|
[Refactor] refactor freezing_value/cuda_event initialize outside try finally (#23758)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-30 06:42:25 -07:00 |
|
|
fb4983e112
|
[Misc] add reorder_batch AttentionMetadataBuilder (#23798)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-08-30 06:41:45 -07:00 |
|
|
379ea2823a
|
Add LoRA support for DeepSeek models (V2, V3, R1-0528) (#23971)
Signed-off-by: sadeghja1070 <sadegh.ja1070@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-30 06:40:02 -07:00 |
|
|
3a6acad431
|
[Model] Enable encoder DP for MiniCPM-V (#23948)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-30 06:31:26 -07:00 |
|
|
5490d633ce
|
[UT] fix unify_kv_cache_configs when kv cache config needs sort (#23843)
|
2025-08-30 11:22:14 +00:00 |
|
|
628d00cd7b
|
[Bugfix] Fix test_lora_resolvers.py (#23984)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-30 11:16:11 +00:00 |
|
|
4071c76cf3
|
[V1] [Hybrid] Move MiniMaxLinearAttention into layers/mamba (#23831)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-30 00:16:15 -07:00 |
|
|
f1bddbd852
|
[Core] Cleanup TPU model runner for MM (#23894)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-30 00:14:58 -07:00 |
|
|
9748c5198b
|
[CI] Fix broken compile tests due to unsupported SiluMul+Nvfp4Quant fusion (#23973)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-08-30 00:14:43 -07:00 |
|
|
ee52a32705
|
[CI] Move testing image from remote URL to S3 (#23980)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-08-29 21:41:25 -07:00 |
|
|
8fb85b7bb6
|
Add routed_scaling_factor to MoE grouped topk (#23123)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-29 21:36:48 -07:00 |
|
|
5b31cb1781
|
[Bugfix] Fix --config arg expansion called from api_server.py (#23944)
Signed-off-by: Jean-Francois Dube <dubejf+gh@gmail.com>
Co-authored-by: Jean-Francois Dube <dubejf+gh@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-29 21:36:39 -07:00 |
|
|
d660c98c1b
|
[CI] Fix unavailable image remote URL (#23966)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-08-29 15:40:04 -07:00 |
|
|
5674a40366
|
[Misc] Make download_weights_from_hf more reliable (#23863)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-29 12:37:24 -07:00 |
|
|
8c3e199998
|
Revert gemma3n fast prefill changes (#23897)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-29 12:16:57 -07:00 |
|
|
1c26b42296
|
[Docs] [V1] [Hybrid] Add new documentation re: contributing mamba-based models (#23824)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-29 18:47:58 +00:00 |
|
|
b7adf94c4a
|
Tuned H100/H200 triton fp8 block configs for fused_qkv_a_proj (#23939)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-29 10:28:35 -07:00 |
|
|
4d7fe40fc0
|
[RL][BugFix] Fix missing tokenizer error for token-in-token-out (#23904)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-30 01:09:55 +08:00 |
|
|
0dc9532065
|
[BUGFIX] fix undefined silu_and_mul_nvfp4_quant (#23929)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
|
2025-08-29 09:36:39 -07:00 |
|
|
72a69132dc
|
[CI] Add aiter to matching list of issue auto labeller for rocm tag (#23942)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-08-29 15:29:21 +00:00 |
|
|
d90d8eb674
|
[BugFix] Async scheduling and PP compatibility with DP (#23770)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-29 08:17:27 -07:00 |
|
|
0a2f4c0793
|
[Models] Use in-place adds in Idefics2Vision (#23932)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-08-29 07:42:57 -07:00 |
|
|
1cf3753b90
|
[MODEL] Apertus and XIELU (#23068)
Signed-off-by: EduardDurech <39579228+EduardDurech@users.noreply.github.com>
Co-authored-by: AllenHaoHuang <allenhuangdd@gmail.com>
|
2025-08-29 20:29:18 +08:00 |
|
|
4f7cde7272
|
Adds json_count_leaves utility function (#23899)
Signed-off-by: aditchawdhary <aditxy@hotmail.com>
|
2025-08-29 05:28:13 -07:00 |
|
|
67c14906aa
|
Update PyTorch to 2.8.0 (#20358)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-29 18:57:35 +08:00 |
|
|
69f46359dd
|
[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec (#23779)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2025-08-29 18:36:57 +08:00 |
|
|
d9e00dbd1f
|
[Performance] V1 Classify Models E2E Performance Optimization (#23541)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-29 03:12:32 -07:00 |
|
|
ad39106b16
|
[CPU] Enable data parallel for CPU backend (#23903)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-08-29 02:19:58 -07:00 |
|
|
2554b27baa
|
[V0 Deprecation] Remove pooling model support in V0 (#23434)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-29 00:04:02 -07:00 |
|
|
934bebf192
|
Better errors for Transformers backend missing features (#23759)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-29 07:01:40 +00:00 |
|
|
885ca6d31d
|
[Misc] Fix warnings for mistral model (#23552)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2025-08-29 06:58:48 +00:00 |
|
|
2d0afcc9dc
|
[mrope][Qwen2-VL] Fix edge case where getting index of image/video token can potentially throw in default vl mrope implementation. (#23895)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-08-28 23:29:13 -07:00 |
|
|
b4f9e9631c
|
[CI/Build] Clean up LoRA test (#23890)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-28 23:28:35 -07:00 |
|
|
05d839c19e
|
Fix(async): Add support for truncate_prompt_tokens in AsyncLLM (#23800)
|
2025-08-28 22:55:06 -07:00 |
|
|
6597d7a456
|
[Platform] import activation_quant_fusion for CUDA only (#23882)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-08-28 22:54:16 -07:00 |
|
|
5264015d74
|
[BugFix][AMD][Deepseek] fix a dtype mismatch error for deepseek running on AMD (#23864)
Signed-off-by: Jinghui Zhang <jinghuizhang0804@gmail.com>
|
2025-08-28 22:54:12 -07:00 |
|
|
98ac0cb32d
|
[Bugfix] Use ReplicatedLinear for SequenceClassification head (#23836)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-29 04:41:20 +00:00 |
|
|
c8b3b299c9
|
[tests] Improve speed and reliability of test_transcription_api_correctness (#23854)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-08-29 04:25:33 +00:00 |
|
|
006477e60b
|
[ROCm][Fix] Fix rocm build caused by #23791 (#23847)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-08-28 19:52:27 -07:00 |
|
|
de533ab2a1
|
[Models] Improve iteration over layers (#19497)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-08-29 09:26:34 +08:00 |
|
|
235c9db8a7
|
[XPU] support data parallel for MoE models on XPU (#22887)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
|
2025-08-29 09:23:04 +08:00 |
|
|
b668055a11
|
[V0 Deprecation] Remove V0 Samplers test (#23862)
|
2025-08-28 18:05:52 -07:00 |
|