f0945e311d 
					 
					
						
						
							
							stash
						Signed-off-by: Robert Shaw <robshaw@redhat.com>
						
						
					 
					
						2025-07-24 00:33:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ec76caafa 
					 
					
						
						
							
							updated
						Signed-off-by: Robert Shaw <robshaw@redhat.com>
						
						
					 
					
						2025-07-23 20:02:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1588294a88 
					 
					
						
						
							
							updated
						Signed-off-by: Robert Shaw <robshaw@redhat.com>
						
						
					 
					
						2025-07-23 18:58:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e82e9afeb7 
					 
					
						
						
							
							updated
						Signed-off-by: Robert Shaw <robshaw@redhat.com>
						
						
					 
					
						2025-07-23 18:43:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						10abfaf309 
					 
					
						
						
							
							Merge branch 'fix-connector-agg' into debug-logging  
						
						
						
						
					 
					
						2025-07-23 18:20:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9ff1a2b537 
					 
					
						
						
							
							[BugFix] Fix KVConnector TP worker aggregation
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-07-23 18:29:06 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0abe10e4a7 
					 
					
						
						
							
							updated
						Signed-off-by: Robert Shaw <robshaw@redhat.com>
						
						
					 
					
						2025-07-23 15:21:46 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						316b1bf706 
					 
					
						
						
							
							[Tests] Add tests for headless internal DP LB (#21450)
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-07-23 07:49:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7c734ee09b 
					 
					
						
						
							
							[Bugfix][Qwen][DCA] fixes bug in dual-chunk-flash-attn backend for qwen 1m models. (#21364)
						Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
						
						
					 
					
						2025-07-23 06:34:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f59ec35b7f 
					 
					
						
						
							
							[V1] Check all pooling tasks during profiling (#21299)
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-23 05:53:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2671334d45 
					 
					
						
						
							
							[Model] add Hunyuan V1 Dense Model support. (#21368)
						Signed-off-by: Asher Zhang <asherszhang@tencent.com>
						
						
					 
					
						2025-07-23 03:54:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2cc5016a19 
					 
					
						
						
							
							[Docs] Clean up v1/metrics.md (#21449)
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-07-23 03:37:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6929f8b437 
					 
					
						
						
							
							[Misc] fixed nvfp4_moe test failures due to invalid kwargs (#21246)
						Signed-off-by: Yang Chen <yangche@fb.com>
						
						
					 
					
						2025-07-23 01:41:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						32ec9e2f2a 
					 
					
						
						
							
							Mamba V2 Test not Asserting Failures. (#21379)
						Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
						
						
					 
					
						2025-07-23 01:40:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						accac82928 
					 
					
						
						
							
							[Sampler] Introduce logprobs mode for logging (#21398)
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-07-23 01:39:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						23637dcdef 
					 
					
						
						
							
							[Docs] Fix bullets and grammars in tool_calling.md (#21440)
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-07-23 01:23:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6364af92f8 
					 
					
						
						
							
							Fixed typo in profiling logs (#21441)
						
						
						
						
					 
					
						2025-07-23 01:18:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7aaa2bd5a8 
					 
					
						
						
							
							[Bugfix] ensure tool_choice is popped when tool_choice:null is passed in json payload (#19679)
						Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
						
						
					 
					
						2025-07-23 00:30:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2f5c14de6a 
					 
					
						
						
							
							add clear messages for deprecated models (#21424)
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-07-23 00:03:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f002e9a870 
					 
					
						
						
							
							[Cleanup] Only log MoE DP setup warning if DP is enabled (#21315)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-23 00:02:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a1f3610fc6 
					 
					
						
						
							
							[Core] Add basic unit test for maybe_evict_cached_block (#21400)
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
						
						
					 
					
						2025-07-23 00:02:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ecedd1806 
					 
					
						
						
							
							[Bugfix] Fix nightly transformers CI failure (#21427)
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-23 00:01:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						107111a859 
					 
					
						
						
							
							Changing "amdproduction" allocation. (#21409)
						Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
						
						
					 
					
						2025-07-22 20:48:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2dec7c1a5d 
					 
					
						
						
							
							[Bugfix][CUDA] fixes CUDA FP8 kv cache dtype supported (#21420)
						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
						
						
					 
					
						2025-07-22 20:34:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						08d2bd78da 
					 
					
						
						
							
							[BUGFIX] deepseek-v2-lite failed due to fused_qkv_a_proj name update (#21414)
						Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
						
						
					 
					
						2025-07-22 20:33:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4f76a05f4f 
					 
					
						
						
							
							[BugFix] Update python to python3 calls for image; fix prefix & input calculations. (#21391)
						Signed-off-by: Eric Hanley <ericehanley@google.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-07-22 20:33:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f154bb9ff0 
					 
					
						
						
							
							Simplify weight loading in Transformers backend (#21382)
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-22 20:29:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3ec7170ff1 
					 
					
						
						
							
							[Bugfix][ROCm][Build] Fix build regression on ROCm (#21393)
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-07-22 20:27:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c401c64b4c 
					 
					
						
						
							
							[CI/Build] Fix model executor tests (#21387)
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-22 20:25:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b77c7d327f 
					 
					
						
						
							
							[BugFix] Fix ray import error mem cleanup bug (#21381)
						Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
						Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
						Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
						
						
					 
					
						2025-07-22 16:19:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						35bc8bd5fb 
					 
					
						
						
							
							[Misc] Copy HF_TOKEN env var to Ray workers (#21406)
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
						
						
					 
					
						2025-07-22 16:18:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4594fc3b28 
					 
					
						
						
							
							[Model] Add Qwen3CoderToolParser (#21396)
						Signed-off-by: simon-mo <xmo@berkeley.edu>
						Co-authored-by: simon-mo <xmo@berkeley.edu>
						
						
					 
					
						2025-07-22 15:05:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ae268b6326 
					 
					
						
						
							
							Fix Flashinfer Allreduce+Norm enable disable calculation based on fi_allreduce_fusion_max_token_num (#21325)
						Signed-off-by: XIn Li <xinli@nvidia.com>
						
						
					 
					
						2025-07-22 12:42:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						35366ae57c 
					 
					
						
						
							
							[CI/Build] Fix test failure due to updated model repo (#21375)
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-22 08:39:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2226d5bd85 
					 
					
						
						
							
							[Bugfix] Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers (#21353)
						Signed-off-by: ariG23498 <aritra.born2fly@gmail.com>
						
						
					 
					
						2025-07-22 08:27:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						44554a0068 
					 
					
						
						
							
							Add tokenization_kwargs to encode for embedding model truncation (#21033)
						
						
						
						
					 
					
						2025-07-22 08:24:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						226b452a20 
					 
					
						
						
							
							Revert "[Refactor] Fix Compile Warning #1444-D (#21208)" (#21384)
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-22 08:22:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f38ee34a0a 
					 
					
						
						
							
							[feat] Enable mm caching for transformers backend (#21358)
						Signed-off-by: raushan <raushan@huggingface.co>
						
						
					 
					
						2025-07-22 08:18:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b194557a6c 
					 
					
						
						
							
							Adds parallel model weight loading for runai_streamer (#21330)
						Signed-off-by: bbartels <benjamin@bartels.dev>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-07-22 08:15:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						774d0c014b 
					 
					
						
						
							
							[Perf] Cuda Kernel for Per Token Group Quant (#21083)
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-22 07:27:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2c8db17cfd 
					 
					
						
						
							
							[feat]: add SM100 support for cutlass FP8 groupGEMM (#20447)
						Signed-off-by: Duncan Moss <djm.moss@gmail.com>
						Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
						Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
						Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-22 07:27:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4fb56914c5 
					 
					
						
						
							
							[perf] Add fused MLA QKV + strided layernorm (#21116)
						Signed-off-by: Mickael Seznec <mickael@mistral.ai>
						Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-22 07:07:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0df4d9b06b 
					 
					
						
						
							
							[Misc] unify variable for LLM instance v2 (#21356)
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-07-22 06:32:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed25054577 
					 
					
						
						
							
							[Core] Introduce popleft_n and append_n in FreeKVCacheBlockQueue to further optimize block_pool (#21222)
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
						
						
					 
					
						2025-07-22 06:17:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						10904e6d75 
					 
					
						
						
							
							[benchmark] Port benchmark request sent optimization to benchmark_serving (#21209)
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
						
						
					 
					
						2025-07-22 05:28:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a32237665d 
					 
					
						
						
							
							[Core] Optimize update checks in LogitsProcessor (#21245)
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
						
						
					 
					
						2025-07-22 05:27:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bc8a8ce5ec 
					 
					
						
						
							
							[Misc] Remove deprecated args in v0.10 (#21349)
						Signed-off-by: Kebe <mail@kebe7jun.com>
						
						
					 
					
						2025-07-22 05:26:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						32142b3c62 
					 
					
						
						
							
							[Bugfix] Fix eviction cached blocked logic (#21357)
						Signed-off-by: simon-mo <simon.mo@hey.com>
						
						
					 
					
						2025-07-22 01:18:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						82b8027be6 
					 
					
						
						
							
							Add arcee model (#21296)
						Signed-off-by: alyosha-swamy <raghav@arcee.ai>
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-07-22 00:57:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3779eb8c81 
					 
					
						
						
							
							[Feature][eplb] add verify ep or tp or dp (#21102)
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
						
						
					 
					
						2025-07-21 23:41:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9e23ad9655 
					 
					
						
						
							
							Update fp4 quantize API (#21327)
						Signed-off-by: Shu Wang <shuw@nvidia.com>
						
						
					 
					
						2025-07-21 23:40:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e69a92a1ce 
					 
					
						
						
							
							[Bug] DeepGemm: Fix Cuda Init Error (#21312)
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-21 23:36:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8425f785ad 
					 
					
						
						
							
							[Misc] DeepEPHighThroughtput - Enable Inductor pass (#21311)
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						
						
					 
					
						2025-07-21 23:35:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c17231e827 
					 
					
						
						
							
							Fix kv_cache_dtype handling for out-of-tree HPU plugin (#21302)
						Signed-off-by: Konrad Zawora <kzawora@habana.ai>
						Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
						Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
						
						
					 
					
						2025-07-21 23:35:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e5b5ca580 
					 
					
						
						
							
							[Refactor] Fix Compile Warning #1444-D (#21208)
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-21 23:33:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						488d8a986a 
					 
					
						
						
							
							[V1] [Hybrid] Add new test to verify that hybrid views into KVCacheTensor are compatible (#21300)
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						
						
					 
					
						2025-07-21 23:31:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						af376ca19d 
					 
					
						
						
							
							[Core] Minimize number of dict lookup in _maybe_evict_cached_block (#21281)
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
						
						
					 
					
						2025-07-21 22:37:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e7b2042681 
					 
					
						
						
							
							Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762) (#21334)
						Signed-off-by: Ming Yang <minos.future@gmail.com>
						
						
					 
					
						2025-07-21 21:49:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						90f1e55421 
					 
					
						
						
							
							[Intel GPU] Ray Compiled Graph avoid NCCL for Intel GPU (#21338)
						Signed-off-by: ratnampa <ratnam.parikh@intel.com>
						
						
					 
					
						2025-07-21 21:48:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5e70dcd6e6 
					 
					
						
						
							
							[Doc] Fix CPU doc format (#21316)
						Signed-off-by: jiang1.li <jiang1.li@intel.com>
						
						
					 
					
						2025-07-21 21:47:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						25d585ab7b 
					 
					
						
						
							
							[XPU] Enable external_launcher to serve as an executor via torchrun (#21021)
						Signed-off-by: chzhang <chaojun.zhang@intel.com>
						
						
					 
					
						2025-07-21 21:47:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8d0a01a5f2 
					 
					
						
						
							
							[v1][sampler] Inplace logprobs comparison to get the token rank (#21283)
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-07-21 13:47:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0ec82edda5 
					 
					
						
						
							
							[perf] Speed up align sum kernels (#21079)
						Signed-off-by: Himanshu Jaju <hj@mistral.ai>
						
						
					 
					
						2025-07-21 11:19:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						005ae9be6c 
					 
					
						
						
							
							Fix bad lm-eval fork (#21318)
						
						
						
						
					 
					
						2025-07-21 10:47:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						29d1ffc5b4 
					 
					
						
						
							
							[DP] Fix Prometheus Logging (#21257)
						Signed-off-by: Robert Shaw <robshaw@redhat.com>
						Co-authored-by: Robert Shaw <robshaw@redhat.com>
						
						
					 
					
						2025-07-21 09:11:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						304dce7ec0 
					 
					
						
						
							
							[Attention] Clean up iRoPE in V1 (#21188)
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						Co-authored-by: Michael Goin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-21 09:10:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6ece16c4fe 
					 
					
						
						
							
							[Misc] Add dummy maverick test (#21199)
						Signed-off-by: Ming Yang <minos.future@gmail.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-07-21 09:08:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a0e827e07c 
					 
					
						
						
							
							[BugFix] make utils.current_stream thread-safety (#21252) (#21253)
						Signed-off-by: simpx <simpxx@gmail.com>
						
						
					 
					
						2025-07-21 09:07:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a15a50fc17 
					 
					
						
						
							
							[CPU] Enable shared-memory based pipeline parallel for CPU backend (#21289)
						Signed-off-by: jiang1.li <jiang1.li@intel.com>
						
						
					 
					
						2025-07-21 09:07:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6dda13c86b 
					 
					
						
						
							
							[Misc] Add sliding window to flashinfer test (#21282)
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-07-21 08:37:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6b46c4b653 
					 
					
						
						
							
							Add Nvidia ModelOpt config adaptation (#19815)
						Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
						
						
					 
					
						2025-07-21 10:02:58 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d97841078b 
					 
					
						
						
							
							[Misc] unify variable for LLM instance (#20996)
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-07-21 12:18:33 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e6b90a2805 
					 
					
						
						
							
							[Docs] Make tables more space efficient in supported_models.md (#21291)
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-21 02:25:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						be54a951a3 
					 
					
						
						
							
							[Docs] Fix hardcoded links in docs (#21287)
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-21 02:23:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						042af0c8d3 
					 
					
						
						
							
							[Model][1/N] Support multiple poolers at model level (#21227)
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-21 02:22:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						378d33c392 
					 
					
						
						
							
							[Bugfix] Fix missing placeholder in logger debug (#21280)
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-20 22:50:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						940af1f03a 
					 
					
						
						
							
							Add the instruction to run e2e validation manually before release (#21023)
						Signed-off-by: Huy Do <huydhn@gmail.com>
						
						
					 
					
						2025-07-20 22:29:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						92615d7fe8 
					 
					
						
						
							
							[Docs] Add RFC Meeting to Issue Template (#21279)
						Signed-off-by: simon-mo <simon.mo@hey.com>
						
						
					 
					
						2025-07-20 21:58:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8188196a1c 
					 
					
						
						
							
							[CI] Cleanup modelscope version constraint in Dockerfile (#21243)
						Signed-off-by: Kay Yan <kay.yan@daocloud.io>
						
						
					 
					
						2025-07-20 20:13:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7ba34b1241 
					 
					
						
						
							
							[bugfix] fix syntax warning caused by backslash (#21251)
						
						
						
						
					 
					
						2025-07-20 17:12:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9499e26e2a 
					 
					
						
						
							
							[Model] Support VLMs with transformers backend (#20543)
						Signed-off-by: raushan <raushan@huggingface.co>
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						Co-authored-by: Isotr0py <2037008807@qq.com>
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-07-20 13:25:50 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						51ba839555 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for bart (#18299)
						Signed-off-by: calvin chen <120380290@qq.com>
						
						
					 
					
						2025-07-20 08:15:50 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d1fb65bde3 
					 
					
						
						
							
							Enable v1 metrics tests (#20953)
						Signed-off-by: Seiji Eicher <seiji@anyscale.com>
						
						
					 
					
						2025-07-20 03:22:02 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3a1d8940ae 
					 
					
						
						
							
							[TPU] support fp8 kv cache quantization (#19292)
						Signed-off-by: Chengji Yao <chengjiyao@google.com>
						
						
					 
					
						2025-07-20 03:01:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2b504eb770 
					 
					
						
						
							
							[Docs] [V1] Update docs to remove enforce_eager limitation for hybrid models. (#21233)
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						
						
					 
					
						2025-07-19 16:09:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						10eb24cc91 
					 
					
						
						
							
							GLM-4 Update (#20736)
						Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						Signed-off-by: Lu Fang <fanglu@fb.com>
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						Co-authored-by: Lu Fang <fanglu@fb.com>
						
						
					 
					
						2025-07-19 22:40:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2e8cbb58f3 
					 
					
						
						
							
							[BugFix] Fix full cuda graph slot_mapping (#21228)
						Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
						
						
					 
					
						2025-07-19 14:13:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						752c6ade2e 
					 
					
						
						
							
							[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small ( #21217 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-19 13:53:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						881e3cbe3b 
					 
					
						
						
							
							[V1] [Hybrid] Enable piecewise CUDA Graph for mamba layers ( #21194 )
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-07-19 19:27:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9f414a12ad 
					 
					
						
						
							
							[BugFix] Make PD work with Ray ( #21072 )  
						
						... 
						
						
						
						Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com > 
						
						
					 
					
						2025-07-19 08:46:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6a971ed692 
					 
					
						
						
							
							[Docs] Update the link to the 'Prometheus/Grafana' example ( #21225 )  
						
						
						
						
					 
					
						2025-07-19 06:58:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						da6579bf41 
					 
					
						
						
							
							[CI/CD][bugfix]fix: error argument to loads has incompatible type ( #21223 )  
						
						... 
						
						
						
						Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
						Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com>
						
						
					 
					
						2025-07-19 05:16:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c81259d33a 
					 
					
						
						
							
							Fix/remove some broken model executor tests ( #21224 )  
						
						... 
						
						
						
						Signed-off-by: Rabi Mishra <ramishra@redhat.com > 
						
						
					 
					
						2025-07-19 12:15:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e3a0e43d7f 
					 
					
						
						
							
							[bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code ( #21032 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-19 05:13:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b3d82108e7 
					 
					
						
						
							
							[Bugfix][Frontend] Fix openai CLI arg middleware ( #21220 )  
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-07-19 02:40:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6d0734c562 
					 
					
						
						
							
							[NVIDIA] Add SM100 Flashinfer MoE blockscale fp8 backend for low latency ( #20645 )  
						
						... 
						
						
						
						Signed-off-by: kaixih <kaixih@nvidia.com>
						Signed-off-by: mgoin <mgoin64@gmail.com>
						Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-19 02:33:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7d94577138 
					 
					
						
						
							
							Add torch golden impl for moe_align_block_size kernel test ( #20653 )  
						
						... 
						
						
						
						Signed-off-by: Shixian Cui <shixian@amazon.com>
						Co-authored-by: Shixian Cui <shixian@amazon.com>
						
						
					 
					
						2025-07-19 02:32:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						59f935300c 
					 
					
						
						
							
							[BugFix] Fix potential cuda-graph IMA ( #21196 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-07-19 02:18:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						18e519ec86 
					 
					
						
						
							
							[Bugfix] Fix ndarray video color from VideoAsset ( #21064 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-19 02:17:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1eaff27815 
					 
					
						
						
							
							[V0 deprecation] Remove long context LoRA ( #21169 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-19 02:15:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cf8cc32674 
					 
					
						
						
							
							Fix a couple of Voxtral tests ( #21218 )  
						
						... 
						
						
						
						Signed-off-by: Huy Do <huydhn@gmail.com > 
						
						
					 
					
						2025-07-19 09:13:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3a2cb2649d 
					 
					
						
						
							
							[Misc][Tools][Benchmark] Add readme file for auto_tune script ( #20779 )  
						
						... 
						
						
						
						Signed-off-by: Chenyaaang <chenyangli@google.com > 
						
						
					 
					
						2025-07-19 09:06:59 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3e04107d97 
					 
					
						
						
							
							[Model] EXAONE 4.0 model support ( #21060 )  
						
						... 
						
						
						
						Signed-off-by: Deepfocused <rlawhdrhs27@gmail.com>
						Signed-off-by: woongsik <rlawhdrhs27@gmail.com>
						
						
					 
					
						2025-07-19 14:25:44 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						37bd8d6e4c 
					 
					
						
						
							
							[Bug] DeepGemm: Fix TypeError: per_block_cast_to_fp8() missing 1 required positional argument: 'use_ue8m0' for SM100 ( #21187 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-18 23:25:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						468e2400fe 
					 
					
						
						
							
							[BugFix][CPU] Fix TorchSDPABackendImpl doesn't have use_irope ( #21200 )
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-07-18 23:18:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dcc6cfb991 
					 
					
						
						
							
							[Kernel][Performance] Tweak MoE Batched silu_mul_fp8_quant_deep_gemm kernel ( #21193 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						
						
					 
					
						2025-07-18 23:09:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dd572c0ab3 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 Spec Decode workers ( #21152 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-18 21:47:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9ffe905a41 
					 
					
						
						
							
							[Bugfix][Model] Fix LoRA for Mistral-Small-3.1-24B-Instruct-2503 ( #21183 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
						Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
						
						
					 
					
						2025-07-18 21:15:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9a9fda1423 
					 
					
						
						
							
							[Core] Support Local Chunked Attention for Hybrid KV Cache ( #19351 )  
						
						... 
						
						
						
						Signed-off-by: Lucia Fang <fanglu@fb.com>
						Signed-off-by: Lu Fang <fanglu@meta.com>
						Signed-off-by: Lu Fang <fanglu@fb.com>
						Co-authored-by: Lu Fang <fanglu@meta.com>
						
						
					 
					
						2025-07-18 20:48:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						466e878f2a 
					 
					
						
						
							
							[Quantization] Enable BNB support for more MoE models ( #21100 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-18 17:52:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						217937221b 
					 
					
						
						
							
							Elastic Expert Parallel Initial Support ( #20775 )  
						
						... 
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com > 
						
						
					 
					
						2025-07-18 17:46:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5782581acf 
					 
					
						
						
							
							[Bugfix] Voxtral on Blackwell GPUs (RTX 50 series) ( #21077 )  
						
						... 
						
						
						
						Signed-off-by: hax0r31337 <liulihaocaiqwq@gmail.com > 
						
						
					 
					
						2025-07-18 18:40:18 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0f199f197b 
					 
					
						
						
							
							[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue ( #21005 )  
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <jialino@meta.com > 
						
						
					 
					
						2025-07-18 12:34:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b2eb2b5ad7 
					 
					
						
						
							
							[Kernel] Apply torch.Tag.needs_fixed_stride_order only for torch==2.6.0 ( #19346 )  
						
						... 
						
						
						
						Signed-off-by: rzou <zou3519@gmail.com > 
						
						
					 
					
						2025-07-18 14:10:21 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						21274ab476 
					 
					
						
						
							
							[CI] Update CODEOWNERS for vllm/compilation ( #21185 )  
						
						... 
						
						
						
						Signed-off-by: Richard Zou <zou3519@gmail.com > 
						
						
					 
					
						2025-07-18 06:51:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed8cbfedf8 
					 
					
						
						
							
							Let GraniteMoeAttention use YaRN ( #21174 )  
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-07-18 05:52:52 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						45badd05d0 
					 
					
						
						
							
							[Core] Set pooling params based on task and model ( #21128 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-18 05:41:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4adc66f64d 
					 
					
						
						
							
							[Bugfix] Allocate less memory in non-batched CUTLASS MoE ( #21121 )  
						
						... 
						
						
						
						Signed-off-by: ElizaWszola <ewszola@redhat.com > 
						
						
					 
					
						2025-07-18 18:55:52 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						55ad648715 
					 
					
						
						
							
							[Doc] Fix typo in model name ( #21178 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-18 03:55:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5895afd780 
					 
					
						
						
							
							[Bugfix] The special_tokens in tokenizer should also be controlled by do_lower_case in encoder_config. ( #20750 )  
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-07-18 09:10:47 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ca4eb82bcb 
					 
					
						
						
							
							[Model] Re-add the implicit conversion feature for as_seq_cls_model ( #21103 )  
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-07-18 07:15:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ba2dfbb0c2 
					 
					
						
						
							
							[Misc] Make MM embedding merge interface explicit in model runner ( #21147 )  
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-07-18 07:13:57 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1bf65138f6 
					 
					
						
						
							
							[benchmark] Sending request strictly follows the random intervals ( #21108 )  
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com > 
						
						
					 
					
						2025-07-18 06:22:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						54cf1cae62 
					 
					
						
						
							
							[Misc] Do not print async output warning for v1 ( #21151 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-17 21:57:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5780121c95 
					 
					
						
						
							
							[Perf] Add swap_ab to SM90 FP8 non-block CUTLASS moe grouped gemm ( #20911 )  
						
						... 
						
						
						
						Signed-off-by: Shixian Cui <shixian@amazon.com>
						Co-authored-by: Shixian Cui <shixian@amazon.com>
						
						
					 
					
						2025-07-18 04:34:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c7d8724e78 
					 
					
						
						
							
							[Core] FlashInfer CUTLASS fused MoE backend (NVFP4) ( #20037 )  
						
						... 
						
						
						
						Signed-off-by: shuw <shuw@nvidia.com>
						Signed-off-by: mgoin <mgoin64@gmail.com>
						Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-17 21:32:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b38baabcf9 
					 
					
						
						
							
							[Doc] Add inplace weights loading example ( #19640 )  
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-07-17 21:12:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						89cab4d01f 
					 
					
						
						
							
							[Attention] Make local attention backend agnostic ( #21093 )  
						
						
						
						
					 
					
						2025-07-18 00:10:42 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b9a21e9173 
					 
					
						
						
							
							[Docs] Update supported models documentation with missing models ( #20844 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <fanglu@fb.com > 
						
						
					 
					
						2025-07-17 20:12:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c4e3b12524 
					 
					
						
						
							
							[Docs] Add minimal demo of Ray Data API usage ( #21080 )  
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-17 20:09:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8dfb45ca33 
					 
					
						
						
							
							[Bugfix] Fix the tensor non-contiguous issue for Flashinfer TRT-LLM backend attention kernel ( #21133 )  
						
						
						
						
					 
					
						2025-07-18 00:35:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8a8fc94639 
					 
					
						
						
							
							[Log] Debugging Log with more Information ( #20770 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-18 00:19:46 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4de7146351 
					 
					
						
						
							
							[V0 deprecation] Remove V0 HPU backend ( #21131 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-17 16:37:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ac9fb732a5 
					 
					
						
						
							
							On environments where numa cannot be detected we get 0 ( #21115 )  
						
						... 
						
						
						
						Signed-off-by: Eric Curtin <ecurtin@redhat.com > 
						
						
					 
					
						2025-07-17 18:52:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a3a6c695f4 
					 
					
						
						
							
							[Misc] Qwen MoE model supports LoRA ( #20932 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-17 18:32:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						90bd2ab6e3 
					 
					
						
						
							
							[Model] Update pooling model interface ( #21058 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-17 16:05:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9fb2d22032 
					 
					
						
						
							
							[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE ( #20762 )  
						
						... 
						
						
						
						Signed-off-by: ElizaWszola <ewszola@redhat.com > 
						
						
					 
					
						2025-07-17 09:56:44 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2d6a38209b 
					 
					
						
						
							
							[Docs] Move code block out of admonition now that it's short ( #21118 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-17 06:12:29 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						89e3c4e9b4 
					 
					
						
						
							
							[Misc] Avoid unnecessary import ( #21106 )  
						
						... 
						
						
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com > 
						
						
					 
					
						2025-07-17 12:57:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fe8a2c544a 
					 
					
						
						
							
							[Docs] Improve docstring formatting for FusedMoEParallelConfig.make ( #21117 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-17 04:13:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ef00b5cac 
					 
					
						
						
							
							[VLM] Add Nemotron-Nano-VL-8B-V1 support ( #20349 )  
						
						... 
						
						
						
						Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-07-17 03:07:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5a7fb3ab9e 
					 
					
						
						
							
							[Model] Add ToolParser and MoE Config for Hunyuan A13B ( #20820 )
						
						... 
						
						
						
						Signed-off-by: Asher Zhang <asherszhang@tencent.com > 
						
						
					 
					
						2025-07-17 09:10:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						11dfdf21bf 
					 
					
						
						
							
							[Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels ( #20903 )
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						
						
					 
					
						2025-07-17 08:10:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fdc5b43d20 
					 
					
						
						
							
							[Bugfix]: Fix final_res_batch list index out of range error ( #21055 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-07-17 00:29:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c5b8b5953a 
					 
					
						
						
							
							[Misc] Fix PhiMoE expert mapping ( #21085 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-17 05:47:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4fcef49ec4 
					 
					
						
						
							
							[V1] [KVConnector] Fix MultiprocExecutor worker output aggregation ( #21048 )  
						
						... 
						
						
						
						Signed-off-by: David Ben-David <davidb@pliops.com>
						Co-authored-by: David Ben-David <davidb@pliops.com>
						
						
					 
					
						2025-07-17 13:29:45 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8a4e5c5f3c 
					 
					
						
						
							
							[V1][P/D]Enhance Performance and code readability for P2pNcclConnector ( #20906 )  
						
						... 
						
						
						
						Signed-off-by: Abatom <abzhonghua@gmail.com > 
						
						
					 
					
						2025-07-16 22:13:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						76b494444f 
					 
					
						
						
							
							[Attention] Refactor attention metadata builder interface ( #20466 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-07-17 04:44:25 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						28a6d5423d 
					 
					
						
						
							
							[Bugfix] Fix Machete zero point issue for GPTQ models on SM90 ( #21066 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-16 19:54:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						58760e12b1 
					 
					
						
						
							
							[TPU] Start using python 3.12 ( #21000 )  
						
						... 
						
						
						
						Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com > 
						
						
					 
					
						2025-07-16 19:37:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a50d918225 
					 
					
						
						
							
							[Docker] Allow FlashInfer to be built in the ARM CUDA Dockerfile ( #21013 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-16 19:37:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c9ba8104ed 
					 
					
						
						
							
							[Bugfix] weight loading use correct tp_group with patch_tensor_parallel_group ( #21024 )  
						
						... 
						
						
						
						Signed-off-by: KevinXiong-C <kevin_xiong1997@outlook.com > 
						
						
					 
					
						2025-07-16 19:36:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4e7dfbe7b4 
					 
					
						
						
							
							Update PyTorch to torch==2.7.1 for CUDA ( #21011 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-17 02:30:44 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						72ad273582 
					 
					
						
						
							
							Remove torch_xla.tpu.version() from pallas.py. ( #21065 )  
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com > 
						
						
					 
					
						2025-07-17 00:25:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						01513a334a 
					 
					
						
						
							
							Support FP8 Quantization and Inference Run on Intel Gaudi (HPU) using INC (Intel Neural Compressor) ( #12010 )  
						
						... 
						
						
						
						Signed-off-by: Nir David <ndavid@habana.ai>
						Signed-off-by: Uri Livne <ulivne@habana.ai>
						Co-authored-by: Uri Livne <ulivne@habana.ai>
						
						
					 
					
						2025-07-16 15:33:41 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ac2bf41e53 
					 
					
						
						
							
							[Model] Remove model sampler ( #21059 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-16 19:03:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a931b4cdcf 
					 
					
						
						
							
							Remove Qwen Omni workaround that's no longer necessary ( #21057 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-16 16:25:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a0f8a79646 
					 
					
						
						
							
							[fix] fix qwen image_embeds input ( #21049 )  
						
						... 
						
						
						
						Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai > 
						
						
					 
					
						2025-07-16 15:17:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						18bdcf4113 
					 
					
						
						
							
							feat - add a new endpoint get_tokenizer_info to provide tokenizer/chat-template information ( #20575 )  
						
						... 
						
						
						
						Signed-off-by: m-misiura <mmisiura@redhat.com > 
						
						
					 
					
						2025-07-16 21:52:14 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1c3198b6c4 
					 
					
						
						
							
							[Model] Consolidate pooler implementations ( #20927 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-16 13:39:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						260127ea54 
					 
					
						
						
							
							[Docs] Add intro and fix 1-2-3 list in frameworks/open-webui.md ( #19199 )  
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-07-16 06:11:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d0dc4cfca4 
					 
					
						
						
							
							Fix inadvertently silenced PP tests for mp, add DeepSeek V2/V3 model family to PP tests ( #20831 )  
						
						... 
						
						
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com > 
						
						
					 
					
						2025-07-16 00:14:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d31a647124 
					 
					
						
						
							
							[BugFix] Fix import error on non-blackwell machines ( #21020 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-07-15 22:27:29 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						85431bd9ad 
					 
					
						
						
							
							[TPU] fix kv_cache_update kernel block size choosing logic ( #21007 )  
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-07-16 04:39:48 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c11013db8b 
					 
					
						
						
							
							[Meta] Llama4 EAGLE Support ( #20591 )  
						
						... 
						
						
						
						Signed-off-by: qizixi <qizixi@meta.com>
						Co-authored-by: qizixi <qizixi@meta.com>
						
						
					 
					
						2025-07-15 21:14:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1eb2b9c102 
					 
					
						
						
							
							[CI] update typos config for CI pre-commit and fix some spells ( #20919 )  
						
						... 
						
						
						
						Signed-off-by: Peter Pan <Peter.Pan@daocloud.io > 
						
						
					 
					
						2025-07-15 21:12:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6ebf313790 
					 
					
						
						
							
							Avoid direct comparison of floating point numbers ( #21002 )  
						
						... 
						
						
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com > 
						
						
					 
					
						2025-07-15 21:12:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cfbcb9ed87 
					 
					
						
						
							
							[Voxtral] Add more tests ( #21010 )  
						
						... 
						
						
						
						Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-07-15 21:11:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						76ddeff293 
					 
					
						
						
							
							[Doc] Remove duplicate docstring ( #21012 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-15 20:09:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f46098335b 
					 
					
						
						
							
							[Bugfix] Fix Mistral3 support on SM100/SM120 ( #20998 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-15 20:08:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e9534c7202 
					 
					
						
						
							
							[CI][HPU] update for v0 deprecate by switching to VLLM_TARGET_DEVICE=empty ( #21006 )  
						
						... 
						
						
						
						Signed-off-by: Chendi.Xue <chendi.xue@intel.com > 
						
						
					 
					
						2025-07-15 20:07:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7976446015 
					 
					
						
						
							
							Add Dockerfile argument for VLLM_USE_PRECOMPILED environment ( #20943 )  
						
						... 
						
						
						
						Signed-off-by: dougbtv <dosmith@redhat.com > 
						
						
					 
					
						2025-07-15 19:53:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fcb9f879c1 
					 
					
						
						
							
							[Bugfix] Correct per_act_token in CompressedTensorsW8A8Fp8MoECutlassM… ( #20937 )  
						
						Signed-off-by: Ming Yang <minos.future@gmail.com>
						
						
					 
					
						2025-07-15 19:53:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3ed94f9d0a 
					 
					
						
						
							
							[Docs] Enhance Anyscale documentation, add quickstart links for vLLM ( #21018 )  
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
						
						
					 
					
						2025-07-15 19:46:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fa839565f2 
					 
					
						
						
							
							[Misc] Refactor: Improve argument handling for conda command ( #20481 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-15 19:43:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						75a99b98bf 
					 
					
						
						
							
							[Chore] Remove outdated transformers check ( #20989 )  
						
						Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
						
						
					 
					
						2025-07-15 19:42:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b5c3b68359 
					 
					
						
						
							
							[Misc] bump xgrammar version to v0.1.21 ( #20992 )  
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-07-15 19:42:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6cbc4d4bea 
					 
					
						
						
							
							[Model] Add ModelConfig class for GraniteMoeHybrid to override default max_seq_len_to_capture ( #20923 )  
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						
						
					 
					
						2025-07-15 19:19:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						153c6f1e61 
					 
					
						
						
							
							[Frontend] Remove print left in FrontendArgs.add_cli_args ( #21004 )  
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-15 19:18:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						34cda778a0 
					 
					
						
						
							
							[Frontend] OpenAI Responses API supports input image ( #20975 )  
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-07-15 18:59:36 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						30800b01c2 
					 
					
						
						
							
							[Nvidia] Integrate SM100 cudnn prefill API to MLA prefill ( #20411 )  
						
						Signed-off-by: Elfie Guo <elfieg@nvidia.com>
Co-authored-by: Elfie Guo <eflieg@nvidia.com>
						
						
					 
					
						2025-07-15 17:56:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						10be209493 
					 
					
						
						
							
							[Bug Fix] get_distributed_init_method should get the ip from get_ip i… ( #20889 )  
						
						Signed-off-by: Chen Li <lcpingping@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-07-15 21:23:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						19c863068b 
					 
					
						
						
							
							[Frontend] Support cache_salt in /v1/completions and /v1/responses ( #20981 )  
						
						Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
						
						
					 
					
						2025-07-15 21:01:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f29fd8a7f8 
					 
					
						
						
							
							[BugFix] fix 3 issues: (1) using metadata for causal-conv1d, (2) indexing overflow in v1 vLLM, and (3) init_states in v0 ( #20838 )  
						
						Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
						
						
					 
					
						2025-07-15 16:08:26 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed10f3cea1 
					 
					
						
						
							
							[ROCm] warpSize is being made non constexpr in ROCm 7.0 ( #20330 )  
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-07-15 14:01:44 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b637e9dcb8 
					 
					
						
						
							
							Add full serve CLI reference back to docs ( #20978 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-15 17:42:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1e36c8687e 
					 
					
						
						
							
							[Deprecation] Remove nullable_kvs ( #20969 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-15 17:21:50 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5bac61362b 
					 
					
						
						
							
							Configure Gemini ( #20971 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-15 09:37:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						313ae8c16a 
					 
					
						
						
							
							[Deprecation] Remove everything scheduled for removal in v0.10.0 ( #20979 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-15 15:57:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c847e34b39 
					 
					
						
						
							
							[CI/Build] Fix wrong path in Transformers Nightly Models Test ( #20994 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-15 08:53:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e7e3e6d263 
					 
					
						
						
							
							Voxtral ( #20970 )  
						
						Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-07-15 07:35:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ffd963fa0 
					 
					
						
						
							
							[v1][core] Support for attention free models ( #20811 )  
						
						Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
						
						
					 
					
						2025-07-15 14:20:01 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						56fe4bedd6 
					 
					
						
						
							
							[Deprecation] Remove TokenizerPoolConfig ( #20968 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-15 14:00:50 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d91278181d 
					 
					
						
						
							
							[doc] Add more details for Ray-based DP ( #20948 )  
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
						
						
					 
					
						2025-07-15 05:37:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						20149d84d9 
					 
					
						
						
							
							[MISC] Add init files for python package ( #20908 )  
						
						Signed-off-by: wangli <wangli858794774@gmail.com>
						
						
					 
					
						2025-07-15 12:16:33 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3534c39a20 
					 
					
						
						
							
							[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli  ( #20840 )  
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						
						
					 
					
						2025-07-15 04:04:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c586b55667 
					 
					
						
						
							
							[TPU] Optimize kv cache update kernel ( #20415 )  
						
						Signed-off-by: Yifei Teng <tengyifei88@gmail.com>
						
						
					 
					
						2025-07-15 03:56:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						33d560001e 
					 
					
						
						
							
							[Docs] Improve documentation for ray cluster launcher helper script ( #20602 )  
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
						
						
					 
					
						2025-07-15 03:55:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f148c44c6a 
					 
					
						
						
							
							[frontend] Refactor CLI Args for a better modular integration ( #20206 )  
						
						Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
						
						
					 
					
						2025-07-15 02:23:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						235bfd5dfe 
					 
					
						
						
							
							[Docs] Improve documentation for RLHF example ( #20598 )  
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
						
						
					 
					
						2025-07-15 01:54:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						68d28e37b0 
					 
					
						
						
							
							[frontend] Add --help=page option for paginated help output ( #20961 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-15 00:42:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						37a7d5d74a 
					 
					
						
						
							
							[Misc] Refactor AllReduceFusionPass. Remove parameter ( #20918 )  
						
						Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
						
						
					 
					
						2025-07-15 06:57:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d4d309409f 
					 
					
						
						
							
							Implement Async Scheduling ( #19970 )  
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-07-14 23:01:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						85bd6599e4 
					 
					
						
						
							
							[Model] Add AutoWeightsLoader support for BERT, RoBERTa ( #20534 )  
						
						Signed-off-by: Jennifer He <islandhe@gmail.com>
Signed-off-by: <islandhe@gmail.com>
Signed-off-by: Jen H <islandhe@gmail.com>
						
						
					 
					
						2025-07-15 13:34:24 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						91b3d190ae 
					 
					
						
						
							
							[cold start] replace VLLM_COMPILE_DEPYF with debug_dump_dir ( #20940 )  
						
						Signed-off-by: Boyuan Feng <boyuan@meta.com>
						
						
					 
					
						2025-07-15 13:02:17 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fc017915f5 
					 
					
						
						
							
							[Doc] Clearer mistral3 and pixtral model support description ( #20926 )  
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-14 21:56:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9ad0a4588b 
					 
					
						
						
							
							[Bugfix] Switch bailout logic for kv-cache-dtype with SM100 Flashinfer ( #20934 )  
						
						Signed-off-by: Pavani Majety <pmajety@nvidia.com>
						
						
					 
					
						2025-07-15 03:27:50 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						016b8d1b7f 
					 
					
						
						
							
							Enabled BnB NF4 inference on Gaudi ( #20172 )  
						
						Signed-off-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
						
						
					 
					
						2025-07-14 20:26:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						80305c1b24 
					 
					
						
						
							
							[CI] Fix flaky test_streaming_response test ( #20913 )  
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-07-14 20:15:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						37e2ecace2 
					 
					
						
						
							
							feat: add image zoom to improve image viewing experience ( #20763 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-14 20:14:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						054c8657e3 
					 
					
						
						
							
							[Docs] Add Kuberay to deployment integrations ( #20592 )  
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
						
						
					 
					
						2025-07-14 20:13:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d4170fad39 
					 
					
						
						
							
							Use w8a8 quantized matmul Pallas kernel ( #19170 )  
						
						Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
						
						
					 
					
						2025-07-15 03:06:33 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						946aadb4a0 
					 
					
						
						
							
							[CI/Build] Split Entrypoints Test into LLM and API Server ( #20945 )  
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-15 02:44:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bcdfb2a330 
					 
					
						
						
							
							[Bugfix] Fix incorrect dispatch for CutlassBlockScaledGroupedGemm and DeepGEMM ( #20933 )  
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-15 01:42:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ba8c300018 
					 
					
						
						
							
							[BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache ( #20942 )  
						
						Signed-off-by: Richard Zou <zou3519@gmail.com>
						
						
					 
					
						2025-07-15 01:26:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8cdc371217 
					 
					
						
						
							
							SM100 Cutlass MLA decode with unrestricted num_heads (< 128) for DeepSeek TP ( #20769 )  
						
						Signed-off-by: Alexander Matveev <amatveev@redhat.com>
						
						
					 
					
						2025-07-15 01:06:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						61e20828da 
					 
					
						
						
							
							Fall back if flashinfer comm module not found ( #20936 )  
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
						
						
					 
					
						2025-07-14 23:11:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						55e1c66da5 
					 
					
						
						
							
							[Docs] remove outdated performance benchmark ( #20935 )  
						
						Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
						
						
					 
					
						2025-07-14 22:14:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						86f3ac21ce 
					 
					
						
						
							
							Fix overflow indexing in causal_conv1d kernel ( #20938 )  
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						
						
					 
					
						2025-07-14 21:43:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						149f2435a5 
					 
					
						
						
							
							[Misc] Relax translations tests ( #20856 )  
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-07-14 20:08:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c0569dbc82 
					 
					
						
						
							
							[Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts ( #20725 )  
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						
						
					 
					
						2025-07-14 19:47:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8bb43b9c9e 
					 
					
						
						
							
							Add benchmark dataset for mlperf llama tasks ( #20338 )  
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-14 19:10:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						559756214b 
					 
					
						
						
							
							Change default model to Qwen3-0.6B ( #20335 )  
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
						
						
					 
					
						2025-07-14 16:54:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6d0cf239c6 
					 
					
						
						
							
							[CI/Build] Add Transformers nightly tests in CI ( #20924 )  
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-14 16:33:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3fc964433a 
					 
					
						
						
							
							[Misc] Clean up Aimv2 config registration in Ovis config ( #20921 )  
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-14 15:36:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0caf61c08a 
					 
					
						
						
							
							[CI] Update codeowner for compilation code ( #20929 )  
						
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-07-14 08:33:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						667624659b 
					 
					
						
						
							
							[CI] cc folks on changes to vllm/compilation ( #20925 )  
						
						Signed-off-by: Richard Zou <zou3519@gmail.com>
						
						
					 
					
						2025-07-14 07:52:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						38efa28278 
					 
					
						
						
							
							[Model] Add Ling implementation ( #20680 )  
						
						Signed-off-by: vito.yy <vito.yy@antgroup.com>
						
						
					 
					
						2025-07-14 22:10:32 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e8cc53af5e 
					 
					
						
						
							
							[Misc] Log the reason for falling back to FlexAttention ( #20699 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-14 04:16:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a4851cfe68 
					 
					
						
						
							
							[Bugfix]: Fix messy code when using logprobs ( #20910 )  
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-07-14 11:06:45 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9887e8ec50 
					 
					
						
						
							
							[Misc] Remove unused function ( #20909 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-14 10:48:55 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f326ab9c88 
					 
					
						
						
							
							[Bugfix] Bump up mistral_common to support v13 tokenizer ( #20905 )  
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-07-14 10:45:03 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dcf2a5e208 
					 
					
						
						
							
							[CI/Build] Fix OOM issue in Jina-VL test ( #20907 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-14 10:32:35 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1e9438e0b0 
					 
					
						
						
							
							[MISC] Move bind_kv_cache to worker module ( #20900 )  
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
						
						
					 
					
						2025-07-14 09:40:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						697ef765ee 
					 
					
						
						
							
							[Refactor][V1] Move outlines utils for V1 imports ( #20878 )  
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
						
						
					 
					
						2025-07-14 00:58:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a99b9f7dee 
					 
					
						
						
							
							[Quantization] add BNB for MixtralForCausalLM ( #20893 )  
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-07-14 07:34:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c488b928a7 
					 
					
						
						
							
							[ROCm] [Bugfix] [Critical]: Fix mamba compilation bug ( #20883 )  
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
						
						
					 
					
						2025-07-14 15:23:28 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2c7fa47161 
					 
					
						
						
							
							Fix: Add missing EOFError handling in CLI complete command ( #20896 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-14 07:09:57 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						88fc8a97e3 
					 
					
						
						
							
							Removing redundant python version check ( #20888 )  
						
						Signed-off-by: Dannyso05 <dansong1177@gmail.com>
						
						
					 
					
						2025-07-14 06:15:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						66f6fbd393 
					 
					
						
						
							
							[Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) ( #20511 )  
						
						Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
						
						
					 
					
						2025-07-14 02:45:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8632e831ba 
					 
					
						
						
							
							[Core] Add update_config RPC method ( #20095 )  
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-07-14 00:49:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4bbfc36b16 
					 
					
						
						
							
							[V1] Hybrid allocator without prefix caching ( #20661 )  
						
						Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
						
						
					 
					
						2025-07-13 16:55:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						80d38b8ac8 
					 
					
						
						
							
							[V1] [ROCm] [AITER] Upgrade AITER to commit 916bf3c and bugfix APIs ( #20880 )  
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
						
						
					 
					
						2025-07-13 15:19:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						211b6a6113 
					 
					
						
						
							
							[Bugfix] fix define of RerankDocument ( #20877 )  
						
						Signed-off-by: liuchenlong <liuchenlong@xiaohongshu.com>
Co-authored-by: liuchenlong <liuchenlong@xiaohongshu.com>
						
						
					 
					
						2025-07-13 14:32:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						247102f07f 
					 
					
						
						
							
							[Bugfix] Fix: add patch_rope_scaling after hf override ( #20857 )  
						
						Signed-off-by: Wang Siyuan <wsy0227@sjtu.edu.cn>
Signed-off-by: Wang Siyuan <sywang0227@gmail.com>
						
						
					 
					
						2025-07-13 00:13:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bd4c1e6fdb 
					 
					
						
						
							
							Support for LlamaForSequenceClassification ( #20807 )  
						
						Signed-off-by: thechaos16 <thechaos16@gmail.com>
						
						
					 
					
						2025-07-13 00:09:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						99b4f080d8 
					 
					
						
						
							
							Renable google/gemma-3-1b-it accuracy test. ( #20866 )  
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com>
						
						
					 
					
						2025-07-12 21:48:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						020f58abcd 
					 
					
						
						
							
							[Core] Support multiple tasks per model ( #20771 )  
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-12 19:40:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c1acd6d7d4 
					 
					
						
						
							
							[Refactor] Change the way of import triton ( #20774 )  
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-12 19:39:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3b3b778d4a 
					 
					
						
						
							
							[Bugfix] Fix a couple PPLX+CUTLASS MoE bugs ( #20825 )  
						
						Signed-off-by: ElizaWszola <ewszola@redhat.com>
						
						
					 
					
						2025-07-12 19:39:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						42d440c22b 
					 
					
						
						
							
							[Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant ( #20841 )  
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-12 19:38:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f45a332886 
					 
					
						
						
							
							[Sched] Enhance the logic to remove stopped requests from queues ( #20739 )  
						
						
						
						
					 
					
						2025-07-12 15:33:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e2c176e1f 
					 
					
						
						
							
							[Bugfix] Restrict Machete to only run on Hopper ( #20830 )  
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-12 17:34:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a86754a12b 
					 
					
						
						
							
							[docs] convert supported configs to table ( #20858 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-12 06:54:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c2a2f19aba 
					 
					
						
						
							
							[Bugfix] Fix Tensor Parallelism Padding Consistency in Granite Models ( #20843 )  
						
						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
						
						
					 
					
						2025-07-12 06:11:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2c11a738b3 
					 
					
						
						
							
							[Model] New model support for microsoft/Phi-4-mini-flash-reasoning ( #20702 )  
						
						Signed-off-by: Congcong Chen <congcongchen@microsoft.com>
						
						
					 
					
						2025-07-12 06:02:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b639327ad9 
					 
					
						
						
							
							Revert "Use NVCC --compress-mode to reduce binary size by 30%  #20694 " ( #20853 )  
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-11 23:07:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4afe687a82 
					 
					
						
						
							
							Enable ModelOpt Llama4 fp8 checkpoint deployment ( #20419 )  
						
						Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
						
						
					 
					
						2025-07-11 23:07:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5de8d9f111 
					 
					
						
						
							
							Remove extra tensor on CPU ( #20693 )  
						
						... 
						
						
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com > 
						
						
					 
					
						2025-07-12 14:06:34 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c1c8ca57ff 
					 
					
						
						
							
							[cold start time] add envs.VLLM_COMPILE_DEPYF to guard decompile ( #20790 )  
						
						... 
						
						
						
						Signed-off-by: Boyuan Feng <boyuan@meta.com > 
						
						
					 
					
						2025-07-11 23:06:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a3a5a47e48 
					 
					
						
						
							
							[Bugfix] Fix torch.compile x LoRA for PyTorch 2.8  ( #20823 )  
						
						... 
						
						
						
						Signed-off-by: rzou <zou3519@gmail.com > 
						
						
					 
					
						2025-07-11 23:06:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fb25e95688 
					 
					
						
						
							
							[Docs] Update basic.md ( #20846 )  
						
						
						
						
					 
					
						2025-07-11 23:05:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0d4891cd03 
					 
					
						
						
							
							[Bug] Fix DeepGemm for EP low latency case ( #20833 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-11 23:05:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f56d2996ca 
					 
					
						
						
							
							[Misc] Respect no_use_tqdm_on_load flag while capturing CUDA graph ( #20834 )  
						
						... 
						
						
						
						Signed-off-by: Linkun <github@lkchen.net > 
						
						
					 
					
						2025-07-11 23:04:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						147afb448b 
					 
					
						
						
							
							[Bugfix] Replace unavailable video url in multimodal test ( #20854 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-12 05:25:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3c7d942da8 
					 
					
						
						
							
							[Frontend] Abstract prompt and SpeechToTextConfig for transcriptions models ( #20637 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-07-11 21:33:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						890323dc1b 
					 
					
						
						
							
							[Bugfix] : Fix typo - logger.warn_once -> logger.warning_once ( #20852 )  
						
						
						
						
					 
					
						2025-07-11 20:56:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						01cae37713 
					 
					
						
						
							
							[CI/Build] Ensure compatibility with Transformers v4.53 ( #20541 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-07-11 20:53:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						11c0198615 
					 
					
						
						
							
							[Bugfix] Fix tensor parallel issue in Qwen3 reranker weight loading ( #20682 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-11 20:52:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b1235c3e10 
					 
					
						
						
							
							[Bugfix] Lazy import fused_experts in BitsAndBytesMoEMethod to avoid break not-cuda-alike devices  ( #20822 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-11 20:52:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						44d02f54db 
					 
					
						
						
							
							[Misc] Restrict deep_gemm's log output ( #20827 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-11 20:50:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a8593237c0 
					 
					
						
						
							
							Add pynccl all-gatherv and reducescatterv ( #20154 )  
						
						... 
						
						
						
						Signed-off-by: Trevor Morris <tmorris@nvidia.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-11 18:59:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fc0f41d10a 
					 
					
						
						
							
							Integration SM100 FlashInfer fused allreduce RMSNorm ( #20691 )  
						
						... 
						
						
						
						Signed-off-by: ilmarkov <imarkov@redhat.com >
Co-authored-by: ilmarkov <imarkov@redhat.com > 
						
						
					 
					
						2025-07-11 18:58:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7b828e30d5 
					 
					
						
						
							
							[CI Bug] Fix Async Engine, Inputs, Utils, Worker Test: 'State' object has no attribute 'enable_server_load_tracking' ( #20845 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-11 18:57:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5f0af36af5 
					 
					
						
						
							
							Update kimi-k2 tool calling docs, enable unit tests ( #20821 )  
						
						... 
						
						
						
						Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn >
Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn >
Co-authored-by: wangzhengtao <wangzhengtao@msh.team > 
						
						
					 
					
						2025-07-11 20:16:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0d21b2664c 
					 
					
						
						
							
							[Bugfix] Fix OOM in language generation test ( #20814 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-07-11 11:21:52 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9907fc4494 
					 
					
						
						
							
							[Docs] Data Parallel deployment documentation ( #20768 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-11 09:42:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d47661f0cd 
					 
					
						
						
							
							[Kernel] Basic tuned configs for NVFP4 CUTLASS dense GEMM ( #20646 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-11 10:05:33 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						53fa457391 
					 
					
						
						
							
							[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility ( #20449 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-07-11 07:51:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6fb162447b 
					 
					
						
						
							
							[doc] fix ordered list issue ( #20819 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-11 06:49:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						66177189c5 
					 
					
						
						
							
							[Bugfix] Add missing field to TritonLanguagePlaceholder ( #20812 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-11 05:25:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b4f0b5f9aa 
					 
					
						
						
							
							Temporarily suspend google/gemma-3-1b-it. ( #20722 )  
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com > 
						
						
					 
					
						2025-07-11 11:21:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cbd14ed561 
					 
					
						
						
							
							[Bugfix] Refactor /invocations to be task-agnostic ( #20764 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-11 03:20:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7bd4c37ae7 
					 
					
						
						
							
							[Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100).  ( #19825 )  
						
						... 
						
						
						
						Signed-off-by: Pavani Majety <pmajety@nvidia.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: shuw <shuw@nvidia.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-11 09:23:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8020e98c9f 
					 
					
						
						
							
							[Quantization][1/N] MoE support BNB-Inflight Quantization ( #20061 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-11 08:01:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						762be26a8e 
					 
					
						
						
							
							[Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging ( #20777 )  
						
						... 
						
						
						
						Signed-off-by: Luka Govedic <lgovedic@redhat.com >
Signed-off-by: luka <lgovedic@redhat.com > 
						
						
					 
					
						2025-07-11 00:15:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6a9e6b2abf 
					 
					
						
						
							
							[doc] fold long code block ( #20795 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-10 23:16:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5d09152ff1 
					 
					
						
						
							
							[V1] Enable Mamba2 layers other than MambaMixer2 in the v1 engine ( #20660 )  
						
						... 
						
						
						
						Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com > 
						
						
					 
					
						2025-07-11 05:53:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						31d5c1797f 
					 
					
						
						
							
							[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf ( #19830 )  
						
						... 
						
						
						
						Signed-off-by: Luka Govedic <lgovedic@redhat.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-11 04:56:28 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						35514b682a 
					 
					
						
						
							
							[XPU] XCCL support enabled in torch 2.8.0.dev nightly builds ( #20705 )  
						
						... 
						
						
						
						Signed-off-by: ratnampa <ratnam.parikh@intel.com > 
						
						
					 
					
						2025-07-10 20:39:52 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e2de455c34 
					 
					
						
						
							
							[Feature] Integrate SM100 DeepGEMM support ( #20087 )  
						
						
						
						
					 
					
						2025-07-10 20:18:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5b032352cc 
					 
					
						
						
							
							[Attention] MLA - Flashinfer Ragged Prefill ( #20034 )  
						
						
						
						
					 
					
						2025-07-10 20:17:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						922f316441 
					 
					
						
						
							
							[Model] Support HF format of minimax ( #20211 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-11 02:55:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5923ab9524 
					 
					
						
						
							
							[fix]: disable cutlass block scaled group gemm for EP ( #20781 )  
						
						... 
						
						
						
						Signed-off-by: Duncan Moss <djm.moss@gmail.com > 
						
						
					 
					
						2025-07-11 02:39:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0cf893cae1 
					 
					
						
						
							
							Add kimi-k2 tool parser ( #20789 )  
						
						... 
						
						
						
						Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn >
Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn >
Co-authored-by: wangzhengtao <wangzhengtao@msh.team > 
						
						
					 
					
						2025-07-11 10:36:23 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cf75cd2098 
					 
					
						
						
							
							[CI Bugfix] Specify same TORCH_CUDA_ARCH_LIST for flashinfer aot and install ( #20772 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-11 01:16:01 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b854321ffe 
					 
					
						
						
							
							[Docs] Lazy import gguf ( #20785 )  
						
						... 
						
						
						
						Signed-off-by: simon-mo <simon.mo@hey.com > 
						
						
					 
					
						2025-07-10 16:06:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5b6fe23d05 
					 
					
						
						
							
							[Bugfix][Benchmark] Make sure the output length > 0 when testing prefill workload. ( #20786 )  
						
						... 
						
						
						
						Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-07-10 14:52:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f0c98cae27 
					 
					
						
						
							
							[Misc] MoE ModularKernel : Introduce TopKWeightAndReduce  ( #20648 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-07-10 14:40:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						574ad60db9 
					 
					
						
						
							
							[KVConnector] Always call connector clear_metadata() at end of step ( #20756 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: David Ben-David <sdavidbd@gmail.com > 
						
						
					 
					
						2025-07-10 22:37:27 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fdadb6f43a 
					 
					
						
						
							
							[Bugfix] Fused MoE Modular Kernel chunking loop ( #20392 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-07-10 20:31:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						41060c6e08 
					 
					
						
						
							
							[Core] Add Support for Default Modality Specific LoRAs [generate / chat completions] ( #19126 )  
						
						... 
						
						
						
						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com > 
						
						
					 
					
						2025-07-10 21:09:37 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3de2ed767f 
					 
					
						
						
							
							[Bugfix] Remove assertion of expert_map being None ( #20714 )  
						
						... 
						
						
						
						Signed-off-by: Ming Yang <yming@meta.com >
Signed-off-by: Ming Yang <minos.future@gmail.com > 
						
						
					 
					
						2025-07-10 19:55:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						299252ea82 
					 
					
						
						
							
							[CI] Fix pre commit issue ( #20782 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-10 12:48:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d6902ce79f 
					 
					
						
						
							
							[V0][V1][Core] Add outlines integration for V1, and update V0 integration. ( #15975 )  
						
						... 
						
						
						
						Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com > 
						
						
					 
					
						2025-07-10 15:30:26 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5e53c89a74 
					 
					
						
						
							
							[Bugfix] [CI] Fix Tensorizer LoRA test ( #20760 )  
						
						... 
						
						
						
						Signed-off-by: Sanger Steel <sangersteel@gmail.com > 
						
						
					 
					
						2025-07-10 19:07:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c66e38ea4c 
					 
					
						
						
							
							[Test] Remove docker build from test. ( #20542 )  
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com > 
						
						
					 
					
						2025-07-10 11:21:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						251595368f 
					 
					
						
						
							
							Fix DeepSeek-R1-0528 chat template ( #20717 )  
						
						... 
						
						
						
						Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com >
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com > 
						
						
					 
					
						2025-07-10 17:47:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4bed167768 
					 
					
						
						
							
							[Model][VLM] Support JinaVL Reranker ( #20260 )  
						
						... 
						
						
						
						Signed-off-by: shineran96 <shinewang96@gmail.com > 
						
						
					 
					
						2025-07-10 10:43:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b140416abf 
					 
					
						
						
							
							[Model] Add reason parser for Hunyuan A13B Model. ( #20625 )  
						
						... 
						
						
						
						Signed-off-by: Asher Zhang <asherszhang@tencent.com > 
						
						
					 
					
						2025-07-10 16:33:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5b8366b61a 
					 
					
						
						
							
							[ROCm][Regression] Remove tensor creation that harms performance on ROCm ( #20741 )  
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-07-10 09:22:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c7753a9809 
					 
					
						
						
							
							[Hardware][CPU] Vllm int8 quantization enablement for ARM CPU ( #14129 )  
						
						... 
						
						
						
						Signed-off-by: nishith-fujitsu <nishith.jaiswal@fujitsu.com > 
						
						
					 
					
						2025-07-10 15:59:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4b9a9435bb 
					 
					
						
						
							
							Update Dockerfile FlashInfer to v0.2.8rc1 ( #20718 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-10 08:09:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3482fd7e4e 
					 
					
						
						
							
							[Doc] Add engine args back in to the docs ( #20674 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-10 08:02:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						77f77a951e 
					 
					
						
						
							
							[Misc] Clean up mark to fork process in BNB tests ( #20692 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-10 13:59:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1a4f35e2ea 
					 
					
						
						
							
							Normalize lm-eval command between baseline and correctness test ( #18560 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-10 13:27:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						be1e128dfb 
					 
					
						
						
							
							[CI Bugfix] Skip failing Tensorizer+LoRA test ( #20724 )  
						
						
						
						
					 
					
						2025-07-10 21:15:03 +09:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						65393ee064 
					 
					
						
						
							
							[doc] fix ordered list ( #20749 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-10 03:13:52 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dc221ad72d 
					 
					
						
						
							
							[Bugfix][Build][Non-CUDA] Only referencing CMAKE_CUDA_COMPILER_VERSION on CUDA where it is defined ( #20738 )  
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-07-10 02:58:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7571a4a7e5 
					 
					
						
						
							
							[CI/Build] Fix Basic Models Test ( #20728 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-10 09:57:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f67d986dd1 
					 
					
						
						
							
							[Misc] loose new-model tagger conditions ( #20747 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-10 02:54:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cc876d0f29 
					 
					
						
						
							
							[KVConnector] Aggregate finished requests on the scheduler ( #19555 )  
						
						... 
						
						
						
						Signed-off-by: Or Ozeri <oro@il.ibm.com > 
						
						
					 
					
						2025-07-10 09:22:18 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fdfd409f8f 
					 
					
						
						
							
							[TPU][Core]Make load weight exceed hbm error more instructive for customers ( #20644 )  
						
						... 
						
						
						
						Signed-off-by: Chenyaaang <chenyangli@google.com > 
						
						
					 
					
						2025-07-10 07:01:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ffbcc9e757 
					 
					
						
						
							
							[BugFix] Fix VllmConfig() construction on all platforms ( #20695 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-10 07:00:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						59389c927b 
					 
					
						
						
							
							[BugFix][CPU] Fix CPU worker dependency on cumem_allocator ( #20696 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-10 14:24:20 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8f2720def9 
					 
					
						
						
							
							[Frontend] Support Tool Calling with both tool_choice='required' and $defs. ( #20629 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-07-10 13:56:35 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ad6c2e1a0b 
					 
					
						
						
							
							Correct PPMissingLayer handling in Deepseek-V2-Lite PP deployment ( #20665 )  
						
						... 
						
						
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com > 
						
						
					 
					
						2025-07-09 20:34:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						49e8c7ea25 
					 
					
						
						
							
							Use NVCC --compress-mode to reduce binary size by 30% ( #20694 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-09 18:26:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						805d62ca88 
					 
					
						
						
							
							[Misc] DP : Add ExpertTokensMetadata ( #20332 )  
						
						... 
						
						
						
						Signed-off-by: Varun <vsundarr@redhat.com >
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun <vsundarr@redhat.com > 
						
						
					 
					
						2025-07-10 00:33:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b7d9e9416f 
					 
					
						
						
							
							[CI/Build] Fix FlashInfer double build in Dockerfile ( #20651 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-09 17:41:56 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7c12a765aa 
					 
					
						
						
							
							[Misc] Simplify the prefix caching logic on draft tokens ( #20701 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-09 14:48:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cd587c93ef 
					 
					
						
						
							
							[BugFix]: Properly set engine_id when using multi connector ( #19487 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: leiyiming <leiyiming@kingsoft.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-09 20:32:44 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						332d4cb17b 
					 
					
						
						
							
							[Feature][Quantization] MXFP4 support for MOE models ( #17888 )  
						
						... 
						
						
						
						Signed-off-by: Felix Marty <felmarty@amd.com >
Signed-off-by: Bowen Bao <bowenbao@amd.com >
Signed-off-by: Felix Marty <Felix.Marty@amd.com >
Co-authored-by: Bowen Bao <bowenbao@amd.com > 
						
						
					 
					
						2025-07-09 13:19:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bf03ff3575 
					 
					
						
						
							
							[Kernel] Add Conch backend for mixed-precision linear layer ( #19818 )  
						
						... 
						
						
						
						Signed-off-by: Jacob Manning <jmanning+oss@stackav.com > 
						
						
					 
					
						2025-07-09 13:17:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						47043eb678 
					 
					
						
						
							
							[Kernel] Triton implementation of causal-conv1d for Mamba-based models ( #18218 )  
						
						... 
						
						
						
						Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com >
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-07-09 12:53:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						31b96d1c64 
					 
					
						
						
							
							Support Llama 4 for cutlass_moe_fp4 ( #20453 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-09 15:53:38 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e59ba9e142 
					 
					
						
						
							
							[CI/Build] Enlarge tolerance for a CPU multi-modal test ( #20684 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-09 17:48:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						403b481573 
					 
					
						
						
							
							Remove heading from installation inc.md file ( #20697 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-09 10:42:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						138709f8d1 
					 
					
						
						
							
							[Doc] Update CPU doc ( #20676 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-09 10:28:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0bbac1c1b4 
					 
					
						
						
							
							[Bench] Add NVFP4 GEMM benchmark script ( #20578 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-09 13:23:48 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a3e4e85ece 
					 
					
						
						
							
							[XPU][CI] enhance xpu test support ( #20652 )  
						
						... 
						
						
						
						Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com >
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai > 
						
						
					 
					
						2025-07-09 16:53:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						eb58f5953d 
					 
					
						
						
							
							[TPU][Bugfix] fix test_pallas ( #20666 )  
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-07-09 09:32:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ac9c33f78 
					 
					
						
						
							
							[Bugfix] Fix handling of Tensorizer arguments for LoadConfig ( #20643 )  
						
						... 
						
						
						
						Signed-off-by: Sanger Steel <sangersteel@gmail.com > 
						
						
					 
					
						2025-07-09 15:36:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						efe73d0575 
					 
					
						
						
							
							[doc] update doc format ( #20673 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-09 08:08:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						853487bc1b 
					 
					
						
						
							
							[Docs] Improve docs for RLHF co-location example ( #20599 )  
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-09 08:06:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9ff2af6d2b 
					 
					
						
						
							
							[Benchmark] Parameterization of streaming loading of multimodal datasets ( #20528 )  
						
						... 
						
						
						
						Signed-off-by: wangli <wangli858794774@gmail.com > 
						
						
					 
					
						2025-07-09 13:35:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						70ca5484f5 
					 
					
						
						
							
							[Doc] Update notes ( #20668 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-09 03:46:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5358cce5ff 
					 
					
						
						
							
							[V1] [Doc] Update V1 docs for Mamba models ( #20499 )  
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-07-09 01:02:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2155e95ef1 
					 
					
						
						
							
							[Bugfix] Fix the issue where reasoning_content is None when Thinking is enabled and tool_choice is set to 'required'. ( #20662 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-07-09 07:39:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f95570a52d 
					 
					
						
						
							
							[Docs] fix minimax tool_calling docs error ( #20667 )  
						
						... 
						
						
						
						Signed-off-by: qingjun <qingjun@minimaxi.com > 
						
						
					 
					
						2025-07-09 00:37:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b6e7e3d58f 
					 
					
						
						
							
							[Intel GPU] support ray as distributed executor backend for XPU. ( #20659 )  
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-07-09 00:36:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e760fcef22 
					 
					
						
						
							
							[XPU] Use spawn with XPU multiprocessing ( #20649 )  
						
						... 
						
						
						
						Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com > 
						
						
					 
					
						2025-07-09 00:34:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6bbf1795b7 
					 
					
						
						
							
							[Misc] Fix the size of batched_dummy_mm_inputs in profile_run ( #20434 )  
						
						... 
						
						
						
						Signed-off-by: bk-201 <joy25810@foxmail.com > 
						
						
					 
					
						2025-07-08 20:15:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9e0ef888f0 
					 
					
						
						
							
							Fix bullets in incremental_build.md ( #20642 )  
						
						
						
						
					 
					
						2025-07-09 11:03:41 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						97abeb1daa 
					 
					
						
						
							
							[feat] enable SM100 CUTLASS block scaled group gemm for smaller batch sizes ( #20640 )  
						
						... 
						
						
						
						Signed-off-by: Duncan Moss <djm.moss@gmail.com > 
						
						
					 
					
						2025-07-09 11:03:35 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						34dad19e7b 
					 
					
						
						
							
							[Bugfix] Set default cuda_graph_sizes to min(self.max_num_seqs * 2, 512) ( #20628 )  
						
						... 
						
						
						
						Signed-off-by: izhuhaoran <izhuhaoran@qq.com > 
						
						
					 
					
						2025-07-09 11:02:51 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6db31e7a27 
					 
					
						
						
							
							[Hardware][PPC64LE] Enable V1 for ppc64le and ARM ( #20554 )  
						
						... 
						
						
						
						Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Nikhil Gupta <nikhil.gupta2@arm.com > 
						
						
					 
					
						2025-07-08 20:00:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						977180c912 
					 
					
						
						
							
							[Docs] Improve documentation for multi-node service helper script ( #20600 )  
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-08 19:44:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c40784c794 
					 
					
						
						
							
							[BugFix][Intel GPU] Use refactored API for dist_backend in V1 worker ( #20596 )  
						
						... 
						
						
						
						Signed-off-by: ratnampa <ratnam.parikh@intel.com > 
						
						
					 
					
						2025-07-08 19:44:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						baed180aa0 
					 
					
						
						
							
							[tech debt] Revisit lora request model checker ( #20636 )  
						
						... 
						
						
						
						Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com > 
						
						
					 
					
						2025-07-09 09:42:41 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0b407479ef 
					 
					
						
						
							
							[misc] refactor Platform.set_device method ( #20262 )  
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-07-09 01:39:47 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5eaf570050 
					 
					
						
						
							
							Replace multiply_add with homogeneous_multiply_add to Address Clang Template Parameter Issue ( #20142 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com > 
						
						
					 
					
						2025-07-09 00:30:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d8ee5a2ca4 
					 
					
						
						
							
							[TPU][Bugfix] disable phi-3 test ( #20632 )  
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com > 
						
						
					 
					
						2025-07-08 23:14:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b9fca83256 
					 
					
						
						
							
							[Bugfix] Fix GLM-4.1-V video prompt update ( #20635 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-08 23:13:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						32dffc2772 
					 
					
						
						
							
							[Core] Rename get_max_tokens_per_item for backward compatibility ( #20630 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-08 23:11:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c438183e99 
					 
					
						
						
							
							[Bugfix] Fix topk_ids indices_type for CUTLASS w8a8 FP8 MoE ( #20166 )  
						
						... 
						
						
						
						Signed-off-by: Ming Yang <yming@meta.com > 
						
						
					 
					
						2025-07-08 23:10:57 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						baba0389f7 
					 
					
						
						
							
							[CI] Increase the threshold of the MTEB RERANK tests ( #20615 )  
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-07-08 08:10:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c6c22f16d3 
					 
					
						
						
							
							Revert invalid spellchecker fix on deepseek_vl2 ( #20618 )  
						
						
						
						
					 
					
						2025-07-08 15:07:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dd382e0fe3 
					 
					
						
						
							
							[Model] Implement missing get_language_model for Keye-VL ( #20631 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-08 07:47:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						849590a2a7 
					 
					
						
						
							
							Update torch/xla pin to 20250703 ( #20589 )  
						
						... 
						
						
						
						Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com > 
						
						
					 
					
						2025-07-08 07:44:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a4c23314c0 
					 
					
						
						
							
							[xpu]feat: support multi-lora on xpu ( #20616 )  
						
						... 
						
						
						
						Signed-off-by: yan <yan.ma@intel.com > 
						
						
					 
					
						2025-07-08 22:07:10 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b942c094e3 
					 
					
						
						
							
							Stop using title frontmatter and fix doc that can only be reached by search ( #20623 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-08 03:27:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b4bab81660 
					 
					
						
						
							
							Remove unnecessary explicit title anchors and use relative links instead ( #20620 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-08 02:49:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b91cb3fa5c 
					 
					
						
						
							
							[Docs] Improve documentation for Deepseek R1 on Ray Serve LLM ( #20601 )  
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-08 02:09:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						71d1d75b7a 
					 
					
						
						
							
							[PD][Nixl] Remote consumer READ timeout for clearing request blocks ( #20139 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-07-08 08:56:40 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						72d14d0eed 
					 
					
						
						
							
							[Frontend] [Core] Integrate Tensorizer into S3 loading machinery, allow passing arbitrary arguments during save/load ( #19619 )  
						
						... 
						
						
						
						Signed-off-by: Sanger Steel <sangersteel@gmail.com >
Co-authored-by: Eta <esyra@coreweave.com > 
						
						
					 
					
						2025-07-07 22:47:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e34d130c16 
					 
					
						
						
							
							[TPU] Temporary fix vmem oom for long model len by reducing page size ( #20278 )  
						
						... 
						
						
						
						Signed-off-by: Chenyaaang <chenyangli@google.com > 
						
						
					 
					
						2025-07-08 05:16:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7721ef1786 
					 
					
						
						
							
							[CI/Build][CPU] Fix CPU CI and remove all CPU V0 files ( #20560 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-07 22:13:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8369b7c2a9 
					 
					
						
						
							
							[Misc] improve error msg ( #20604 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-07 21:45:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3eb4ad53f3 
					 
					
						
						
							
							[Docs] Add Anyscale to frameworks ( #20590 )  
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-07 20:09:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						90a2769f20 
					 
					
						
						
							
							[Docs] Add Ray Serve LLM section to openai compatible server guide ( #20595 )  
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-07 20:08:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e60d422f19 
					 
					
						
						
							
							[Docs] Improve docstring for ray data llm example ( #20597 )  
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-07 20:06:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0d914c81a2 
					 
					
						
						
							
							[Docs] Rewrite offline inference guide ( #20594 )  
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-07 20:06:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e428cdd7a 
					 
					
						
						
							
							[Doc] Syntax highlight request responses as JSON instead of bash ( #20582 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-07 20:02:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						93b9d9f499 
					 
					
						
						
							
							[Bugfix]: Fix messy code when using logprobs ( #19209 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-07-08 11:02:15 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						af107d5a0e 
					 
					
						
						
							
							Make distinct code and console admonitions so readers are less likely to miss them ( #20585 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-07 19:55:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						31c5d0a1b7 
					 
					
						
						
							
							[Optimize] Don't send token ids when kv connector is not used ( #20586 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-07 19:04:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						afb7cff1b9 
					 
					
						
						
							
							[Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe ( #20167 )  
						
						... 
						
						
						
						Signed-off-by: Ming Yang <yming@meta.com > 
						
						
					 
					
						2025-07-08 01:07:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d2e841a10a 
					 
					
						
						
							
							[Misc] Improve logging for dynamic shape cache compilation ( #20573 )  
						
						... 
						
						
						
						Signed-off-by: kyolebu <kyu@redhat.com > 
						
						
					 
					
						2025-07-08 00:48:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						14601f5fba 
					 
					
						
						
							
							[Config] Refactor mistral configs ( #20570 )  
						
						... 
						
						
						
						Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com > 
						
						
					 
					
						2025-07-07 15:25:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						042d131f39 
					 
					
						
						
							
							Fix links in multi-modal model contributing page ( #18615 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-07 21:13:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8e807cdfa4 
					 
					
						
						
							
							[Misc] feat output content in stream response ( #19608 )  
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-07-07 20:45:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e601efcb10 
					 
					
						
						
							
							[Misc] Add fully interleaved support for multimodal 'string' content format ( #14047 )  
						
						... 
						
						
						
						Signed-off-by: drobyshev.anton <drobyshev.anton@wb.ru >
Co-authored-by: drobyshev.anton <drobyshev.anton@wb.ru > 
						
						
					 
					
						2025-07-07 19:43:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						22dd9c2730 
					 
					
						
						
							
							[Kernel] Optimize Prefill Attention in Unified Triton Attention Kernel ( #20308 )  
						
						... 
						
						
						
						Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com > 
						
						
					 
					
						2025-07-07 19:08:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a6d795d593 
					 
					
						
						
							
							[DP] Copy environment variables to Ray DPEngineCoreActors ( #20344 )  
						
						... 
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com > 
						
						
					 
					
						2025-07-07 10:14:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a37d75bbec 
					 
					
						
						
							
							[Front-end] microbatch tokenization ( #19334 )  
						
						... 
						
						
						
						Signed-off-by: zt2370 <ztang2370@gmail.com > 
						
						
					 
					
						2025-07-07 17:54:10 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						edd270bc78 
					 
					
						
						
							
							[Bugfix] Prevent IndexError for cached requests when pipeline parallelism is disabled ( #20486 )  
						
						... 
						
						
						
						Signed-off-by: Peter Pan <Peter.Pan@daocloud.io > 
						
						
					 
					
						2025-07-07 09:41:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						110df74332 
					 
					
						
						
							
							[Model][Last/4] Automatic conversion of CrossEncoding model ( #19675 )  
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-07-07 14:46:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1ad69e8375 
					 
					
						
						
							
							[Doc] Fix some MkDocs snippets used in the installation docs ( #20572 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-07 07:44:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b8a498c9b2 
					 
					
						
						
							
							[Doc] Add outline for content tabs ( #20571 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-07 07:43:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						923147b5e8 
					 
					
						
						
							
							[Doc] Fix internal links so they don't always point to latest ( #20563 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-07 04:15:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						45877ef740 
					 
					
						
						
							
							[Doc] Use gh-pr and gh-issue everywhere we can in the docs ( #20564 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-07 03:54:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e4bef1bea 
					 
					
						
						
							
							[Doc] Remove extra whitespace from CI failures doc ( #20565 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-07 03:35:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ff79a136e 
					 
					
						
						
							
							[Misc] Set the minimum openai version ( #20539 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-07 09:15:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						448acad31e 
					 
					
						
						
							
							[Misc] remove unused jinaai_serving_reranking ( #18878 )  
						
						... 
						
						
						
						Signed-off-by: Abirdcfly <fp544037857@gmail.com > 
						
						
					 
					
						2025-07-07 09:14:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						eb0b2d2f08 
					 
					
						
						
							
							[Docs] Clean up tables in supported_models.md ( #20552 )  
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-07-07 01:46:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3112271f6e 
					 
					
						
						
							
							[XPU] log clean up for XPU platform ( #20553 )  
						
						... 
						
						
						
						Signed-off-by: yan <yan.ma@intel.com > 
						
						
					 
					
						2025-07-07 01:38:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1fd471e957 
					 
					
						
						
							
							Add docstrings to url_schemes.py to improve readability ( #20545 )  
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-07-07 08:31:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2c5ebec064 
					 
					
						
						
							
							[XPU][CI] add v1/core test in xpu hardware ci ( #20537 )  
						
						... 
						
						
						
						Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com > 
						
						
					 
					
						2025-07-07 01:16:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2e610deb72 
					 
					
						
						
							
							[CI/Build] Enable phi2 lora test ( #20540 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-07 05:10:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e2c19ce22 
					 
					
						
						
							
							[Refactor] Abstract Platform Interface for Distributed Backend and Add xccl Support for Intel XPU ( #19410 )  
						
						... 
						
						
						
						Signed-off-by: dbyoung18 <yang5.yang@intel.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-07-07 04:32:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						47db8c2c15 
					 
					
						
						
							
							[Misc] add a tip for pre-commit ( #20536 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-06 19:42:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						462b269280 
					 
					
						
						
							
							Implement OpenAI Responses API [1/N] ( #20504 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-06 18:32:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c18b3b8e8b 
					 
					
						
						
							
							[Bugfix] Add use_cross_encoder flag to use correct activation in ClassifierPooler ( #20527 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-06 14:01:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9528e3a05e 
					 
					
						
						
							
							[BugFix][Spec Decode] Fix spec token ids in model runner ( #20530 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-06 19:44:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9fb52e523a 
					 
					
						
						
							
							[V1] Support any head size for FlexAttention backend ( #20467 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-06 09:54:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e202dd2736 
					 
					
						
						
							
							[V0 deprecation] Remove V0 CPU/XPU/TPU backends ( #20412 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-06 08:48:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						43813e6361 
					 
					
						
						
							
							[Misc] call the pre-defined func ( #20518 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-06 10:25:29 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cede942b87 
					 
					
						
						
							
							[Benchmark] Add support for multiple batch size benchmark through CLI in benchmark_moe.py ( #20516 )  
						
						... 
						
						
						
						Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca > 
						
						
					 
					
						2025-07-06 09:20:11 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fe1e924811 
					 
					
						
						
							
							[Frontend] Support image object in llm.chat ( #19635 )  
						
						... 
						
						
						
						Signed-off-by: sfeng33 <4florafeng@gmail.com >
Signed-off-by: Flora Feng <4florafeng@gmail.com > 
						
						
					 
					
						2025-07-06 06:47:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4548c03c50 
					 
					
						
						
							
							[TPU][Bugfix] fix the MoE OOM issue ( #20339 )  
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-07-05 21:19:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						40b86aa05e 
					 
					
						
						
							
							[BugFix] Fix: ImportError when building on hopper systems ( #20513 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-07-06 12:17:30 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						432870829d 
					 
					
						
						
							
							[Bugfix] Fix missing per_act_token parameter in compressed_tensors_moe ( #20509 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <fanglu@fb.com > 
						
						
					 
					
						2025-07-06 12:08:30 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f73d02aadc 
					 
					
						
						
							
							[BUG] Fix #20484. Support empty sequence in cuda penalty kernel ( #20491 )  
						
						... 
						
						
						
						Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai > 
						
						
					 
					
						2025-07-05 19:38:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c5ebe040ac 
					 
					
						
						
							
							test_attention compat with coming xformers change ( #20487 )  
						
						... 
						
						
						
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-07-05 19:37:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8d763cb891 
					 
					
						
						
							
							[Misc] remove unused import ( #20517 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-05 19:17:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cf4cd53982 
					 
					
						
						
							
							[Misc] Add logger.exception for TPU information collection failures ( #20510 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-05 07:24:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						32c9be2200 
					 
					
						
						
							
							[v1] Re-add fp32 support to v1 engine through FlexAttention ( #19754 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-07-05 09:41:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8aeaa910a2 
					 
					
						
						
							
							Fix unknown attribute of topk_indices_dtype in CompressedTensorsW8A8Fp8MoECutlassMethod ( #20507 )  
						
						Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
						
						
					 
					
						2025-07-05 14:03:20 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						906e05d840 
					 
					
						
						
							
							[Misc] Remove the unused LoRA test code ( #20494 )  
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-07-05 13:48:16 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ef9a2990ae 
					 
					
						
						
							
							[doc] small fix ( #20506 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-04 20:56:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7e90870491 
					 
					
						
						
							
							[Misc] Add security warning for development mode endpoints ( #20508 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-04 20:52:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d3f05c9248 
					 
					
						
						
							
							[Doc] fix mutltimodal_inputs.md gh examples link ( #20497 )  
						
						Signed-off-by: Guy Stone <guys@spotify.com>
						
						
					 
					
						2025-07-04 16:41:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c108781c85 
					 
					
						
						
							
							[CI Bugfix] Fix pre-commit failures on main ( #20502 )  
						
						
						
						
					 
					
						2025-07-04 14:17:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3d184b95b8 
					 
					
						
						
							
							[feat]: CUTLASS block scaled group gemm for SM100 ( #19757 )  
						
						Signed-off-by: Duncan Moss <djm.moss@gmail.com>
						Co-authored-by: Duncan Moss <dmoss@nvidia.com>
						
						
					 
					
						2025-07-04 12:58:04 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2f35a022e6 
					 
					
						
						
							
							Enable V1 for Hybrid SSM/Attention Models ( #20016 )  
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
						Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
						Co-authored-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-07-04 17:46:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ffe00ef77a 
					 
					
						
						
							
							[Misc] Small: Remove global media connector. Each test should have its own test connector object. ( #20395 )  
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
						
						
					 
					
						2025-07-04 08:15:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5561681d04 
					 
					
						
						
							
							[CI] add kvcache-connector dependency definition and add into CI build ( #18193 )  
						
						Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
						
						
					 
					
						2025-07-04 06:49:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fbd62d8750 
					 
					
						
						
							
							[Doc] Fix classification table in list of supported models ( #20489 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-04 06:08:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2e26f9156a 
					 
					
						
						
							
							[Model][3/N] Automatic conversion of CrossEncoding model ( #20168 )  
						
						Signed-off-by: wang.yuqi <noooop@126.com>
						
						
					 
					
						2025-07-04 05:47:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9e5452ee34 
					 
					
						
						
							
							[Bug][Frontend] Fix structure of transcription's decoder_prompt ( #18809 )  
						
						Signed-off-by: sangbumlikeagod <oironese@naver.com>
						
						
					 
					
						2025-07-04 11:28:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0e3fe896e2 
					 
					
						
						
							
							Support Llama 4 for fused_marlin_moe ( #20457 )  
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-04 07:55:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1caca5a589 
					 
					
						
						
							
							[Misc] Add SPDX-FileCopyrightText ( #20428 )  
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-07-04 07:40:42 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						783921d889 
					 
					
						
						
							
							[Perf] Optimize Vectorization Utils for Int 8 Quantization Kernels ( #20331 )  
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-04 15:06:24 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4a98edff1f 
					 
					
						
						
							
							[Structured Outputs][V1] Skipping with models doesn't contain tokenizers ( #20365 )  
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
						Co-authored-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-07-04 15:05:49 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a7bab0c9e5 
					 
					
						
						
							
							[Misc] small update ( #20462 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-03 20:33:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						25950dca9b 
					 
					
						
						
							
							Add ignore consolidated file in mistral example code ( #20420 )  
						
						Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
						
						
					 
					
						2025-07-04 02:55:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a4113b035c 
					 
					
						
						
							
							[Platform] Add custom default max tokens ( #18557 )  
						
						Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
						
						
					 
					
						2025-07-04 10:50:17 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7e1665b089 
					 
					
						
						
							
							[Misc] Change warn_for_unimplemented_methods to debug ( #20455 )  
						
						
						
						
					 
					
						2025-07-04 02:35:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8d1096e7db 
					 
					
						
						
							
							[Bugfix] Register reducer even if transformers_modules not available ( #19510 )  
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com>
						
						
					 
					
						2025-07-03 22:08:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8d775dd30a 
					 
					
						
						
							
							[Misc] Fix Unable to detect current VLLM config. Defaulting to NHD kv cache layout warning ( #20400 )  
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-07-03 14:56:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						78fe77534b 
					 
					
						
						
							
							[Kernel] Enable fp8 support for pplx and BatchedTritonExperts. ( #18864 )  
						
						Signed-off-by: Bill Nell <bnell@redhat.com>
						
						
					 
					
						2025-07-03 14:55:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2f2fcb31b8 
					 
					
						
						
							
							[Misc] Remove _maybe_ignore_quant_config from GLM4.1v ( #20432 )  
						
						Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
						
						
					 
					
						2025-07-03 21:41:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1dba2c4ebe 
					 
					
						
						
							
							[Misc] adjust for ipv6 for mookcacke url parse ( #20107 )  
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-07-03 20:27:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						71d6de3a26 
					 
					
						
						
							
							[Misc] Clean up InternVL family config registration ( #19992 )  
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-03 20:01:47 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						536fd33003 
					 
					
						
						
							
							[CI] Trimming some failing test groups from AMDPRODUCTION. ( #20390 )  
						
						
						
						
					 
					
						2025-07-03 08:21:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						619b9f5c7e 
					 
					
						
						
							
							[Frontend] fix duplicate output for bench subcmd ( #20446 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-03 08:02:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d1b689c445 
					 
					
						
						
							
							[Bugfix] Fix flaky test_streaming_response test ( #20363 )  
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-07-03 14:46:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9854dc9040 
					 
					
						
						
							
							[Frontend] improve vllm bench <bench_type> --help display ( #20430 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-03 14:22:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ff5c60fad8 
					 
					
						
						
							
							[Misc] Automatically tag PRs to add new models ( #20222 )  
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-07-03 07:11:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6f1229f91d 
					 
					
						
						
							
							[Model][2/N] Automatic conversion of CrossEncoding model ( #19978 )  
						
						Signed-off-by: wang.yuqi <noooop@126.com>
						
						
					 
					
						2025-07-03 13:59:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1819fbda63 
					 
					
						
						
							
							[Quantization] Bump to use latest bitsandbytes ( #20424 )  
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-07-03 21:58:46 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7f0367109e 
					 
					
						
						
							
							[CI/Build][CPU] Enable cross compilation in CPU release pipeline ( #20423 )  
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com>
						
						
					 
					
						2025-07-03 05:26:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fb14d53cf6 
					 
					
						
						
							
							[Kernel] refactor cpu worker v0 cache dtype ( #20080 )  
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-07-03 08:39:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b024a42e93 
					 
					
						
						
							
							[Core] Move multimodal placeholder from chat utils to model definition ( #20355 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-03 08:18:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cb97f2bfc5 
					 
					
						
						
							
							[Docs] Replace two list with tables in intel_gaudi.md ( #20414 )  
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-07-03 00:48:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						359200f6ac 
					 
					
						
						
							
							[doc] fix link ( #20417 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-03 00:21:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						220aee902a 
					 
					
						
						
							
							[Misc] Add rules to label Speculative Decoding Related PRs ( #20406 )  
						
						Signed-off-by: Lifan Shen <lifans@meta.com>
						
						
					 
					
						2025-07-02 23:56:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						67d25eca05 
					 
					
						
						
							
							[Tests] Update online DP tests to verify that requests are balanced ( #20157 )  
						
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-07-03 14:49:13 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						363528de27 
					 
					
						
						
							
							[Feature] Support MiniMax-M1 function calls features ( #20297 )  
						
						Signed-off-by: QscQ <qscqesze@gmail.com>
						Signed-off-by: qingjun <qingjun@minimaxi.com>
						
						
					 
					
						2025-07-03 06:48:27 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ff61ababa 
					 
					
						
						
							
							[TPU] Add a case to cover RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 ( #20385 )  
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com>
						
						
					 
					
						2025-07-03 06:46:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0ec3779df7 
					 
					
						
						
							
							[Bugfix][CI/CD][CPU] Fix CPU CI tests ( #20383 )  
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com>
						
						
					 
					
						2025-07-02 20:11:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b616f6a53d 
					 
					
						
						
							
							[Misc] Small: Fix video loader return type annotations. ( #20389 )  
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
						
						
					 
					
						2025-07-03 03:10:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2e25bb12a8 
					 
					
						
						
							
							[Bugfix] Fix import of CutlassExpertsFp8 in compressed_tensors_moe.py ( #20381 )  
						
						Signed-off-by: Bill Nell <bnell@redhat.com>
						
						
					 
					
						2025-07-03 02:07:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9965c47d0d 
					 
					
						
						
							
							Enable CPU nightly performance benchmark and its Markdown report ( #18444 )  
						
						Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
						
						
					 
					
						2025-07-02 17:50:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						059d4cdb49 
					 
					
						
						
							
							[BugFix] Fix DP headless mode arg validation ( #20398 )  
						
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-07-02 17:15:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bdb84e26b0 
					 
					
						
						
							
							[Bugfix] Fixes for FlashInfer's TORCH_CUDA_ARCH_LIST ( #20136 )  
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
						Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
						
						
					 
					
						2025-07-02 17:15:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3dd359147d 
					 
					
						
						
							
							[Docs] Update EAGLE example ( #20375 )  
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-07-02 17:13:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						657f2f301a 
					 
					
						
						
							
							[DP] Support external DP Load Balancer mode ( #19790 )  
						
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-07-02 10:21:52 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a1aafc827a 
					 
					
						
						
							
							[ROCm][FEAT] Enable Full Graph Mode in AITER MLA V1 Attn Backend (Decode Phase only) ( #20254 )  
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
						
						
					 
					
						2025-07-02 16:25:46 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						139508a418 
					 
					
						
						
							
							[Misc] add handler HF_TOKEN is emptry string ( #20369 )  
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
						
						
					 
					
						2025-07-02 09:14:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d265414dbc 
					 
					
						
						
							
							[Minor] Clean up incorrect comment in test ( #20382 )  
						
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-07-02 09:13:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						48fb076cbc 
					 
					
						
						
							
							[V1] LogitsProcessor programming model ( #16728 )  
						
						Signed-off-by: Nick Hill <nhill@redhat.com>
						Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
						Signed-off-by: Andrew Feldman <afeldman@redhat.com>
						Co-authored-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-07-02 09:10:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c1909e7e8c 
					 
					
						
						
							
							[Kernels] MoE refactor ( #19636 )  
						
						Signed-off-by: Bill Nell <bnell@redhat.com>
						Signed-off-by: ElizaWszola <ewszola@redhat.com>
						Co-authored-by: ElizaWszola <ewszola@redhat.com>
						
						
					 
					
						2025-07-02 06:08:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b95877509b 
					 
					
						
						
							
							Documentation update tool_calling: mapping back to function from response ( #20373 )  
						
						
						
						
					 
					
						2025-07-02 05:55:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						706ff13224 
					 
					
						
						
							
							[Model] Adds support for SlimMoE models Phi-tiny-MoE-instruct ( #20286 )  
						
						Signed-off-by: Zichong Li <t-lizichong@microsoft.com@Reasoning-H100-VM3.drbuo4tcjzruhloch3eo0b25ef.cx.internal.cloudapp.net>
						Co-authored-by: Zichong Li <t-lizichong@microsoft.com@Reasoning-H100-VM3.drbuo4tcjzruhloch3eo0b25ef.cx.internal.cloudapp.net>
						Co-authored-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-02 12:54:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ccbfb1d1c9 
					 
					
						
						
							
							[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models ( #20322 )  
						
						Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
						
						
					 
					
						2025-07-02 12:53:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9e5552aa13 
					 
					
						
						
							
							[NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) ( #17280 )  
						
						Signed-off-by: kaln27 <liaojuncheng123@foxmail.com>
						Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-02 06:47:19 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0c600b9ab6 
					 
					
						
						
							
							[Build/CI] Automatically tag DeepSeek related PRs ( #20370 )  
						
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-07-02 04:02:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e303dcf523 
					 
					
						
						
							
							[Model] Add Ernie4.5 and Ernie4.5MoE Model Support ( #20220 )  
						
						Signed-off-by: wangyafeng <wangyafeng@baidu.com>
						
						
					 
					
						2025-07-02 03:37:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ae9c4d416f 
					 
					
						
						
							
							[Docs] Make TPU ref prettier in google_tpu.md ( #20356 )  
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-07-02 02:04:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d853520b3e 
					 
					
						
						
							
							[Docs] Fix indentations for 2-level items in deprecation_policy.md ( #20352 )  
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-07-01 23:50:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ba51aea65e 
					 
					
						
						
							
							[Bugfix] Keye-VL compatibility with tok_kwargs ( #20058 ) ( #20353 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-01 23:46:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8452946c06 
					 
					
						
						
							
							[Model][VLM] Support Keye-VL-8B-Preview ( #20126 )  
						
						Signed-off-by: Kwai-Keye <Keye@kuaishou.com>
						
						
					 
					
						2025-07-01 23:35:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2e7cbf2d7d 
					 
					
						
						
							
							[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. ( #20105 )  
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
						
						
					 
					
						2025-07-01 23:34:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7da296be04 
					 
					
						
						
							
							[TPU] kv cache update kernel supports dynamic grid ( #20235 )  
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com>
						
						
					 
					
						2025-07-02 06:33:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b205e8467d 
					 
					
						
						
							
							[Doc][TPU] Add models and features supporting matrix. ( #20230 )  
						
						Signed-off-by: Qiliang Cui <cuiq@google.com>
						
						
					 
					
						2025-07-02 06:33:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						be0cfb2b68 
					 
					
						
						
							
							fix[Docs]: link anchor is incorrect #20309 ( #20315 )
						
						Signed-off-by: zxw <1020938856@qq.com>
						
						
					 
					
						2025-07-02 06:32:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1a03dd496b 
					 
					
						
						
							
							[Bugfix] Fix dynamic rotary embedding ( #20343 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-02 06:31:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						27b8017636 
					 
					
						
						
							
							[FIX][Intel GPU]fix ipex flash_attn_varlen_func api missing parameter ( #20348 )  
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
						
						
					 
					
						2025-07-01 22:26:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9ec1e3065a 
					 
					
						
						
							
							[Misc][Doc] Add missing comment for LLM ( #20285 )  
						
						Signed-off-by: Lifan Shen <lifans@meta.com>
						
						
					 
					
						2025-07-01 19:04:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9dae7d46bf 
					 
					
						
						
							
							[Refactor] Remove Unused Env VLLM_ENABLE_MOE_ALIGN_BLOCK_SIZE_TRITON ( #20334 )  
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-01 19:03:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7058d7dd5d 
					 
					
						
						
							
							[Refactor] Remove duplicate find_free_port ( #20333 )  
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-01 19:03:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a0389e0554 
					 
					
						
						
							
							[UT][intel GPU] use current_platform instead of device hardcode in v1 tests ( #20169 )  
						
						Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
						
						
					 
					
						2025-07-02 09:06:04 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3be8d312a2 
					 
					
						
						
							
							[Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 ( #20324 )  
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
						
						
					 
					
						2025-07-01 18:05:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3abfe22154 
					 
					
						
						
							
							Enable group size 64 for Machete ( #20290 )  
						
						Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
						
						
					 
					
						2025-07-01 18:05:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e81fbefe8a 
					 
					
						
						
							
							[Refactor] Refactor import utils ( #20269 )  
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-01 18:05:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9290de5667 
					 
					
						
						
							
							remove unused variables in marlin_template.h ( #20236 )  
						
						
						
						
					 
					
						2025-07-02 00:51:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7f280d69c9 
					 
					
						
						
							
							[Optimization] Cache sampled token ids in model runner ( #20291 )  
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-07-01 11:01:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						02cabff207 
					 
					
						
						
							
							[V1] [ROCm] Enable EP with AITER Fused MoE ( #20270 )  
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
						
						
					 
					
						2025-07-01 16:48:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3d19d47d91 
					 
					
						
						
							
							[Frontend] Expand tools even if tool_choice="none" ( #17177 )  
						
						Signed-off-by: okada shintarou <okada@preferred.jp>
						
						
					 
					
						2025-07-01 12:47:38 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8acb4badee 
					 
					
						
						
							
							[CUDA graphs] Enable full cuda graphs with FA3 AoT scheduling ( #20301 )  
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-07-01 09:07:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						314af8617c 
					 
					
						
						
							
							[Docs] Update transcriptions API to use openai client with stream=True  ( #20271 )  
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-07-01 15:47:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0e96cc9b7e 
					 
					
						
						
							
							[Misc] Minor refactoring for scheduler ( #20299 )  
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-07-01 07:55:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ecad851cbd 
					 
					
						
						
							
							[Model]Add Tencent HunYuanMoEV1 Model Support ( #20114 )  
						
						Signed-off-by: aiyiwang <aiyiwang@tencent.com>
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						Co-authored-by: quinnrong <quinnrong@tencent.com>
						Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-07-01 07:28:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed70f3c64f 
					 
					
						
						
							
							Add GLM4.1V model (Draft) ( #19331 )  
						
						... 
						
						
						
						Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-07-01 12:48:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						650d5dbd04 
					 
					
						
						
							
							[Misc] Minor refactor of NIXL background handshake ( #20068 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-07-01 12:40:14 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9025a9a705 
					 
					
						
						
							
							[Quant] [Bugfix] Fix quantization config matching with hf_to_vllm_mapper ( #20046 )  
						
						
						
						
					 
					
						2025-07-01 19:20:34 +09:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c05596f1a3 
					 
					
						
						
							
							[Perf] Validate @config in pre-commit instead of dynamically ( #20200 )  
						
						... 
						
						
						
						Signed-off-by: Lionel Villard <villard@us.ibm.com > 
						
						
					 
					
						2025-07-01 05:10:28 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						787b13389e 
					 
					
						
						
							
							[doc] fix the incorrect logo in dark mode ( #20289 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-01 08:18:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						96453cfa83 
					 
					
						
						
							
							[BugFix][V1][ROCm] Triton MLA uses V0 backend on V1 engine ( #19067 )  
						
						... 
						
						
						
						Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com > 
						
						
					 
					
						2025-07-01 16:12:19 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b1c1fe35a5 
					 
					
						
						
							
							[Misc] remove redundant char ( #20287 )  
						
						... 
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com > 
						
						
					 
					
						2025-07-01 15:33:22 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						08d81f1014 
					 
					
						
						
							
							[Bugfix] Fix deepep tests ( #20288 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-07-01 15:29:08 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6cc1e7d96d 
					 
					
						
						
							
							[CPU] Update custom ops for the CPU backend ( #20255 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-01 07:25:03 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9909726d2a 
					 
					
						
						
							
							Enable ZP Support for Machete ( #20268 )  
						
						... 
						
						
						
						Signed-off-by: czhu-cohere <conway.zhu@cohere.com > 
						
						
					 
					
						2025-07-01 07:12:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						22e9d42040 
					 
					
						
						
							
							[Misc] add xgrammar for arm64 ( #18359 )  
						
						... 
						
						
						
						Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com > 
						
						
					 
					
						2025-07-01 07:02:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						86debab54c 
					 
					
						
						
							
							Fix numel() downcast in vllm/csrc/moe/moe_align_sum_kernels.cu +2 ( #17082 )  
						
						... 
						
						
						
						Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-01 06:48:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						be250bbc67 
					 
					
						
						
							
							[V1] Only print cudagraph tqdm on rank 0 with is_global_first_rank ( #19516 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-01 06:02:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						27949354fa 
					 
					
						
						
							
							[Feature] A calibration-free RTN-based quantization for accurate and accelerated INT4/INT8 inference ( #18768 )  
						
						... 
						
						
						
						Signed-off-by: Alex Kogan <alex.kogan@oracle.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-01 05:44:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bd5038af07 
					 
					
						
						
							
							[Doc] add config and troubleshooting guide for NCCL & GPUDirect RDMA ( #15897 )  
						
						... 
						
						
						
						Signed-off-by: Ernest Wong <chwong719@gmail.com > 
						
						
					 
					
						2025-06-30 21:44:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a2f14dc8f9 
					 
					
						
						
							
							[CI][Intel Gaudi][vllm-Plugin]Add CI for hpu-plugin-v1-test ( #20196 )  
						
						... 
						
						
						
						Signed-off-by: Chendi Xue <chendi.xue@intel.com > 
						
						
					 
					
						2025-07-01 04:17:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						92ee7baaf9 
					 
					
						
						
							
							[Example] add one-click runnable example for P2P NCCL XpYd ( #20246 )  
						
						... 
						
						
						
						Signed-off-by: KuntaiDu <kuntai@uchicago.edu > 
						
						
					 
					
						2025-06-30 21:03:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7151f92241 
					 
					
						
						
							
							[Misc] Fix spec decode example ( #20296 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-06-30 21:01:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e28533a16f 
					 
					
						
						
							
							[Bugfix] Fix include prompt in stream response when echo=true ( #15233 )  
						
						... 
						
						
						
						Signed-off-by: Yuan Fang <yuanfang@alauda.io > 
						
						
					 
					
						2025-07-01 01:30:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6d42ce8315 
					 
					
						
						
							
							[CLI] Improve CLI arg parsing for -O/--compilation-config ( #20156 )  
						
						... 
						
						
						
						Signed-off-by: luka <luka@neuralmagic.com > 
						
						
					 
					
						2025-07-01 01:03:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ded1fb635b 
					 
					
						
						
							
							[Bugfix][V1][P/D]Fix the issue of occasional garbled output for P2pNcclConnector ( #20263 )  
						
						... 
						
						
						
						Signed-off-by: Abatom <abzhonghua@gmail.com > 
						
						
					 
					
						2025-06-30 16:45:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						97d9524fe9 
					 
					
						
						
							
							[Refactor] Remove useless pdb comment ( #20266 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-30 18:15:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d8cf819a9a 
					 
					
						
						
							
							[Core] [Bugfix] [Multimodal] Fix multimodal profiling and generation for SFT/PTQed models ( #20058 )  
						
						... 
						
						
						
						Signed-off-by: Kyle Sayers <kylesayrs@gmail.com > 
						
						
					 
					
						2025-06-30 17:26:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						551ef1631a 
					 
					
						
						
							
							[Unit Test] Add unit test for deep gemm ( #20090 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-06-30 10:26:42 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2863befce3 
					 
					
						
						
							
							[Optimization] Use Shared CachedRequestData Instance Across All Requests ( #20232 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-06-30 09:07:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2965c99c86 
					 
					
						
						
							
							[Spec Decode] Clean up spec decode example ( #20240 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-06-30 08:28:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2062c0723d 
					 
					
						
						
							
							[Spec Decode] Refactor spec decoding into a separate function ( #20238 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-06-30 08:13:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1c50e100a9 
					 
					
						
						
							
							[Bugfix] fix quark ptpc ( #20251 )  
						
						... 
						
						
						
						Signed-off-by: Haoyang Li <Haoyang.Li@amd.com >
Co-authored-by: Haoyang Li <307790822@qq.com > 
						
						
					 
					
						2025-06-30 22:24:50 +09:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3ee56e26be 
					 
					
						
						
							
							[Docs] Fix 1-2-3 list in v1/prefix_caching.md ( #20243 )  
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-06-30 11:20:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8fe7fc8634 
					 
					
						
						
							
							[Quantization] Improve BitsAndBytesModelLoader ( #20242 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-06-30 18:22:09 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e936e401de 
					 
					
						
						
							
							[Bugfix] Fix processor initialization in transformers 4.53.0 ( #20244 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-06-30 10:16:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f5dfa07531 
					 
					
						
						
							
							[Bugfix] Skip loading extra parameters for modelopt Qwen3 MoE model ( #19598 )  
						
						... 
						
						
						
						Signed-off-by: noiji <> 
						
						
					 
					
						2025-06-30 18:21:56 +09:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						022c58b80f 
					 
					
						
						
							
							[doc] Add Slack and Forum to the top navigation ( #20208 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-06-30 07:53:45 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						19108ef311 
					 
					
						
						
							
							[Misc] Fix import ( #20233 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-06-29 20:34:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5a52f389dd 
					 
					
						
						
							
							[BUGFIX][DEEPSEEK][MODEL_LOAD] fix w13, w2 weight not initialized assert ( #20202 )  
						
						... 
						
						
						
						Signed-off-by: Chendi Xue <chendi.xue@intel.com > 
						
						
					 
					
						2025-06-29 19:46:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						65b1cbb138 
					 
					
						
						
							
							[Model] support dots1 ( #18254 )  
						
						... 
						
						
						
						Signed-off-by: redmoe-moutain <agiredmoe@gmail.com > 
						
						
					 
					
						2025-06-29 19:34:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6c9837a761 
					 
					
						
						
							
							Fix cuda_archs_loose_intersection when handling sm_*a ( #20207 )  
						
						... 
						
						
						
						Signed-off-by: Huy Do <huydhn@gmail.com > 
						
						
					 
					
						2025-06-29 16:52:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6f2f53a82d 
					 
					
						
						
							
							[Quantization] Add compressed-tensors NVFP4 MoE Support ( #19990 )  
						
						... 
						
						
						
						Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Signed-off-by: Dipika <dipikasikka1@gmail.com > 
						
						
					 
					
						2025-06-29 22:05:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7b1895e6ce 
					 
					
						
						
							
							[CI Fix] Try fixing eagle e2e test OOM by reducing block allocation ( #20213 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-29 10:31:37 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4d36693687 
					 
					
						
						
							
							[Refactor] Create a function util and cache the results for has_deepgemm, has_deepep, has_pplx ( #20187 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-28 22:06:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						daec9dea6e 
					 
					
						
						
							
							[Bugfix] Correct behavior of GraniteMoeHybrid for TensorParallel execution ( #20137 )  
						
						... 
						
						
						
						Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com > 
						
						
					 
					
						2025-06-28 08:16:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						daceac57c7 
					 
					
						
						
							
							[Frontend] Generalize v1/audio/transcriptions endpoint ( #20179 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-06-28 08:15:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8615d9776f 
					 
					
						
						
							
							[CI/Build] Add new CI job to validate Hybrid Models for every PR ( #20147 )  
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-06-27 23:00:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7b460c25f9 
					 
					
						
						
							
							[BugFix] Fix the incorrect func name in the comments. (config.py) ( #20185 )  
						
						
						
						
					 
					
						2025-06-27 22:51:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f719772281 
					 
					
						
						
							
							[Bugfix] Properly reject requests with empty list guided_choice ( #20195 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-27 22:50:52 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d45417b804 
					 
					
						
						
							
							fix ci issue distributed 4 gpu test ( #20204 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-27 22:50:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a29e62ea34 
					 
					
						
						
							
							Fix num_token_padding support for static per-tensor scaled_fp8_quant ( #20188 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-27 22:48:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e53be6f00a 
					 
					
						
						
							
							[Misc] Add type assertion of request_id for LLMEngine.add_request ( #19700 )  
						
						... 
						
						
						
						Signed-off-by: n2ptr <xuzhanchaomail@163.com > 
						
						
					 
					
						2025-06-27 22:47:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c329ceca6d 
					 
					
						
						
							
							[CI Fix] Pin tests/models/registry.py MiniMaxText01ForCausalLM to revision due to model changes ( #20199 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-28 13:43:06 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3c545c0c3b 
					 
					
						
						
							
							[CI/Build] Allow hermetic builds ( #18064 )  
						
						... 
						
						
						
						Signed-off-by: Fabien Dupont <fdupont@redhat.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: Fabien Dupont <fabiendupont@pm.me >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Elias Levy <eliaslevy@google.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-06-27 09:04:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e8c3bd2cd1 
					 
					
						
						
							
							[Bugfix] Fix some narrowing conversion warnings ( #20141 )  
						
						... 
						
						
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-06-27 09:01:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c6c983053d 
					 
					
						
						
							
							[Bugfix] Mark 'hidden_states' as mutable in moe_forward registration. ( #20152 )  
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com > 
						
						
					 
					
						2025-06-27 09:42:22 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aafabaa0d5 
					 
					
						
						
							
							[Fix][torch.compile] Enable custom ops by default when Inductor off ( #20102 )  
						
						... 
						
						
						
						Signed-off-by: luka <luka@neuralmagic.com > 
						
						
					 
					
						2025-06-27 09:00:42 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						94a55c7681 
					 
					
						
						
							
							[Fix][ROCm] Remove unused variables to fix build error on GFX11/12 ( #19891 )  
						
						... 
						
						
						
						Signed-off-by: Hosang Yoon <hosang.yoon@amd.com > 
						
						
					 
					
						2025-06-27 07:14:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aa0dc77ef5 
					 
					
						
						
							
							[Perf] Improved perf for resolve_chat_template_content_format ( #20065 )  
						
						... 
						
						
						
						Signed-off-by: Ilya Lavrenov <ilya.lavrenov@cerebras.net > 
						
						
					 
					
						2025-06-27 09:16:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ab3ac285e 
					 
					
						
						
							
							[Bugfix] Fix flaky failure when getting DP ports ( #20151 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-27 15:30:53 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d1c956dc0f 
					 
					
						
						
							
							Gemma3n (Text-only) ( #20134 )  
						
						... 
						
						
						
						Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.me >
Co-authored-by: Roger Wang <hey@rogerw.me > 
						
						
					 
					
						2025-06-27 07:16:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dec197e3e5 
					 
					
						
						
							
							Quick Fix by adding conditional import for flash_attn_varlen_func in flash_attn ( #20143 )  
						
						... 
						
						
						
						Signed-off-by: Chendi.Xue <chendi.xue@intel.com > 
						
						
					 
					
						2025-06-27 05:48:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e244ae091 
					 
					
						
						
							
							[Perf][Frontend] eliminate api_key and x_request_id headers middleware overhead ( #19946 )  
						
						... 
						
						
						
						Signed-off-by: Yazan-Sharaya <yazan.sharaya.yes@gmail.com > 
						
						
					 
					
						2025-06-27 00:44:14 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cd4cfee689 
					 
					
						
						
							
							[Model][1/N] Automatic conversion of CrossEncoding model ( #20012 )  
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-06-26 21:10:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e110930680 
					 
					
						
						
							
							[Fix] Fix gemma CI test failing on main ( #20124 )  
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-06-26 21:06:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8b64c895c0 
					 
					
						
						
							
							[CI] Sync test dependency with test.in for torch nightly ( #19632 )  
						
						... 
						
						
						
						Signed-off-by: Yang Wang <elainewy@meta.com >
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Concurrensee <yida.wu@amd.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-06-26 20:55:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0740e29b66 
					 
					
						
						
							
							[Feature] add quick all reduce ( #19744 )  
						
						... 
						
						
						
						Signed-off-by: ilmarkov <imarkov@redhat.com >
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com >
Co-authored-by: ilmarkov <imarkov@redhat.com > 
						
						
					 
					
						2025-06-26 20:54:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						44d2e6af63 
					 
					
						
						
							
							[Bugfix] Build moe_data for both sm100 and sm90 ( #20086 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-26 20:50:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2d7779f888 
					 
					
						
						
							
							[Perf] SM100 FP8 GEMM Optimizations after cutlass_profiler ( #20071 )  
						
						... 
						
						
						
						Signed-off-by: ilmarkov <imarkov@redhat.com >
Co-authored-by: ilmarkov <imarkov@redhat.com > 
						
						
					 
					
						2025-06-26 20:50:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a57d57fa72 
					 
					
						
						
							
							[Quantization] Bump to use latest compressed-tensors ( #20033 )  
						
						... 
						
						
						
						Signed-off-by: Dipika <dipikasikka1@gmail.com >
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com > 
						
						
					 
					
						2025-06-26 20:50:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						71799fd005 
					 
					
						
						
							
							[CI Failure] Fix OOM with test_oot_registration_embedding ( #20144 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-27 11:21:04 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e9fd658a73 
					 
					
						
						
							
							[Feature] Expert Parallelism Load Balancer (EPLB) ( #18343 )  
						
						... 
						
						
						
						Signed-off-by: Bowen Wang <abmfy@icloud.com > 
						
						
					 
					
						2025-06-26 15:30:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						07b8fae219 
					 
					
						
						
							
							[Doc] correct LoRA capitalization ( #20135 )  
						
						... 
						
						
						
						Signed-off-by: kyolebu <kyu@redhat.com > 
						
						
					 
					
						2025-06-26 15:22:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						562308816c 
					 
					
						
						
							
							[Refactor] Rename commnication utils ( #20091 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-26 22:19:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						04e1642e32 
					 
					
						
						
							
							[TPU] add kv cache update kernel ( #19928 )  
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-06-26 10:01:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b69781f107 
					 
					
						
						
							
							[Hardware][Intel GPU] Add v1 Intel GPU support with Flash attention backend. ( #19560 )  
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-06-26 09:27:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0bceac9810 
					 
					
						
						
							
							Spam folks if config.py changes ( #20131 )  
						
						... 
						
						
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-06-26 08:19:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						34878a0b48 
					 
					
						
						
							
							[Doc] Rename page titles ( #20130 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-06-26 08:18:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6393b03986 
					 
					
						
						
							
							[Doc] Auto sign-off for VSCode ( #20132 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-06-26 08:18:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0907d507bf 
					 
					
						
						
							
							[Doc] Automatically signed-off by PyCharm ( #20120 )  
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-06-26 14:34:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c894c5dc1f 
					 
					
						
						
							
							[Bug Fix] Fix address/port already in use error for deep_ep test ( #20094 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-26 22:33:13 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1f5d178e9c 
					 
					
						
						
							
							Revert "[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine" ( #20128 )  
						
						
						
						
					 
					
						2025-06-26 07:32:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						27c065df50 
					 
					
						
						
							
							[Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) ( #19904 )  
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-06-26 12:42:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						84c260caeb 
					 
					
						
						
							
							[Docs] Improve frameworks/helm.md ( #20113 )  
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-06-26 10:41:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						167aca45cb 
					 
					
						
						
							
							[Misc] Use collapsible blocks for benchmark examples. ( #20017 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-06-26 03:35:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0567c8249f 
					 
					
						
						
							
							[CPU] Fix torch version in x86 CPU backend ( #19258 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-06-26 03:34:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d188913d99 
					 
					
						
						
							
							[Refactor] Remove unused library ( #20099 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-26 09:16:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1d7c29f5fe 
					 
					
						
						
							
							[Doc] Update docs for New Model Implementation ( #20115 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-06-26 00:47:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						65397e40f5 
					 
					
						
						
							
							[Bugfix] Allow CUDA_VISIBLE_DEVICES='' in Platform.device_id_to_physical_device_id ( #18979 )  
						
						... 
						
						
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com > 
						
						
					 
					
						2025-06-26 00:01:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9502c38138 
					 
					
						
						
							
							[Benchmark][Bug] Fix multiple bugs in bench and add args to spec_decode offline ( #20083 )  
						
						
						
						
					 
					
						2025-06-25 22:06:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2582683566 
					 
					
						
						
							
							[PD] Skip tp_size exchange with rank0 ( #19413 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-06-25 20:04:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						754b00edb3 
					 
					
						
						
							
							[Bugfix] Fix Mistral tool-parser regex for nested JSON ( #20093 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-26 01:01:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						296ce95d8e 
					 
					
						
						
							
							[CI] Add SM120 to the Dockerfile ( #19794 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-25 16:23:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2d7620c3eb 
					 
					
						
						
							
							[TPU] Add TPU specific var VLLM_TPU_MOST_MODEL_LEN ( #19919 )  
						
						... 
						
						
						
						Signed-off-by: Chenyaaang <chenyangli@google.com > 
						
						
					 
					
						2025-06-25 15:51:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						55c65ab495 
					 
					
						
						
							
							[P/D] Avoid stranding blocks in P when aborted in D's waiting queue ( #19223 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-06-25 15:19:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2cc2069970 
					 
					
						
						
							
							[TPU][Bugfix] fix kv cache padding ( #20048 )  
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-06-25 21:24:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9f0608fc16 
					 
					
						
						
							
							[Bugfix] default set cuda_graph_sizes to max_num_seqs for v1 engine ( #20062 )  
						
						... 
						
						
						
						Signed-off-by: izhuhaoran <izhuhaoran@qq.com > 
						
						
					 
					
						2025-06-25 21:03:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4e0db57fff 
					 
					
						
						
							
							Fix the path to the testing script. ( #20082 )  
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com > 
						
						
					 
					
						2025-06-25 20:48:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c40692bf9a 
					 
					
						
						
							
							[Misc] Add parallel state node_count function ( #20045 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-06-25 13:38:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4734704b30 
					 
					
						
						
							
							[PD] let toy proxy handle /chat/completions ( #19730 )  
						
						... 
						
						
						
						Signed-off-by: Linkun <github@lkchen.net > 
						
						
					 
					
						2025-06-25 15:17:45 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8b8c209e35 
					 
					
						
						
							
							static_scaled_fp8_quant should not run when scale.numel is not 1 ( #20076 )  
						
						
						
						
					 
					
						2025-06-25 15:08:03 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						23a04e0895 
					 
					
						
						
							
							[Fix] Support cls pooling in ModernBertPooler ( #20067 )  
						
						... 
						
						
						
						Signed-off-by: shengzhe.li <shengzhe.li@sbintuitions.co.jp > 
						
						
					 
					
						2025-06-25 15:07:45 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						02c97d9a92 
					 
					
						
						
							
							[Quantization] Add compressed-tensors emulations support for NVFP4 ( #19879 )  
						
						... 
						
						
						
						Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Signed-off-by: Dipika <dipikasikka1@gmail.com > 
						
						
					 
					
						2025-06-25 14:28:19 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e795d723ed 
					 
					
						
						
							
							[Frontend] Add /v1/audio/translations OpenAI API endpoint ( #19615 )  
						
						... 
						
						
						
						Signed-off-by: Roger Wang <ywang@roblox.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Roger Wang <ywang@roblox.com > 
						
						
					 
					
						2025-06-25 17:54:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8359f4c8d8 
					 
					
						
						
							
							[V1][Speculative Decoding] Fix DeepSeek MTP ( #20022 )  
						
						... 
						
						
						
						Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com > 
						
						
					 
					
						2025-06-25 08:41:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bf5181583f 
					 
					
						
						
							
							[Doc] Guide for Incremental Compilation Workflow ( #19109 )  
						
						
						
						
					 
					
						2025-06-25 22:06:46 +09:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c53fec1fcb 
					 
					
						
						
							
							[doc] add reference link for Intel XPU ( #20064 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-06-25 12:24:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0f9e7354f5 
					 
					
						
						
							
							[BugFix] Fix full-cuda-graph illegal memory access in FA3 ( #20057 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-06-25 08:39:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ba7ba35cda 
					 
					
						
						
							
							[Chore] debloat some initial logs ( #19438 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-06-25 06:36:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						015fab8c2f 
					 
					
						
						
							
							[Kernels][Bugfix] Use torch op for all kernels in FusedMoE forward.  Add additional testing for cudagraphs. ( #19717 )  
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com > 
						
						
					 
					
						2025-06-24 23:22:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f59fc60fb3 
					 
					
						
						
							
							[Feat][CLI] enforce-include-usage ( #19695 )  
						
						... 
						
						
						
						Signed-off-by: Max Wittig <max.wittig@siemens.com > 
						
						
					 
					
						2025-06-25 01:43:04 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						879f69bed3 
					 
					
						
						
							
							[Refactor] Remove duplicate ceil_div ( #20023 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-25 05:19:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7108934142 
					 
					
						
						
							
							[Frontend] speed up import time of vllm.config ( #18036 )  
						
						... 
						
						
						
						Signed-off-by: David Xia <david@davidxia.com > 
						
						
					 
					
						2025-06-25 00:41:11 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3443aaf8dd 
					 
					
						
						
							
							Move to a faster base64 implementation ( #19984 )  
						
						... 
						
						
						
						Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai > 
						
						
					 
					
						2025-06-24 20:33:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2273ec322c 
					 
					
						
						
							
							Revert "Fix(models/siglip): Add compatibility for Gemma models quantized by llm-compressor" ( #20030 )  
						
						
						
						
					 
					
						2025-06-25 11:23:29 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a6c4b87fbc 
					 
					
						
						
							
							Revert "[Feature] Integrate new deepgemm ( #19820 )" ( #20049 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-24 19:45:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1afa9948f5 
					 
					
						
						
							
							[Llama4] Update attn_temperature_tuning ( #19997 )  
						
						... 
						
						
						
						Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca > 
						
						
					 
					
						2025-06-24 22:42:53 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0d06b533a0 
					 
					
						
						
							
							cmake: Update vllm_flash_attn for vllm_kernels ( #20032 )  
						
						... 
						
						
						
						Signed-off-by: Eli Uriegas <eliuriegas@meta.com > 
						
						
					 
					
						2025-06-24 22:44:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c01d1c5aba 
					 
					
						
						
							
							use .dev for version comparison with pytorch nightly release ( #20031 )  
						
						... 
						
						
						
						Signed-off-by: Boyuan Feng <boyuan@meta.com > 
						
						
					 
					
						2025-06-24 21:52:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ead369845d 
					 
					
						
						
							
							[Easy] Remove submodule added in #19463 ( #20039 )  
						
						... 
						
						
						
						Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca > 
						
						
					 
					
						2025-06-24 13:23:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c6e3bba8e6 
					 
					
						
						
							
							[Feature] Integrate new deepgemm ( #19820 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-24 12:51:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						91f7d9d0b6 
					 
					
						
						
							
							[P/D] Asynchronously do _nixl_handshake ( #19836 )  
						
						... 
						
						
						
						Signed-off-by: Linkun Chen <github@lkchen.net >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-06-24 12:46:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8619e7158c 
					 
					
						
						
							
							[BugFix] Fix multi-node offline data parallel ( #19937 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-06-24 12:45:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c635c5f744 
					 
					
						
						
							
							[Misc][Benchmarking] Add variable request-rate ("ramp-up") to the benchmarking client. ( #19423 )  
						
						... 
						
						
						
						Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: Roger Wang <hey@rogerw.me > 
						
						
					 
					
						2025-06-24 18:41:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a045b7e89a 
					 
					
						
						
							
							[Perf] Improve/Fix-regression for FA3 in High QPS regimes ( #19463 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-06-24 13:09:01 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						981eeca41a 
					 
					
						
						
							
							[Fix][V1] Remove --scheduling-policy oracle ( #20010 )  
						
						... 
						
						
						
						Signed-off-by: amit <amit.man@gmail.com > 
						
						
					 
					
						2025-06-24 09:52:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						26d34eb67e 
					 
					
						
						
							
							refactor example - qwen3_reranker ( #19847 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-06-24 14:03:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						53da4cd397 
					 
					
						
						
							
							[Bugfix][CPU] Fix InputBatch for pooling models in the CPU v1 ( #20014 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-06-24 13:20:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9a3b88328f 
					 
					
						
						
							
							[PERF] Speedup of MRoPE prepare inputs ( #19939 )  
						
						... 
						
						
						
						Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai > 
						
						
					 
					
						2025-06-23 23:01:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3014c920da 
					 
					
						
						
							
							add some examples for other benchmark scripts ( #19893 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-06-24 05:57:46 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0eed516951 
					 
					
						
						
							
							[doc] Fix broken link in the installation for CPU ( #19980 )  
						
						... 
						
						
						
						Signed-off-by: Kay Yan <kay.yan@daocloud.io > 
						
						
					 
					
						2025-06-24 12:04:11 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ee5ad8d2c5 
					 
					
						
						
							
							[Misc][Tools][Benchmark] Add profile to autotune script ( #19711 )  
						
						... 
						
						
						
						Signed-off-by: Chenyaaang <chenyangli@google.com > 
						
						
					 
					
						2025-06-24 00:59:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a738dbb2a1 
					 
					
						
						
							
							Update test case parameter to have the throughput above 8.0 ( #19994 )  
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com > 
						
						
					 
					
						2025-06-24 00:18:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						33d5e29be9 
					 
					
						
						
							
							[TPU] Fix tpu model runner test ( #19995 )  
						
						... 
						
						
						
						Signed-off-by: Chenyaaang <chenyangli@google.com > 
						
						
					 
					
						2025-06-23 16:04:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4671ac6e2a 
					 
					
						
						
							
							[Bugfix][Benchmark] Fix Marlin benchmark ( #19929 )  
						
						
						
						
					 
					
						2025-06-24 07:25:12 +09:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dd2ccf8dde 
					 
					
						
						
							
							Feat Dynamic Quantization for MoE Layers in GPTQ Marlin Backend ( #19395 )  
						
						
						
						
					 
					
						2025-06-24 07:23:28 +09:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a3bc76e4b5 
					 
					
						
						
							
							[CI/Build] Push latest tag for cpu and neuron docker image ( #19897 )  
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-06-23 14:15:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e6327c9b3e 
					 
					
						
						
							
							[Feature] Support sequence parallelism for static fp8 quantization ( #19181 )  
						
						... 
						
						
						
						Signed-off-by: cascade812 <cascade812@outlook.com > 
						
						
					 
					
						2025-06-23 16:09:02 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d0132f025d 
					 
					
						
						
							
							[Misc] Add type alias ReqId and EngineId for better readability ( #19880 )  
						
						... 
						
						
						
						Signed-off-by: Linkun Chen <github@lkchen.net > 
						
						
					 
					
						2025-06-23 12:57:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						61f4fc5dc6 
					 
					
						
						
							
							[Bugfix][v1] Fix step pooler implementation and step pooling usage in v1 ( #19956 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-06-23 18:38:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						68aaeb3749 
					 
					
						
						
							
							[EP+DP] Optimize the little operations in the DeepGEMM + DeepEP low latency case ( #19885 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-06-23 11:07:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c3649e4fee 
					 
					
						
						
							
							[Docs] Fix syntax highlighting of shell commands ( #19870 )  
						
						... 
						
						
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com > 
						
						
					 
					
						2025-06-23 17:59:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						53243e5c42 
					 
					
						
						
							
							[doc] improve readability for long commands ( #19920 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-06-23 14:27:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a6e6604d32 
					 
					
						
						
							
							[Bugfix] Fix CI bitsandbytes failure ( #19969 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-06-23 21:30:55 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b82e0f82cb 
					 
					
						
						
							
							[doc] use MkDocs collapsible blocks - supplement ( #19973 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-06-23 10:54:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5111642a6f 
					 
					
						
						
							
							[Doc] Update V1 status for decoder-only embedding models ( #19952 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-06-23 09:31:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1bcd15edc7 
					 
					
						
						
							
							[BugFix][P/D] Fix for cases where _recving_transfers can be cleaned up when *all* transfer done ( #19874 )  
						
						... 
						
						
						
						Signed-off-by: Linkun Chen <github@lkchen.net > 
						
						
					 
					
						2025-06-22 22:41:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2ebff5b77c 
					 
					
						
						
							
							[P/D][NixlConnector] Support tp_size > num_kv_heads deployments ( #19691 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-06-22 22:41:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f17aec0d63 
					 
					
						
						
							
							[doc] Fold long code blocks to improve readability ( #19926 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-06-23 05:24:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						493c275352 
					 
					
						
						
							
							Fix(models/siglip): Add compatibility for Gemma models quantized by llm-compressor ( #19643 )  
						
						... 
						
						
						
						Signed-off-by: Vensenmu <vensenmu@gmail.com > 
						
						
					 
					
						2025-06-23 03:40:28 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f39ab2d4bd 
					 
					
						
						
							
							[Misc] Configurable timeout for execute_model RPC calls via env var ( #19544 )  
						
						... 
						
						
						
						Signed-off-by: jinqinn <goodqinjin@163.com > 
						
						
					 
					
						2025-06-22 20:36:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4a0f7888a3 
					 
					
						
						
							
							[Core] feat: Implement Priority Scheduling in V1 Engine ( #19057 )  
						
						... 
						
						
						
						Signed-off-by: amit <amit.man@gmail.com >
Co-authored-by: Roger Wang <Rogerw0108@gmail.com > 
						
						
					 
					
						2025-06-22 20:18:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c4cf260677 
					 
					
						
						
							
							[Perf][CLI] Improve overall startup time ( #19941 )  
						
						
						
						
					 
					
						2025-06-22 23:11:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						33d51f599e 
					 
					
						
						
							
							[BugFix] Add an env to disable moe chunking to work around compile incompatibility ( #19642 )  
						
						... 
						
						
						
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com > 
						
						
					 
					
						2025-06-22 15:17:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e91386cde1 
					 
					
						
						
							
							[Chore] dedup logs ( #19955 )  
						
						
						
						
					 
					
						2025-06-22 19:43:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2c11a29f0b 
					 
					
						
						
							
							[Misc] Simplify vllm bench cli subcommand implementation ( #19948 )  
						
						
						
						
					 
					
						2025-06-22 12:34:48 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c76a506bd6 
					 
					
						
						
							
							[Misc] Update model-specific PR tagging ( #19949 )  
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me > 
						
						
					 
					
						2025-06-22 12:16:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ec0db6f51c 
					 
					
						
						
							
							[doc] use snippets for contact us ( #19944 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-06-22 10:26:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c305a2109d 
					 
					
						
						
							
							[CI/Build] Auto tag perf benchmarks related PRs ( #19943 )  
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-06-22 08:46:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						202c5df935 
					 
					
						
						
							
							[Benchmark] fix request loss if "ping" is returned ( #19535 )  
						
						... 
						
						
						
						Signed-off-by: Wang, Yi A <yi.a.wang@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-06-22 07:21:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2bb246b8f7 
					 
					
						
						
							
							[MISC] add cpu_kvcache_space_bytes to CacheConfig ( #19812 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-06-22 13:39:09 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4c409cabc2 
					 
					
						
						
							
							[Misc] add vllm_config in __init__ ( #19866 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-06-21 23:10:46 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3b1e4c6a23 
					 
					
						
						
							
							[Docs] Add GPT2ForSequenceClassification to supported models in docs ( #19932 )  
						
						... 
						
						
						
						Signed-off-by: nie3e <adrcwiek@gmail.com > 
						
						
					 
					
						2025-06-21 20:57:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2c5302fadd 
					 
					
						
						
							
							[Multimodal] Optimize Qwen2/2.5-VL startup time ( #19756 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: Roger Wang <hey@rogerw.me >
Co-authored-by: Roger Wang <hey@rogerw.me > 
						
						
					 
					
						2025-06-21 20:01:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						caa680fd2e 
					 
					
						
						
							
							[doc] add contact us in community ( #19922 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-06-21 17:29:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c3bf9bad11 
					 
					
						
						
							
							[New model support]Support Tarsier2 ( #19887 )  
						
						... 
						
						
						
						Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com > 
						
						
					 
					
						2025-06-21 04:01:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6f170f11dd 
					 
					
						
						
							
							[Bugfix] Fix bnb 8bit model weights loading ( #19917 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-06-21 03:29:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8ca81bb069 
					 
					
						
						
							
							Fix: Check the type of params to be a Sequence not list. ( #19910 )  
						
						... 
						
						
						
						Signed-off-by: Rabin Adhikari <rabin.adk1@gmail.com > 
						
						
					 
					
						2025-06-20 23:03:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e773a9e1c2 
					 
					
						
						
							
							[Misc] Clean up useless code ( #19889 )  
						
						... 
						
						
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com > 
						
						
					 
					
						2025-06-20 21:09:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						71baf85ae1 
					 
					
						
						
							
							[Kernel] mark TorchSDPABackend swap_blocks NotImplementedError ( #19749 )  
						
						
						
						
					 
					
						2025-06-20 18:18:11 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						79f2f1c2a1 
					 
					
						
						
							
							[CPU][CI] Fallback sliding window to v0 and fix CPU pooling model tests ( #19901 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-06-20 15:30:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2e3e3c86dc 
					 
					
						
						
							
							Export NaNs in logits to scheduler_stats if output is corrupted ( #18777 )  
						
						... 
						
						
						
						Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com > 
						
						
					 
					
						2025-06-20 22:47:16 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7e8977fcd4 
					 
					
						
						
							
							[custom_op][vllm-plugin] update custom_op class to use op_registry ( #19164 )  
						
						... 
						
						
						
						Signed-off-by: Chendi.Xue <chendi.xue@intel.com > 
						
						
					 
					
						2025-06-20 07:44:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f1e840e842 
					 
					
						
						
							
							[Model] GPT2ForSequenceClassification model ( #19663 )  
						
						... 
						
						
						
						Signed-off-by: nie3e <adrcwiek@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-06-20 12:07:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7771d1de88 
					 
					
						
						
							
							[Fix] import regex instead of re ( #19875 )  
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-06-20 11:16:48 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						71d1219545 
					 
					
						
						
							
							[Kernel] correct cpu worker function parameter type ( #19745 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-06-20 10:50:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e384f2f108 
					 
					
						
						
							
							[Misc] refactor example - openai_transcription_client ( #19851 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-20 08:02:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						089a306f19 
					 
					
						
						
							
							[Misc] update cuda version ( #19526 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-20 07:25:15 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5e666f72cd 
					 
					
						
						
							
							[Bugfix][Ray] Set the cuda context eagerly in the ray worker  ( #19583 )  
						
						
						
						
					 
					
						2025-06-19 22:01:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e3a3e4db46 
					 
					
						
						
							
							[Bugfix] Enable PP with AITER+V1 ( #19822 )  
						
						... 
						
						
						
						Signed-off-by: Qiang Li <qiang.li2@amd.com>
						
						
					 
					
						2025-06-20 12:43:20 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e41bf15cd0 
					 
					
						
						
							
							[Chore]: qwen3-moe-type-hints-mistake ( #19860 )  
						
						... 
						
						
						
						Co-authored-by: xinnan.hou <hxn02029096@alibaba-inc.com>
						
						
					 
					
						2025-06-19 21:43:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5aa4a015ce 
					 
					
						
						
							
							[Benchmark] Fix Value of type "SampleRequest" is not indexable ( #18032 )  
						
						... 
						
						
						
						Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
						
						
					 
					
						2025-06-19 21:28:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b6bad3d186 
					 
					
						
						
							
							[CI][Neuron] Fail and exit on first error ( #19622 )  
						
						... 
						
						
						
						Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-06-20 12:27:51 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ee9a1531aa 
					 
					
						
						
							
							[CI/Build][Bugfix] Fix deadlock on v1 engine test CI ( #19872 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-06-20 09:51:07 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						10d82f9ac5 
					 
					
						
						
							
							[Benchmark][Bugfix] Fix Dataset Length Calculation ( #19868 )  
						
						... 
						
						
						
						Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
						
						
					 
					
						2025-06-19 18:30:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ea10dd9d9e 
					 
					
						
						
							
							[Frontend] early return chat format resolution when specified ( #19735 )  
						
						
						
						
					 
					
						2025-06-19 18:49:59 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ead2110297 
					 
					
						
						
							
							[Core][Bugfix] Fix Online MM Beam Search ( #19688 )  
						
						... 
						
						
						
						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
						
						
					 
					
						2025-06-19 17:18:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						01220ce89a 
					 
					
						
						
							
							[CI][CPU] Improve dummy Triton interfaces and fix the CPU CI ( #19838 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com>
						
						
					 
					
						2025-06-19 15:46:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6f68c49220 
					 
					
						
						
							
							[Doc] Update V1 user guide for embedding models ( #19842 )  
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-06-19 09:43:27 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4719460644 
					 
					
						
						
							
							Fixing Chunked Prefill Test. ( #19762 )  
						
						... 
						
						
						
						Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
						
						
					 
					
						2025-06-19 01:36:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						466166dcfd 
					 
					
						
						
							
							[Frontend] Add optional token-level progress bar to LLM.beam_search ( #19301 )  
						
						... 
						
						
						
						Signed-off-by: Ruosen Li <rxl190028@utdallas.edu>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-71-179.ec2.internal>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-06-19 03:21:41 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1d0ae26c85 
					 
					
						
						
							
							Add xLAM tool parser support ( #17148 )  
						
						
						
						
					 
					
						2025-06-19 14:26:41 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6021999573 
					 
					
						
						
							
							[Minor] Allow redirecting model path for HfRunner in test ( #19795 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-06-18 23:04:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c7b370c603 
					 
					
						
						
							
							raise exception for pin_lora ( #19809 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-06-18 22:57:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aa20d10a91 
					 
					
						
						
							
							[Misc] [ROCm] Prevent surplus tensor reshape ( #19803 )  
						
						... 
						
						
						
						Signed-off-by: Zsolt Borbely <zsolt.borbely@htecgroup.com>
						
						
					 
					
						2025-06-19 13:57:16 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2de12be428 
					 
					
						
						
							
							[ROCm] [AITER] [Bugfix] Patch for AITER commit 648764942e552a8bb5fe16026703716a81f05374 ( #18990 )  
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
						
						
					 
					
						2025-06-18 22:56:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						83ca9ae47b 
					 
					
						
						
							
							Mark invariant normalizer in Gemma as non-persistent ( #19788 )  
						
						... 
						
						
						
						Signed-off-by: Yu-Hang Tang <Tang.Maxin@gmail.com>
						
						
					 
					
						2025-06-18 22:56:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e2148dc5ea 
					 
					
						
						
							
							[Bugfix] Add check_health to v1 async client. ( #19821 )  
						
						... 
						
						
						
						Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
						
						
					 
					
						2025-06-18 21:47:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b1098b4072 
					 
					
						
						
							
							[Bugfix] Fix the linter ( #19826 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-06-18 21:44:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						799397ee4f 
					 
					
						
						
							
							Support embedding models in V1 ( #16188 )  
						
						... 
						
						
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-06-18 21:36:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4959915089 
					 
					
						
						
							
							[Quantization] Modify the logic of BNB double quantization ( #19742 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-06-19 03:52:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8d1e89d946 
					 
					
						
						
							
							[Misc][ROCm] Enforce no unused variable in ROCm C++ files ( #19796 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-06-18 20:25:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						36239f79dd 
					 
					
						
						
							
							Fix FA2 fallback for Blackwell V1 ( #19781 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-19 09:53:55 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dfada85eee 
					 
					
						
						
							
							[Frontend] Expose custom args in OpenAI APIs ( #16862 )  
						
						... 
						
						
						
						Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-06-18 17:41:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed33349738 
					 
					
						
						
							
							[BugFix] Fix use_cudagraph=False ( #19612 )  
						
						... 
						
						
						
						Signed-off-by: Richard Zou <zou3519@gmail.com>
						
						
					 
					
						2025-06-19 08:23:12 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d49adea1f9 
					 
					
						
						
							
							[Multimodal] Use fast processor for Qwen2/2.5-VL ( #19789 )  
						
						
						
						
					 
					
						2025-06-18 15:49:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						14fdd21d39 
					 
					
						
						
							
							[Core] More fixes to MultiModalEmbeddings type handling ( #19715 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-06-18 22:48:29 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						04fefe7c9a 
					 
					
						
						
							
							[TPU] Update torch-xla version to include paged attention tuned block change ( #19813 )  
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com>
						
						
					 
					
						2025-06-18 22:41:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3b523e38d9 
					 
					
						
						
							
							[Core] Do not copy array during hashing ( #19484 )  
						
						... 
						
						
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
						
						
					 
					
						2025-06-18 15:36:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						16c16301c8 
					 
					
						
						
							
							Disable "Forbid direct 'import triton'" check for vllm/triton_utils/importing.py in an extensible way ( #19783 )  
						
						... 
						
						
						
						Signed-off-by: Andrew Feldman <afeldman@redhat.com>
						
						
					 
					
						2025-06-18 15:08:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9206d0ff01 
					 
					
						
						
							
							docs: fix Slack bulletpoint in README ( #19811 )  
						
						... 
						
						
						
						Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
						
						
					 
					
						2025-06-18 20:47:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a89209b78d 
					 
					
						
						
							
							[v1] Support mamba2 ( #19327 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-06-18 20:34:15 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ffacb222cb 
					 
					
						
						
							
							[Docs] Add Huzaifa Sidhpurwala to vuln mgmt team doc ( #19808 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-06-18 20:22:28 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						12575cfa7a 
					 
					
						
						
							
							[Bugfix] fix RAY_CGRAPH_get_timeout is not set successfully ( #19725 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-06-18 10:26:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8b6e1d639c 
					 
					
						
						
							
							[Hardware][AMD] integrate aiter chunked prefill into vllm ( #18596 )  
						
						... 
						
						
						
						Signed-off-by: fsx950223 <fsx950223@outlook.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: fsx950223 <fsx950223@outlook.com>
Co-authored-by: charlifu <charlifu@amd.com>
						
						
					 
					
						2025-06-18 08:46:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						735a9de71f 
					 
					
						
						
							
							[Qwen] Add tagging rule for Qwen related PRs ( #19799 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-06-18 14:26:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						257ab95439 
					 
					
						
						
							
							[Platform] Allow platform use V1 Engine by default ( #19792 )  
						
						... 
						
						
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
						
						
					 
					
						2025-06-18 13:03:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cca91a7a10 
					 
					
						
						
							
							[doc] fix the incorrect label ( #19787 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-18 10:30:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f04d604567 
					 
					
						
						
							
							[Minor] Zero-initialize attn output buffer ( #19784 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-06-18 06:59:27 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						19a53b2783 
					 
					
						
						
							
							[V1] Decouple GPU and TPU InputBatch ( #19778 )  
						
						... 
						
						
						
						Signed-off-by: Andrew Feldman <afeldman@redhat.com>
						
						
					 
					
						2025-06-18 06:38:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						eccdc8318c 
					 
					
						
						
							
							[V1][P/D] An native implementation of xPyD based on P2P NCCL ( #18242 )  
						
						... 
						
						
						
						Signed-off-by: Abatom <abzhonghua@gmail.com>
						
						
					 
					
						2025-06-18 06:32:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5f52a84685 
					 
					
						
						
							
							[V1] Add API docs for EncoderCacheManager ( #19294 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-06-18 13:37:01 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d4629dc43f 
					 
					
						
						
							
							[Misc] Add __str__ for RequestStatus ( #19780 )  
						
						... 
						
						
						
						Signed-off-by: Linkun Chen <github@lkchen.net>
						
						
					 
					
						2025-06-18 03:03:01 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e9cc73f67 
					 
					
						
						
							
							[MISC] correct DeviceConfig device field static type analysis ( #19699 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-06-17 17:21:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c53711bd63 
					 
					
						
						
							
							[MISC] correct copy_blocks src_to_dists param type ( #19696 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-06-17 17:21:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dac8cc49f4 
					 
					
						
						
							
							[TPU] Update torch version to include paged attention kernel change ( #19706 )  
						
						... 
						
						
						
						Signed-off-by: Chenyaaang <chenyangli@google.com>
						
						
					 
					
						2025-06-17 22:24:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a44b1c951d 
					 
					
						
						
							
							[Feature][ROCm] Add full graph capture support for TritonAttentionBackend ( #19158 )  
						
						... 
						
						
						
						Signed-off-by: charlifu <charlifu@amd.com>
						
						
					 
					
						2025-06-17 17:03:06 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b447624ee3 
					 
					
						
						
							
							[Bugfix] Fix faulty triton importing logic when using Ray for DP ( #19734 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-17 20:59:29 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cda92307c1 
					 
					
						
						
							
							[Misc] Update lmcache connector with the latest connector apis ( #19441 )  
						
						... 
						
						
						
						Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
						
						
					 
					
						2025-06-17 19:57:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bf57ccc5c2 
					 
					
						
						
							
							Remove sm120 arch from sm100 cutlass kernel arch list ( #19716 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-17 11:49:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ffb2cd6b54 
					 
					
						
						
							
							[Perf] Optimize moe_align_block_size CUDA kernel ( #19572 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-17 11:49:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ca94d7fa00 
					 
					
						
						
							
							[Bugfix] Update multimodel models mapping to fit new checkpoint after Transformers v4.52 ( #19151 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-06-17 15:58:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5a1c2e15d8 
					 
					
						
						
							
							[Mis] remove duplicate engine status checks ( #19647 )  
						
						... 
						
						
						
						Signed-off-by: googs1025 <googs1025@gmail.com>
						
						
					 
					
						2025-06-17 08:17:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4c8f64faa7 
					 
					
						
						
							
							[V1][Kernel] Flashinfer HND KV cache layout ( #19280 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-06-17 09:09:22 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						93aee29fdb 
					 
					
						
						
							
							[doc] split "Other AI Accelerators" tabs ( #19708 )  
						
						
						
						
					 
					
						2025-06-17 22:05:29 +09:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						154d063b9f 
					 
					
						
						
							
							[doc][mkdocs] Add edit  button to documentation ( #19637 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-17 11:10:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ccd7c05089 
					 
					
						
						
							
							[Kernel] Add Split-KV Support to Unified Triton Attention Kernel ( #19152 )  
						
						... 
						
						
						
						Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
						
						
					 
					
						2025-06-17 10:45:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c48c6c4008 
					 
					
						
						
							
							Add a doc on how to update PyTorch version ( #19705 )  
						
						
						
						
					 
					
						2025-06-17 18:10:37 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aed8468642 
					 
					
						
						
							
							[Doc] Add missing llava family multi-image examples ( #19698 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-06-17 07:05:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5c76b9cdaf 
					 
					
						
						
							
							[Core] add remove_seq_from_computed_blocks_tracker to BlockSpaceManager ( #19686 )  
						
						... 
						
						
						
						Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
						
						
					 
					
						2025-06-17 04:40:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ddfed314f9 
					 
					
						
						
							
							Fixes IMA for TP w/ flex-attention ( #19712 )  
						
						... 
						
						
						
						Signed-off-by: drisspg <drisspguessous@gmail.com>
						
						
					 
					
						2025-06-17 04:01:50 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5b3ad5ecf2 
					 
					
						
						
							
							[DOC] fix doc typos ( #19600 )  
						
						... 
						
						
						
						Signed-off-by: Di Liu <liu-di@sjtu.edu.cn>
						
						
					 
					
						2025-06-17 11:34:53 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ede5c4ebdf 
					 
					
						
						
							
							[Frontend] add chunking audio for > 30s audio ( #19597 )  
						
						... 
						
						
						
						Signed-off-by: nguyenhoangthuan99 <thuanhppro12@gmail.com>
						
						
					 
					
						2025-06-17 11:34:00 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						07334959d8 
					 
					
						
						
							
							[Wheel Size] Only build FA2 8.0+PTX ( #19336 )  
						
						
						
						
					 
					
						2025-06-17 12:32:49 +09:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						119f683949 
					 
					
						
						
							
							[doc] add project flag to gcloud TPU command ( #19664 )  
						
						... 
						
						
						
						Signed-off-by: David Xia <david@davidxia.com>
						
						
					 
					
						2025-06-17 01:00:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0860087aff 
					 
					
						
						
							
							[Fix] Fall back to Gloo when NCCL backend is unavailable ( #19641 )  
						
						... 
						
						
						
						Signed-off-by: conroy-cheers <conroy@corncheese.org>
						
						
					 
					
						2025-06-17 08:42:14 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6bc7b57315 
					 
					
						
						
							
							[Quantization] Remove FP4 emulation; Fall-back to marlin for device < 100 ( #19563 )  
						
						
						
						
					 
					
						2025-06-16 17:33:51 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						90f9c2eb5c 
					 
					
						
						
							
							[V1] Change return type on get_multimodal_embeddings() ( #19446 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-06-16 13:32:15 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						387bdf0ab9 
					 
					
						
						
							
							[Model] Add support for MiniMaxM1ForCausalLM (shares architecture with MiniMaxText01ForCausalLM) ( #19677 )  
						
						... 
						
						
						
						Signed-off-by: QscQ <qscqesze@gmail.com>
						
						
					 
					
						2025-06-16 09:47:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5e5baa91aa 
					 
					
						
						
							
							[Kernels] Use empty for modular MoE workspaces ( #19667 )  
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com>
						
						
					 
					
						2025-06-16 14:58:01 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						836d4ce140 
					 
					
						
						
							
							[Bugfix] fix missing 'finish_reason': null in streaming chat ( #19662 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-06-16 14:10:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c3fec47bb7 
					 
					
						
						
							
							[MISC] bump huggingface_hub pkg to 0.33.0 ( #19547 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-06-16 05:22:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1173804dca 
					 
					
						
						
							
							[Bugfix] Fix TP inference for Flex attention backend ( #19657 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-06-16 11:21:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4d5424029b 
					 
					
						
						
							
							[Feature]:Allow for Granite MoE Hybrid models with _only_ shared experts. ( #19652 )  
						
						... 
						
						
						
						Signed-off-by: Shawn Tan <shawntan@ibm.com>
						
						
					 
					
						2025-06-16 11:14:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3e7506975c 
					 
					
						
						
							
							[DOC] Add reasoning capability to vLLM streamlit code ( #19557 )  
						
						
						
						
					 
					
						2025-06-16 07:09:12 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ee35e96ac3 
					 
					
						
						
							
							[BugFix] Don't catch BaseException when dumping execute_model errors ( #19626 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-06-16 11:01:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dec66d253b 
					 
					
						
						
							
							[Kernel] GGUF MMVQ kernel for multiple input vectors ( #18754 )  
						
						... 
						
						
						
						Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
						
						
					 
					
						2025-06-16 17:33:26 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8d120701fd 
					 
					
						
						
							
							[Docs] Move multiproc doc to v1 dir ( #19651 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-06-16 09:10:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f40f763f12 
					 
					
						
						
							
							[CI] Add mteb testing for rerank models ( #19344 )  
						
						
						
						
					 
					
						2025-06-16 01:36:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						26bc46ef89 
					 
					
						
						
							
							[MISC] typo fix ( #19672 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-06-16 07:18:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a77aea59fd 
					 
					
						
						
							
							[TPU] support attention head dim smaller than 128 ( #19620 )  
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-16 06:40:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b692e9cd07 
					 
					
						
						
							
							[Misc] Fix skipped max-model-len validation when deriving max model length from tokenizer config ( #19660 )  
						
						... 
						
						
						
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
						
						
					 
					
						2025-06-16 06:30:29 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						367871a469 
					 
					
						
						
							
							[Misc][Frontend] passthrough bad_words ( #19564 )  
						
						... 
						
						
						
						Signed-off-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
Co-authored-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
						
						
					 
					
						2025-06-16 05:05:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						92183b41f3 
					 
					
						
						
							
							[Bugfix][Core] Prefix caching causes incorrect outputs due to outdated ComputedBlocksTracker ( #18957 )  
						
						... 
						
						
						
						Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
						
						
					 
					
						2025-06-15 21:56:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c6703d1e0d 
					 
					
						
						
							
							[MISC] Remove unused variables in C++ ( #19609 )
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com > 
						
						
					 
					
						2025-06-15 20:05:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a5e7242d5f 
					 
					
						
						
							
							[Misc] Remove duplicate multiproc method setting for CPU platform ( #19649 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-06-16 02:26:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						91b2c17a55 
					 
					
						
						
							
							[CI/Build] Fix torch nightly CI dependencies part 2 ( #19589 )  
						
						
						
						
					 
					
						2025-06-15 20:01:10 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						055915e6ce 
					 
					
						
						
							
							Enable prefix caching with full cuda graphs ( #19617 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-06-15 01:05:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3d330c4c09 
					 
					
						
						
							
							[Benchmark] Refactor benchmark script for fp8 & int8 ( #19627 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-15 15:15:37 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0b73736a0d 
					 
					
						
						
							
							[Kernel] Raise verbose error and consolidate num_heads/num_kv_heads divisibility check ( #19339 )  
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-06-15 13:43:48 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ee1531bc38 
					 
					
						
						
							
							[Bugfix][2/n] Fix speculative decoding CI - Fix test_ngram_e2e_greedy_correctness ( #19644 )  
						
						
						
						
					 
					
						2025-06-14 21:15:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e13945f9dd 
					 
					
						
						
							
							[Perf] Further tunings for SM100 FP8 CUTLASS kernel ( #19566 )  
						
						
						
						
					 
					
						2025-06-14 17:25:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						08500011d3 
					 
					
						
						
							
							[Fix] Convert kv_transfer_config from dict to KVTransferConfig ( #19262 )  
						
						
						
						
					 
					
						2025-06-14 12:32:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						861a0a0a39 
					 
					
						
						
							
							[Bugfix] Don't attempt to use triton if no driver is active ( #19561 )  
						
						
						
						
					 
					
						2025-06-14 12:30:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bc956b38d0 
					 
					
						
						
							
							Only build CUTLASS MoE kernels on Hopper ( #19648 )  
						
						
						
						
					 
					
						2025-06-14 11:44:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						294fc1e2c9 
					 
					
						
						
							
							[Hardware][NVIDIA][kernel] Fp4 MOE quant kernel optimization ( #19500 )  
						
						
						
						
					 
					
						2025-06-14 09:34:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2db9044ab6 
					 
					
						
						
							
							[Bugfix] Fix auto dtype casting for BatchFeature ( #19316 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-06-14 15:13:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6fa718a460 
					 
					
						
						
							
							[Misc] Modularize CLI Argument Parsing in Benchmark Scripts ( #19593 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-14 16:54:52 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						06be858828 
					 
					
						
						
							
							[Bugfix] Fix the speculative decoding test by setting the target dtype ( #19633 )  
						
						
						
						
					 
					
						2025-06-13 20:57:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d1e34cc9ac 
					 
					
						
						
							
							[V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific metrics. ( #18354 )  
						
						... 
						
						
						
						Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai > 
						
						
					 
					
						2025-06-14 11:07:36 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bd517eb9fe 
					 
					
						
						
							
							[BugFix] Fix DP Coordinator incorrect debug log message ( #19624 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-06-14 00:18:03 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d65668b4e8 
					 
					
						
						
							
							Adding "AMD: Multi-step Tests" to amdproduction. ( #19508 )  
						
						... 
						
						
						
						Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-06-13 17:08:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aafbbd981f 
					 
					
						
						
							
							[torch.compile] Use custom ops when use_inductor=False ( #19618 )  
						
						
						
						
					 
					
						2025-06-13 15:05:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0f0874515a 
					 
					
						
						
							
							[Doc] Add troubleshooting section to k8s deployment ( #19377 )  
						
						... 
						
						
						
						Signed-off-by: Anna Pendleton <pendleton@google.com > 
						
						
					 
					
						2025-06-13 21:47:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3597b06a4f 
					 
					
						
						
							
							[CUDA] Enable full cudagraph for FlashMLA ( #18581 )  
						
						... 
						
						
						
						Signed-off-by: luka <luka@neuralmagic.com > 
						
						
					 
					
						2025-06-13 18:12:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1015296b79 
					 
					
						
						
							
							[doc][mkdocs] fix the duplicate Supported features sections in GPU docs ( #19606 )
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-13 16:25:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ce9dc02c93 
					 
					
						
						
							
							[Refactor] Remove unused variables in moe_permute_unpermute_kernel.inl ( #19573 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-13 06:12:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a24cb91600 
					 
					
						
						
							
							[Model] Fix minimax model cache & lm_head precision ( #19592 )  
						
						... 
						
						
						
						Signed-off-by: qingjun <qingjun@minimaxi.com > 
						
						
					 
					
						2025-06-13 12:08:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7e8d97dd3f 
					 
					
						
						
							
							[BugFix] Honor enable_caching in connector-delayed kvcache load case ( #19435 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-06-13 09:46:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d70bc7c029 
					 
					
						
						
							
							[torch.compile] reorganize the cache directory to support compiling multiple models ( #19064 )  
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-06-13 15:23:25 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ce688ad46e 
					 
					
						
						
							
							use base version for version comparison ( #19587 )  
						
						... 
						
						
						
						Signed-off-by: Boyuan Feng <boyuan@meta.com > 
						
						
					 
					
						2025-06-13 15:09:34 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cefdb9962d 
					 
					
						
						
							
							[Fix] The zip function in Python 3.9 does not have the strict argument ( #19549 )  
						
						... 
						
						
						
						Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com > 
						
						
					 
					
						2025-06-13 14:57:48 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ace5cdaff0 
					 
					
						
						
							
							[Fix] bump mistral common to support magistral ( #19533 )  
						
						... 
						
						
						
						Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com > 
						
						
					 
					
						2025-06-12 22:28:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6458721108 
					 
					
						
						
							
							[CPU] Refine default config for the CPU backend ( #19539 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-06-13 13:27:39 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bb4a0decef 
					 
					
						
						
							
							[Misc] Correct broken docs link ( #19553 )  
						
						... 
						
						
						
						Signed-off-by: Zerohertz <ohg3417@gmail.com > 
						
						
					 
					
						2025-06-12 22:27:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c707cfc12e 
					 
					
						
						
							
							[doc] fix incorrect link ( #19586 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-13 04:26:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7b3c9ff91d 
					 
					
						
						
							
							[Doc] uses absolute links for structured outputs ( #19582 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-06-13 03:35:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c68698b326 
					 
					
						
						
							
							[Bugfix] Fix EAGLE vocab embedding for multimodal target model ( #19570 )  
						
						... 
						
						
						
						Signed-off-by: qizixi <qizixi@meta.com > 
						
						
					 
					
						2025-06-12 23:09:19 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e3b12667d4 
					 
					
						
						
							
							[BugFix] : Fix Batched DeepGemm Experts ( #19515 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						
						
					 
					
						2025-06-12 20:43:02 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e6aab5de29 
					 
					
						
						
							
							Revert "[Build/CI] Add tracing deps to vllm container image ( #15224 )" ( #19378 )  
						
						
						
						
					 
					
						2025-06-12 17:26:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c57bb199b3 
					 
					
						
						
							
							[V1] Resolve failed concurrent structured output requests ( #19565 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-06-12 23:30:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dba68f9159 
					 
					
						
						
							
							[Doc] Unify structured outputs examples ( #18196 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-06-12 22:50:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a3319f4f04 
					 
					
						
						
							
							[Bugfix] Enforce contiguous input for dynamic_per_token FP8/INT8 quant ( #19452 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-12 15:39:15 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9d880f594d 
					 
					
						
						
							
							[Misc] Turn MOE_DP_CHUNK_SIZE into an env var ( #19506 )  
						
						
						
						
					 
					
						2025-06-12 18:01:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						017ef648e9 
					 
					
						
						
							
							[Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets ( #18847 )  
						
						
						
						
					 
					
						2025-06-12 10:30:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4b25ab14e2 
					 
					
						
						
							
							[doc] Make top navigation sticky ( #19540 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-12 15:48:11 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f98548b9da 
					 
					
						
						
							
							[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass ( #16756 )  
						
						... 
						
						
						
						Signed-off-by: Luka Govedič <lgovedic@redhat.com>
						Co-authored-by: Sage Moore <sage@neuralmagic.com>
						
						
					 
					
						2025-06-12 08:31:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						96846bb360 
					 
					
						
						
							
							Fix TorchAOConfig skip layers ( #19265 )  
						
						... 
						
						
						
						Signed-off-by: mobicham <hicham@mobiuslabs.com > 
						
						
					 
					
						2025-06-12 22:22:53 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b6efafd9e4 
					 
					
						
						
							
							[Perf] Vectorize static / dynamic INT8 quant kernels ( #19233 )  
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-12 06:51:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1129e2b1ab 
					 
					
						
						
							
							[V1][NixlConnector] Drop num_blocks check ( #19532 )
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-06-12 12:36:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c742438f8b 
					 
					
						
						
							
							[Doc] Add V1 column to supported models list ( #19523 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-06-12 19:16:44 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						73e2e0118f 
					 
					
						
						
							
							[Quantization] Improve AWQ logic ( #19431 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-06-12 11:02:11 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c9280e6346 
					 
					
						
						
							
							[Bugfix] Respect num-gpu-blocks-override in v1 ( #19503 )  
						
						... 
						
						
						
						Signed-off-by: Jon Swenson <jmswen@gmail.com > 
						
						
					 
					
						2025-06-12 11:00:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						af09b3f0a0 
					 
					
						
						
							
							[Bugfix][V1] Allow manual FlashAttention for Blackwell ( #19492 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-12 10:40:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4f6c42fa0a 
					 
					
						
						
							
							[Security] Prevent new imports of (cloud)pickle ( #18018 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
						
						
					 
					
						2025-06-12 10:30:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dff680001d 
					 
					
						
						
							
							Fix typo ( #19525 )  
						
						... 
						
						
						
						Signed-off-by: 2niuhe <carlton2tang@gmail.com > 
						
						
					 
					
						2025-06-12 09:24:45 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2e090bd5df 
					 
					
						
						
							
							[AMD][Kernel][BugFix] fix test_rocm_compressed_tensors_w8a8 for rocm ( #19509 )  
						
						... 
						
						
						
						Signed-off-by: Randall Smith <Randall.Smith@amd.com > 
						
						
					 
					
						2025-06-12 07:14:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1b0b065eb5 
					 
					
						
						
							
							[BugFix] Handle missing sep_token for Qwen3-Reranker in Score API ( #19522 )  
						
						... 
						
						
						
						Signed-off-by: strutive07 <strutive07@gmail.com > 
						
						
					 
					
						2025-06-12 07:00:47 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d5bdf899e4 
					 
					
						
						
							
							[BugFix] Work-around incremental detokenization edge case error ( #19449 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-06-12 06:43:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7e3e74c97c 
					 
					
						
						
							
							[Frontend] Improve error message in tool_choice validation ( #19239 )  
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-06-12 01:13:00 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3f6341bf7f 
					 
					
						
						
							
							Add Triton Fused MoE kernel config for E=16 on B200 ( #19518 )  
						
						... 
						
						
						
						Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca > 
						
						
					 
					
						2025-06-12 04:31:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e5d35d62f5 
					 
					
						
						
							
							[BugFix] Force registration of w8a8_block_fp8_matmul_deepgemm via lazy import ( #19514 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						
						
					 
					
						2025-06-12 04:28:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2f1c19b245 
					 
					
						
						
							
							[CI] change spell checker from codespell to typos ( #18711 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-06-11 19:57:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						42f52cc95b 
					 
					
						
						
							
							[CI/Build] Fix torch nightly CI dependencies ( #19505 )  
						
						... 
						
						
						
						Signed-off-by: Richard Zou <zou3519@gmail.com > 
						
						
					 
					
						2025-06-11 14:40:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						97a9465bbc 
					 
					
						
						
							
							[UX] Add Feedback During CUDAGraph Capture ( #19501 )  
						
						... 
						
						
						
						Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
						
						
					 
					
						2025-06-11 21:09:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c7ea0b56cd 
					 
					
						
						
							
							[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger ( #17331 )  
						
						... 
						
						
						
						Signed-off-by: Randall Smith <Randall.Smith@amd.com > 
						
						
					 
					
						2025-06-11 15:53:28 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						29fa5cac1c 
					 
					
						
						
							
							[Kernels] Add activation chunking logic to FusedMoEModularKernel ( #19168 )  
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com > 
						
						
					 
					
						2025-06-11 12:53:10 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b2d9be6f7d 
					 
					
						
						
							
							[Docs] Remove WIP features in V1 guide ( #19498 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-06-11 09:15:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						04a55612dd 
					 
					
						
						
							
							[Misc] Fix misleading ROCm warning ( #19486 )
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-06-12 00:12:10 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						89b0f84e17 
					 
					
						
						
							
							[doc] fix "Other AI accelerators" getting started page ( #19457 )  
						
						... 
						
						
						
						Signed-off-by: David Xia <david@davidxia.com > 
						
						
					 
					
						2025-06-11 16:11:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						497a91e9f7 
					 
					
						
						
							
							[CI] Update FlashInfer to 0.2.6.post1 ( #19297 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-11 22:57:28 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						943ffa5703 
					 
					
						
						
							
							[Bugfix] Update the example code, make it work with the latest lmcache ( #19453 )  
						
						... 
						
						
						
						Signed-off-by: Runzhen Wang <wangrunzhen@gmail.com > 
						
						
					 
					
						2025-06-11 12:42:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5c8d34a42c 
					 
					
						
						
							
							Support no privileged mode on CPU for docker and kubernetes deployments ( #19241 )  
						
						... 
						
						
						
						Signed-off-by: Tsai, Louie <louie.tsai@intel.com > 
						
						
					 
					
						2025-06-11 04:11:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3c8694eabe 
					 
					
						
						
							
							Fix some typo ( #19475 )  
						
						... 
						
						
						
						Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com>
						Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
						
						
					 
					
						2025-06-11 10:36:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7484e1fce2 
					 
					
						
						
							
							Add cache to cuda get_device_capability ( #19436 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-11 17:37:05 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a2142f0196 
					 
					
						
						
							
							Support non-string values in JSON keys from CLI ( #19471 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-06-11 09:34:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						871d6b7c74 
					 
					
						
						
							
							[Misc] Reduce warning message introduced in env_override ( #19476 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com > 
						
						
					 
					
						2025-06-11 17:29:54 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						29a38f0352 
					 
					
						
						
							
							[Doc] Support "important" and "announcement" admonitions ( #19479 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-06-11 01:39:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a5115f4ff5 
					 
					
						
						
							
							[Doc] Fix quantization link titles ( #19478 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-06-11 01:27:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						68b4a26149 
					 
					
						
						
							
							[Doc] Update V1 User Guide for Hardware and Models ( #19474 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-06-11 00:49:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b8e809a057 
					 
					
						
						
							
							[Kernel] Support deep_gemm for linear methods ( #19085 )  
						
						... 
						
						
						
						Signed-off-by: artetaout <lulala341@gmail.com > 
						
						
					 
					
						2025-06-11 15:14:45 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5039ec2336 
					 
					
						
						
							
							[ROCm] Add rules to automatically label ROCm related PRs ( #19405 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com > 
						
						
					 
					
						2025-06-11 15:09:18 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7c644ab6d5 
					 
					
						
						
							
							Fix Typo in Documentation and Function Name ( #19442 )  
						
						
						
						
					 
					
						2025-06-10 22:44:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2d40665fe8 
					 
					
						
						
							
							Add fused MOE config for Qwen3 30B A3B on B200 ( #19455 )  
						
						... 
						
						
						
						Signed-off-by: Junhao Li <junhao@ubicloud.com > 
						
						
					 
					
						2025-06-11 13:43:46 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						96ada386b7 
					 
					
						
						
							
							[Misc] Remove unused MultiModalHasher.hash_prompt_mm_data ( #19422 )  
						
						... 
						
						
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com > 
						
						
					 
					
						2025-06-11 05:18:57 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1e473b3010 
					 
					
						
						
							
							[CI] Disable failing GGUF model test ( #19454 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-06-11 05:12:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2b1e2111b0 
					 
					
						
						
							
							Fix test_max_model_len in tests/entrypoints/llm/test_generate.py ( #19451 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com > 
						
						
					 
					
						2025-06-11 12:54:59 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a45b979d9f 
					 
					
						
						
							
							[BugFix] Fix docker build cpu-dev image error ( #19394 )  
						
						... 
						
						
						
						Signed-off-by: niu_he <carlton2tang@gmail.com > 
						
						
					 
					
						2025-06-10 20:56:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3952731e8f 
					 
					
						
						
							
							[New Model]: Support Qwen3 Embedding & Reranker ( #19260 )
						
						
						
						
					 
					
						2025-06-10 20:07:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						77f0d465d0 
					 
					
						
						
							
							[BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 ( #19390 )  
						
						... 
						
						
						
						Signed-off-by: rzou <zou3519@gmail.com > 
						
						
					 
					
						2025-06-11 07:54:41 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						22c3c0aa4a 
					 
					
						
						
							
							Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B-FP8 ( #19401 )  
						
						... 
						
						
						
						Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com > 
						
						
					 
					
						2025-06-11 07:23:57 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						33f8dba7c6 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for commandr ( #19399 )  
						
						... 
						
						
						
						Signed-off-by: py-andy-c <pychen1017@gmail.com > 
						
						
					 
					
						2025-06-10 22:42:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5241ca50d6 
					 
					
						
						
							
							[ROCm][V1] Adding ROCm to the list of platforms using V1 by default ( #19440 )
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-06-10 22:06:15 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						da9b523ce1 
					 
					
						
						
							
							[Docs] Note that alternative structured output backends are supported (#19426)
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-06-10 16:20:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b6553be1bc 
					 
					
						
						
							
							[Misc] Slight improvement of the BNB (#19418)
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						Co-authored-by: Isotr0py <2037008807@qq.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-06-10 13:51:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						64a9af5afa 
					 
					
						
						
							
							Simplify ep kernels installation (#19412)
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-06-10 20:06:08 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e4248849ec 
					 
					
						
						
							
							[BugFix][CPU] Fix CPU CI by ignore collecting test_pixtral (#19411)
						Signed-off-by: jiang.li <jiang1.li@intel.com>
						
						
					 
					
						2025-06-10 12:02:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						467bef18a3 
					 
					
						
						
							
							[BugFix][FlashInfer] Fix attention backend interface mismatch with unexpected keyword use_irope (#19134)
						Signed-off-by: Yunqiu Guo <guorachel@meta.com>
						
						
					 
					
						2025-06-10 16:48:51 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5f1ac1e1d1 
					 
					
						
						
							
							Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404)
						
						
						
						
					 
					
						2025-06-10 01:30:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9368cc90b2 
					 
					
						
						
							
							Automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA node. (#17930)
						Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
						Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
						
						
					 
					
						2025-06-10 06:22:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						32b3946bb4 
					 
					
						
						
							
							Add clear documentation around the impact of debugging flag (#19369)
						Signed-off-by: Anna Pendleton <pendleton@google.com>
						
						
					 
					
						2025-06-10 06:16:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6b1391ca7e 
					 
					
						
						
							
							[Misc] refactor neuron_multimodal and profiling (#19397)
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-10 06:12:42 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a3f66e75d1 
					 
					
						
						
							
							Add security warning to bug report template (#19365)
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
						
						
					 
					
						2025-06-10 06:06:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						319cb1e351 
					 
					
						
						
							
							[Core] Batch multi modal input using pinned memory (#19169)
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
						
						
					 
					
						2025-06-10 13:44:59 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1efef71645 
					 
					
						
						
							
							[Bugfix] Fix modelscope token passed in (#19389)
						Signed-off-by: wangli <wangli858794774@gmail.com>
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-06-10 13:39:37 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						646d62f636 
					 
					
						
						
							
							[Core] Use tuple for kv cache group block ids (#19175)
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-06-10 07:01:17 +02:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6cd4ae8acd 
					 
					
						
						
							
							[Frontend] Add tqdm_leave_pbar to control progress bar visibility (#19357)
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-10 04:55:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c016047ed7 
					 
					
						
						
							
							Fix docs/mkdocs/hooks/remove_announcement.py (#19382)
						
						
						
						
					 
					
						2025-06-09 21:36:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9af6d22e4c 
					 
					
						
						
							
							Use xla flag to improve the quantized model performance (#19303)
						Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
						
						
					 
					
						2025-06-10 01:28:45 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4589b94032 
					 
					
						
						
							
							[Bugfix] Fix benchmark_moe.py (#19016)
						Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
						
						
					 
					
						2025-06-09 18:04:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cc867be19c 
					 
					
						
						
							
							[V1] Reuse V0's memory_profiling util for gpu worker memory profiling (#19312)
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
						
						
					 
					
						2025-06-10 08:40:01 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3a7cd627a8 
					 
					
						
						
							
							[Misc] Fix a config typo in disable_hybrid_kv_cache_manager configuration (#19383)
						Signed-off-by: Siyuan Liu <lsiyuan@google.com>
						
						
					 
					
						2025-06-09 16:41:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8058c91108 
					 
					
						
						
							
							[HOT-FIX] Add kv_sharing_target_layer_name argument to cutlass_mla backend (#19374)
						Signed-off-by: Pavani Majety <pmajety@nvidia.com>
						
						
					 
					
						2025-06-09 19:00:07 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7d44c469fe 
					 
					
						
						
							
							[TPU]Fix KV cache sharing tests (#19371)
						
						
						
						
					 
					
						2025-06-09 18:38:15 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						31f58be96a 
					 
					
						
						
							
							[Frontend] Make TIMEOUT_KEEP_ALIVE configurable through env var (#18472)
						Signed-off-by: liusiqian <liusiqian@tal.com>
						
						
					 
					
						2025-06-09 21:41:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ebb2f383b8 
					 
					
						
						
							
							[Quantization] Bump compressed-tensors version (#19295)
						Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
						
						
					 
					
						2025-06-09 14:33:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c1c7dbbeeb 
					 
					
						
						
							
							[Bugfix][Core] Prevent token lengths exceeding max_model_len in V0 (#19348)
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-06-09 23:01:29 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5cf2daea9a 
					 
					
						
						
							
							[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298)
						Signed-off-by: Varun <vsundarr@redhat.com>
						Co-authored-by: Varun <vsundarr@redhat.com>
						
						
					 
					
						2025-06-09 10:50:39 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b8089195b4 
					 
					
						
						
							
							[v1] Add fp32 support to v1 engine through flex attn (#19319)
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-06-09 22:10:44 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						770e5dcdb8 
					 
					
						
						
							
							[full_graph] Fix query_start_loc padding (#19321)
						Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>
						
						
					 
					
						2025-06-09 21:32:56 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c57c9415b1 
					 
					
						
						
							
							[Docs] Fix a bullet list in usage/security.md (#19358)
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-06-09 13:28:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						01810f9236 
					 
					
						
						
							
							[CI] Introduce rules for llama auto-label (#19323)
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-06-09 20:05:42 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						59abbd84f9 
					 
					
						
						
							
							[Fix] Allow kernel compilation for CUDA capability 8.7 (#19328)
						Signed-off-by: Conroy Cheers <conroy@corncheese.org>
						
						
					 
					
						2025-06-09 02:57:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						95a6568b5c 
					 
					
						
						
							
							[CI/Build] Fix LoRA test (#19350)
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-06-09 09:52:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0eca5eacd0 
					 
					
						
						
							
							[Doc] Fix description in the Automatic Prefix Caching design doc (#19333)
						Signed-off-by: cr7258 <chengzw258@163.com>
						
						
					 
					
						2025-06-09 17:30:02 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						12e5829221 
					 
					
						
						
							
							[doc] improve ci doc (#19307)
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-09 07:26:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3a4d417707 
					 
					
						
						
							
							[Misc] Cleanup compilation tests (#19343)
						Signed-off-by: rzou <zou3519@gmail.com>
						
						
					 
					
						2025-06-09 15:05:44 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8335667c22 
					 
					
						
						
							
							[Frontend] Remove unreachable code from llm.py (#19288)
						Signed-off-by: KsuParkhamchuk <k.parkhamchuk@gmail.com>
						
						
					 
					
						2025-06-09 10:22:10 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e1c4380d4c 
					 
					
						
						
							
							[Misc] Add documentation update reminder to PR template (#19289)
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-06-09 10:20:53 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e31ae3de36 
					 
					
						
						
							
							[Deprecation] Remove inputs arg fallback in Engine classes (#18799)
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-06-09 10:19:56 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2ffb9b6e07 
					 
					
						
						
							
							[Bugfix] model_max_length should consider max_model_len in tokenizer_config (#19201)
						
						
						
						
					 
					
						2025-06-08 07:17:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cda10fa3e2 
					 
					
						
						
							
							[Multi Modal] Add an env var for message queue max chunk bytes (#19242)
						Signed-off-by: yZhen <yZhen@fb.com>
						Co-authored-by: yZhen <yZhen@fb.com>
						
						
					 
					
						2025-06-08 21:39:12 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c123bc33f9 
					 
					
						
						
							
							[Quantization] Add compressed-tensors NVFP4 support (#18312)
						
						
						
						
					 
					
						2025-06-08 09:05:55 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b9a1791e2c 
					 
					
						
						
							
							[Hardware][POWER] Add IBM POWER11 Support to CPU Extension Detection (#19082)
						Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
						Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
						
						
					 
					
						2025-06-08 09:17:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						989dcee981 
					 
					
						
						
							
							Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B (#19315)
						Signed-off-by: Xu Wenqing <xuwq1993@qq.com>
						
						
					 
					
						2025-06-08 16:07:02 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3d64d366e0 
					 
					
						
						
							
							[Misc] Change tests/compile to use VLLM_V1 by default (#19302)
						Signed-off-by: rzou <zou3519@gmail.com>
						
						
					 
					
						2025-06-08 16:06:48 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						eaa2e51088 
					 
					
						
						
							
							[Bugfix] Re-enable use_cudagraph in vLLM v1 (#19299)
						Signed-off-by: Richard Zou <zou3519@gmail.com>
						
						
					 
					
						2025-06-08 08:56:12 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d77f7fb871 
					 
					
						
						
							
							[Bugfix]: Fix TypeError: 'float' object cannot be interpreted as an integer (#19283)
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-06-08 08:16:31 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2d8476e465 
					 
					
						
						
							
							[BugFix][V1] Fix memory profiling bug (#18974)
						Signed-off-by: luka <luka@neuralmagic.com>
						
						
					 
					
						2025-06-07 10:34:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						88be823d57 
					 
					
						
						
							
							[AMD] Update compatible packaging version (#19309)
						Signed-off-by: pramkuma <Pramendra.Kumar@amd.com>
						
						
					 
					
						2025-06-07 20:55:09 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4e4f63ad45 
					 
					
						
						
							
							[Nit][Benchmark]Fix example in benchmark_serving_structured_output.py (#19311)
						Signed-off-by: Lifan Shen <lifans@meta.com>
						
						
					 
					
						2025-06-07 18:25:38 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d2f0e7e615 
					 
					
						
						
							
							[CI/Build] Improve Llama GGUF test robustness (#19287)
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-06-07 17:23:28 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						122cdca5f6 
					 
					
						
						
							
							[Misc] refactor context extension (#19246)
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-07 05:13:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cf02f9b283 
					 
					
						
						
							
							Add FlexAttention to V1 (#16078)
						Signed-off-by: drisspg <drisspguessous@gmail.com>
						
						
					 
					
						2025-06-06 21:58:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c4296b1a27 
					 
					
						
						
							
							[CI][PowerPC] Use a more appropriate way to select testcase in tests/models/language/pooling/test_embedding.py (#19253)
						Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
						
						
					 
					
						2025-06-07 11:52:52 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						66c508b137 
					 
					
						
						
							
							[TPU][Test] Add script to run benchmark on TPU for buildkite (#19039)
						Signed-off-by: Qiliang Cui <derrhein@gmail.com>
						
						
					 
					
						2025-06-06 20:10:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						84166fee97 
					 
					
						
						
							
							[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762)
						Signed-off-by: ElizaWszola <ewszola@redhat.com>
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
						Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
						
						
					 
					
						2025-06-06 18:26:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e0cd10f72 
					 
					
						
						
							
							[Easy][Test] Simplify test_function_tool_use with multiple parametrizes (#19269)
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-06-07 09:19:09 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e010688f50 
					 
					
						
						
							
							[Build][ROCm] Update Dockerfile.rocm (#19296)
						Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
						
						
					 
					
						2025-06-06 19:35:16 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						441b65d8c7 
					 
					
						
						
							
							[Misc][Tools][Benchmark] Fix and improve auto tune script (#19163)
						Signed-off-by: Chenyaaang <chenyangli@google.com>
						
						
					 
					
						2025-06-06 23:31:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						46ecc57973 
					 
					
						
						
							
							[BugFix] Fix tpu_model_runner block_id concatenation (#19228)
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-06-06 16:28:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b6a3a9f76d 
					 
					
						
						
							
							[Core] Fix abrupt request abort (#18485)
						Signed-off-by: nicklucche <nlucches@redhat.com>
						Signed-off-by: Nick Hill <nhill@redhat.com>
						Co-authored-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-06-06 16:27:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ca27f0f9c1 
					 
					
						
						
							
							[Bugfix][Core] Update cancellation logic in generate() to handle Generator exits (#19225)
						Co-authored-by: Adolfo Victoria <adovi@meta.com>
						
						
					 
					
						2025-06-06 20:17:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aad30bd306 
					 
					
						
						
							
							[BugFix] Fix MultiConnector test after HMA changes (#19291)
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-06-06 20:16:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						94ecee6282 
					 
					
						
						
							
							Fixed ppc build when it runs on non-RHEL based linux distros (#18422)
						Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
						Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
						Signed-off-by: npanpaliya <nishidha.panpaliya@partner.ibm.com>
						Co-authored-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
						
						
					 
					
						2025-06-06 11:54:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8267f9916f 
					 
					
						
						
							
							improve logits bias (#19041)
						
						
						
						
					 
					
						2025-06-06 19:59:25 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7353492a47 
					 
					
						
						
							
							[Core] Raise when non-multi-instance DP clients target a DP rank (#19227)
						Signed-off-by: Jon Swenson <jmswen@gmail.com>
						
						
					 
					
						2025-06-06 19:03:01 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7661e92ef8 
					 
					
						
						
							
							[Model] Optimize nemotron_h implementation (#19249)
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-06-06 10:05:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f168b85725 
					 
					
						
						
							
							Unit Test for run_dp_sharded_vision_model (#19103)
						Signed-off-by: Siqi Yan <siqi@meta.com>
						Co-authored-by: Siqi Yan <siqi@meta.com>
						
						
					 
					
						2025-06-06 16:24:02 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						da511d54d8 
					 
					
						
						
							
							Fix CompilationConfig repr (#19091)
						Signed-off-by: rzou <zou3519@gmail.com>
						
						
					 
					
						2025-06-06 16:23:35 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						65c69444b1 
					 
					
						
						
							
							[Docs] Improve V1 KVConnector interface documentation (#19172)
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-06-06 16:22:45 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						94870359cd 
					 
					
						
						
							
							[Quantization] Bump compressed-tensors version; update NVFP4A16 test model (#19224)
						Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
						
						
					 
					
						2025-06-06 01:21:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0d49483ea9 
					 
					
						
						
							
							[TPU] fix kv cache dtype in model runner (#19244)
						Signed-off-by: Chengji Yao <chengjiyao@google.com>
						
						
					 
					
						2025-06-06 16:20:16 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						90b78ec5f9 
					 
					
						
						
							
							[v1][P/D] Fix a edge case in kv cache schedule (#19182)
						Co-authored-by: jinghui <jinghui@fb.com>
						
						
					 
					
						2025-06-05 23:32:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						91a2ef98ea 
					 
					
						
						
							
							[Chore] update CODEOWNERS (#19247)
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
						
						
					 
					
						2025-06-06 06:09:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3da2313d78 
					 
					
						
						
							
							Support allowed_token_ids in ChatCompletionRequest (#19143)
						Signed-off-by: Xu Song <xusong.vip@gmail.com>
						
						
					 
					
						2025-06-06 05:06:48 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b61dc5f972 
					 
					
						
						
							
							[TPU] update torch_xla pin (#19231)
						Signed-off-by: Chengji Yao <chengjiyao@google.com>
						
						
					 
					
						2025-06-06 04:27:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f8a1a2d108 
					 
					
						
						
							
							[v1] Hybrid Memory Allocator (#17996)
						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-06-05 20:47:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3465b87ef8 
					 
					
						
						
							
							[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033)
						Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
						
						
					 
					
						2025-06-05 19:10:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c8134bea15 
					 
					
						
						
							
							Fix AOPerModuleConfig name changes (#18869)
						Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
						
						
					 
					
						2025-06-05 18:51:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cb6d572e85 
					 
					
						
						
							
							[Model] NemotronH support (#18863)
						Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
						Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
						
						
					 
					
						2025-06-05 21:29:28 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						87360308b7 
					 
					
						
						
							
							[V1] Use FlashInfer by default on Blackwell GPUs (#19118)
						
						
						
						
					 
					
						2025-06-05 15:40:39 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aa49f14832 
					 
					
						
						
							
							[Quantization] Skip Fp4 Test for compressed-tensors (#19217)
						
						
						
						
					 
					
						2025-06-05 18:21:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9ef9173cfa 
					 
					
						
						
							
							[P/D][NixlConnector] Enable FlashInfer backend (#19090)
						
						
						
						
					 
					
						2025-06-05 17:10:15 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						85e2b7bb13 
					 
					
						
						
							
							[MISC][Bugfix] Use less CPU when message queue has been empty for some time (#16226)
						Signed-off-by: Povilas Kanapickas <povilas@radix.lt>
						
						
					 
					
						2025-06-05 16:53:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						61059bee40 
					 
					
						
						
							
							[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110)
						Signed-off-by: Chiyue Wei <chiyuew@nvidia.com>
						Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>
						
						
					 
					
						2025-06-05 09:48:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ec89524f50 
					 
					
						
						
							
							Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 (#19205)
						
						
						
						
					 
					
						2025-06-05 16:38:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f20f9f063b 
					 
					
						
						
							
							[mistral_common] Add v11 tokenizer (#19193)
						Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
						
						
					 
					
						2025-06-05 08:27:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9bc8bb07cf 
					 
					
						
						
							
							[Bugfix] properly catch PIL-related errors for vision models when incorrect data urls are provided (#19202)
						Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
						
						
					 
					
						2025-06-05 12:59:28 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1aeb925f34 
					 
					
						
						
							
							[Frontend] improve vllm run-batch --help display (#19187)
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-05 11:16:25 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						188a4590d8 
					 
					
						
						
							
							[Misc] Do not override NCCL_CUMEM_ENABLE if set explicitly (#19105)
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-06-05 11:14:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						18093084be 
					 
					
						
						
							
							[Misc] Remove unnecessary fallback to prefill-decode attention (#19138)

						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
						
						
					 
					
						2025-06-05 16:08:26 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						da40380214 
					 
					
						
						
							
							[Build] Annotate wheel and container path for release workflow (#19162)

						Signed-off-by: simon-mo <simon.mo@hey.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-06-04 23:24:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8fc57501d3 
					 
					
						
						
							
							[Bugfix]: Fix the incompatibility issue with stream when Thinking is disabled (#19135)

						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-06-05 06:24:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						af7fc84fd2 
					 
					
						
						
							
							[BugFix][Minor] Fix full cuda graph bug when max_num_seqs < 512 (#19171)

						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-06-05 13:41:25 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0678b52251 
					 
					
						
						
							
							Handle non-serializable objects when dumping benchmark results (#19114)
						
						
						
						
					 
					
						2025-06-04 22:40:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						25b918eee6 
					 
					
						
						
							
							[Torch Nightly]add missing dependency (#18770)

						Signed-off-by: Yang Wang <elainewy@meta.com>
						
						
					 
					
						2025-06-04 21:56:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a408820f2f 
					 
					
						
						
							
							[Bugfix] Fix port handling in make_zmq_path (#19117)
						
						
						
						
					 
					
						2025-06-04 21:00:59 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c56ed8bb0e 
					 
					
						
						
							
							[Bugfix][Nixl] Fix full prefix cache hit bug (#18632)

						Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
						Signed-off-by: Nick Hill <nhill@redhat.com>
						Co-authored-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-06-05 02:07:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						78dcf56cb3 
					 
					
						
						
							
							[doc] small fix (#19167)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-05 09:13:50 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b2fac67130 
					 
					
						
						
							
							[P/D] Heterogeneous TP (#18833)

						Signed-off-by: nicklucche <nlucches@redhat.com>
						
						
					 
					
						2025-06-04 23:25:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						23027e2daf 
					 
					
						
						
							
							[Misc] refactor: simplify EngineCoreClient.make_async_mp_client in AsyncLLM (#18817)

						Signed-off-by: googs1025 <googs1025@gmail.com>
						
						
					 
					
						2025-06-04 15:37:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c3fd4d669a 
					 
					
						
						
							
							[Kernel] Integrate batched/masked deepgemm kernel (#19111)

						Signed-off-by: Varun <vsundarr@redhat.com>
						Co-authored-by: Varun <vsundarr@redhat.com>
						
						
					 
					
						2025-06-04 21:59:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ef3f98b59f 
					 
					
						
						
							
							[Bugfix] fix v1 cpu worker fails on macOS (#19121)
						
						
						
						
					 
					
						2025-06-04 20:17:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7ee2590478 
					 
					
						
						
							
							[TPU] Update dynamo dump file name in compilation test (#19108)

						Signed-off-by: Siyuan Liu <lsiyuan@google.com>
						
						
					 
					
						2025-06-04 16:13:43 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						53a5a0ce30 
					 
					
						
						
							
							[Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-04 10:46:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d459fae0a2 
					 
					
						
						
							
							[Bugfix][EP+DP] Fix internode check (#19112)

						Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
						
						
					 
					
						2025-06-04 23:39:23 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c8dcc15921 
					 
					
						
						
							
							Allow AsyncLLMEngine.generate to target a specific DP rank (#19102)

						Signed-off-by: Jon Swenson <jmswen@gmail.com>
						
						
					 
					
						2025-06-04 08:26:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8f4ffbd373 
					 
					
						
						
							
							[Doc] Update V1 Guide for embedding models (#19141)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-06-04 22:57:55 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5f2cd251d2 
					 
					
						
						
							
							Sm100 blockwise fp8 swap ab (#18564)
						
						
						
						
					 
					
						2025-06-04 07:48:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						02658c2dfe 
					 
					
						
						
							
							Add DeepSeek-R1-0528 function call chat template (#18874)

						Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
						
						
					 
					
						2025-06-04 13:24:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						01dc9a76db 
					 
					
						
						
							
							[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-06-04 04:49:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						35cf32df30 
					 
					
						
						
							
							Improve the output precision of embedding models (#19092)
						
						
						
						
					 
					
						2025-06-04 11:48:57 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8711bc5e68 
					 
					
						
						
							
							[Misc] Add packages for benchmark as extra dependency (#19089)

						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-06-04 04:18:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2669a0d7b5 
					 
					
						
						
							
							Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113)

						Signed-off-by: Seiji Eicher <seiji@anyscale.com>
						
						
					 
					
						2025-06-04 17:10:45 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8e972d9c44 
					 
					
						
						
							
							[TPU] Skip hanging tests (#19115)

						Signed-off-by: Siyuan Liu <lsiyuan@google.com>
						
						
					 
					
						2025-06-04 01:43:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3336c8cfbe 
					 
					
						
						
							
							Fix #19130 (#19132)

						Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
						
						
					 
					
						2025-06-04 01:42:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b124e1085b 
					 
					
						
						
							
							[Bugfix] Fix FA3 full cuda graph correctness (#19106)

						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-06-03 23:10:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						41aa578428 
					 
					
						
						
							
							[NVIDIA] Add Cutlass MLA backend (#17625)
						
						
						
						
					 
					
						2025-06-03 21:40:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8d646c2e53 
					 
					
						
						
							
							[Cleanup][v1]:remote guided-decoding-backend for example (#19059)

						Signed-off-by: calvin chen <120380290@qq.com>
						
						
					 
					
						2025-06-04 04:23:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5d6d1adf15 
					 
					
						
						
							
							[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437)
						
						
						
						
					 
					
						2025-06-03 21:13:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1409ef9134 
					 
					
						
						
							
							[Core] Cast multimodal input in hf processor (#18862)

						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
						
						
					 
					
						2025-06-03 20:24:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4555143ea7 
					 
					
						
						
							
							[CPU] V1 support for the CPU backend (#16441)
						
						
						
						
					 
					
						2025-06-03 18:43:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						52dceb172d 
					 
					
						
						
							
							[Docs] Add developer doc about CI failures (#18782)

						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						Co-authored-by: Mark McLoughlin <markmc@redhat.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-06-04 01:09:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						abd7df2fca 
					 
					
						
						
							
							[Misc] Fix path and python alias errors in disagg_prefill exmaples (#18919)
						
						
						
						
					 
					
						2025-06-03 17:15:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b712be98c7 
					 
					
						
						
							
							feat: add data parallel rank to KVEventBatch (#18925)
						
						
						
						
					 
					
						2025-06-03 17:14:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a8da78eac9 
					 
					
						
						
							
							[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029)

						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-06-04 00:14:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5d96533e22 
					 
					
						
						
							
							[Bugfix][P/D] Fix Prefix Cache Bug (#18411)

						Signed-off-by: nicklucche <nlucches@redhat.com>
						Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
						
						
					 
					
						2025-06-03 23:53:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4de790fcad 
					 
					
						
						
							
							[Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled (#19075)

						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-06-03 23:27:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b5fd9506c1 
					 
					
						
						
							
							[Bugfix] get_num_blocks_to_allocate with null_block (#19031)

						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-06-03 15:30:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						135cf55cd1 
					 
					
						
						
							
							[V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix (#18971)
						
						
						
						
					 
					
						2025-06-03 15:26:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6cac54f4d1 
					 
					
						
						
							
							[v1] Re-init input batch for multiple kv cache groups (#18654)

						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-06-03 21:41:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6865fe0074 
					 
					
						
						
							
							Fix interaction between Optional and Annotated in CLI typing (#19093)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						Co-authored-by: Yikun Jiang <yikun@apache.org>
						
						
					 
					
						2025-06-03 21:07:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e31446b6c8 
					 
					
						
						
							
							[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-03 13:48:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bdf13965ab 
					 
					
						
						
							
							[V1] Support cross-layer KV sharing (#18212)

						Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
						
						
					 
					
						2025-06-03 20:33:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fa98d77773 
					 
					
						
						
							
							[Kernel] DeepEP dispatch-combine kernel integration (#18434)

						Signed-off-by: Varun <vsundarr@redhat.com>
						Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						
						
					 
					
						2025-06-03 12:30:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						01eee40536 
					 
					
						
						
							
							[doc] update docker version (#19074)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-03 19:08:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						19bdaf32b1 
					 
					
						
						
							
							[Doc] Readme standardization (#18695)

						Co-authored-by: Soren Dreano <soren@numind.ai>
						
						
					 
					
						2025-06-03 11:50:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						02f0c7b220 
					 
					
						
						
							
							[Misc] Add SPDX-FileCopyrightText (#19100)

						Signed-off-by: simon-mo <simon.mo@hey.com>
						
						
					 
					
						2025-06-03 11:20:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d054da1992 
					 
					
						
						
							
							[Misc] fix: add miss best_of param validation (#18555)

						Signed-off-by: googs1025 <googs1025@gmail.com>
						
						
					 
					
						2025-06-03 11:02:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4b7817c119 
					 
					
						
						
							
							[Misc] Add missing _Backend enums (#19081)

						Signed-off-by: nicklucche <nlucches@redhat.com>
						
						
					 
					
						2025-06-03 16:15:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d00dd65cd4 
					 
					
						
						
							
							[Doc] Improve the Pull Request template with key components (#19086)

						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-06-03 23:44:34 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d81edded69 
					 
					
						
						
							
							[Bugfix] disable processor cache (#19068)

						Signed-off-by: raushan <raushan@huggingface.co>
						
						
					 
					
						2025-06-03 15:06:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						476844d44c 
					 
					
						
						
							
							Fix underscores in dict keys passed via CLI (#19030)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-06-03 14:39:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4e68ae5e59 
					 
					
						
						
							
							[CI/Build] Remove V0 LoRA test (#19066)

						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-06-03 14:30:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4e88723f32 
					 
					
						
						
							
							[doc] clarify windows support (#19088)

						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-06-03 21:42:17 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						118ff92111 
					 
					
						
						
							
							[Doc] Update V1 user guide for embedding and enc-dec models (#19060)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-06-03 02:29:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ec2dcd80bc 
					 
					
						
						
							
							[Misc] Update WeightsMapper for qwen2-vl/qwen2.5-vl (#19054)

						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-06-03 09:08:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						42243fbda0 
					 
					
						
						
							
							[Doc] Add InternVL LoRA support (#19055)

						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-06-03 09:08:03 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6d18ed2a2e 
					 
					
						
						
							
							Update docker docs with ARM CUDA cross-compile (#19037)

						Signed-off-by: mgoin <michael@neuralmagic.com>
						
						
					 
					
						2025-06-03 08:21:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f32fcd9444 
					 
					
						
						
							
							[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015)

						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-06-03 08:01:48 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d32aa2e670 
					 
					
						
						
							
							[Bugfix] Use cmake 3.26.1 instead of 3.26 to avoid build failure (#19019)

						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-06-03 00:16:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cc977286e7 
					 
					
						
						
							
							Reduce logs in CLI scripts and plugin loader (#18970)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-03 06:00:45 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						17430e3653 
					 
					
						
						
							
							[bugfix] small fix logic issue (#18999)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-03 05:35:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1282bd812e 
					 
					
						
						
							
							Add tarsier model support (#18985)

						Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
						
						
					 
					
						2025-06-03 13:13:13 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bdce64f236 
					 
					
						
						
							
							[V1] Support DP with Ray (#18779)
						
						
						
						
					 
					
						2025-06-02 21:15:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9e6f61e8c3 
					 
					
						
						
							
							[ROCm][Build] Clean up the ROCm build (#19040)

						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-06-02 20:47:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8655f47f37 
					 
					
						
						
							
							[CPU][CI] Re-enable the CPU CI tests (#19046)

						Signed-off-by: jiang.li <jiang1.li@intel.com>
						
						
					 
					
						2025-06-02 20:46:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ce42f9204 
					 
					
						
						
							
							Adding "LoRA Test %N" to AMD production tests (#18929)

						Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
						
						
					 
					
						2025-06-02 20:46:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8a57872b2a 
					 
					
						
						
							
							[Bugfix][EP+DP] Use pplx-kernel internode instead of intranode (#19034)

						Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
						
						
					 
					
						2025-06-03 11:36:51 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5bc1ad6cee 
					 
					
						
						
							
							[Doc] Remove duplicate TOCs during MkDocs migration (#19021)

						Signed-off-by: Zerohertz <ohg3417@gmail.com>
						
						
					 
					
						2025-06-02 19:49:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9112b443a0 
					 
					
						
						
							
							[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)

						Signed-off-by: Siyuan Liu <lsiyuan@google.com>
						Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
						Co-authored-by: Chengji Yao <chengjiyao@google.com>
						
						
					 
					
						2025-06-03 00:06:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c57d577e8d 
					 
					
						
						
							
							add an absolute path for run.sh (#18258)

						Signed-off-by: calvin chen <120380290@qq.com>
						
						
					 
					
						2025-06-02 19:38:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ca2f6b9c30 
					 
					
						
						
							
							[Bugfix][Model] Attempt to fix eagle in V0. (#18978)

						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-06-02 08:15:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						20133cfee2 
					 
					
						
						
							
							[Frontend] enable custom logging for the uvicorn server (OpenAI API server) (#18403)

						Signed-off-by: François Paupier <francois.paupier@gmail.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-06-02 15:04:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ebb1ec9318 
					 
					
						
						
							
							[Model] enable data parallel for Llama4 vision encoder (#18368)

						Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
						Co-authored-by: yZhen <yZhen@fb.com>
						Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
						
						
					 
					
						2025-06-02 19:22:54 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5b168b6d7a 
					 
					
						
						
							
							[doc] add pytest tips (#19010)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-02 11:07:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9760fd8f6a 
					 
					
						
						
							
							[Core] Support inplace model weights loading (#18745)

						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-06-02 17:38:50 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b9f61e1387 
					 
					
						
						
							
							[Bugfix][Nixl] Fix DP Metadata Handshake (#19008)

						Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
						
						
					 
					
						2025-06-02 03:30:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d6fd3a33b8 
					 
					
						
						
							
							[Misc] reuse num_tokens_across_dp of get_dp_padding to avoid unnecessary dp all reduce in set_forward_context (#18935)

						Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
						Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
						Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
						
						
					 
					
						2025-06-01 19:41:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						432ec9926e 
					 
					
						
						
							
							[doc] wrong output (#19000)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-01 11:26:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2b102d51ad 
					 
					
						
						
							
							[BugFix] Fix incorrect metrics shutdown error log message (#18992)

						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-06-01 11:42:23 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aa54a7bf7b 
					 
					
						
						
							
							[BugFix] fix data parallel construct ipv6 url addres (#18991)

						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
						
						
					 
					
						2025-06-01 11:42:10 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2ad6194a02 
					 
					
						
						
							
							Let max_num_batched_tokens use human_readable_int for large numbers (#18968)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-01 11:41:29 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c594cbf565 
					 
					
						
						
							
							[doc] small fix - mkdocs (#18996)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-31 20:23:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a35ca765a5 
					 
					
						
						
							
							[LoRA] Support dynamically initialize packed_modules_mapping for VLM with arbitrary components (#18987)

						Signed-off-by: isotr0py <2037008807@qq.com>
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-06-01 11:06:57 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6aa8f9a4e7 
					 
					
						
						
							
							[Core] Rework dtype resolution (#18751)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-06-01 11:04:23 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1bc86a3da1 
					 
					
						
						
							
							[Bugfix] Fix EAGLE3 broken logits (#18909)

						Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
						
						
					 
					
						2025-05-31 19:58:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bbfa0c61d1 
					 
					
						
						
							
							[Misc][Benchmark] Add support for CustomDataset (#18511)
						
						
						
						
					 
					
						2025-05-31 19:07:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						20079c6e36 
					 
					
						
						
							
							[Misc] add return token strs for tokenize (#18941)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-31 18:00:11 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9a1b9b99d7 
					 
					
						
						
							
							[BugFix] Fix multi-node offline data-parallel (#18981)

						Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
						
						
					 
					
						2025-05-31 08:34:52 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8bf507d766 
					 
					
						
						
							
							[P/D] NixlConnector use cache device index for memory registration (#18969)

						Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
						
						
					 
					
						2025-05-31 11:19:18 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						306d60401d 
					 
					
						
						
							
							[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010)

						Signed-off-by: charlifu <charlifu@amd.com>
						
						
					 
					
						2025-05-31 07:40:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f2c3f66d59 
					 
					
						
						
							
							[Bugfix] Fix for issue 17396 (#18773)

						Signed-off-by: Fred Reiss <frreiss@us.ibm.com>
						
						
					 
					
						2025-05-31 11:58:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0f5e0d567e 
					 
					
						
						
							
							[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825)

						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
						
						
					 
					
						2025-05-31 03:39:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c55d804672 
					 
					
						
						
							
							[BugFix] Pydantic part 2 (#18911)

						Signed-off-by: luka <luka@neuralmagic.com>
						
						
					 
					
						2025-05-31 03:39:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						749f5bdd38 
					 
					
						
						
							
							[doc] fix the list rendering issue - security.md (#18982)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-31 10:39:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2a50ef5760 
					 
					
						
						
							
							[Neuron] Add Multi-Modal model support for Neuron (#18921)

						Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com>
Co-authored-by: FeliciaLuo <luof@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
						
						
					 
					
						2025-05-31 10:39:11 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b8b904795d 
					 
					
						
						
							
							fix security issue of logging llm output (#18980)

						Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
						
						
					 
					
						2025-05-31 10:38:56 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ba5111f237 
					 
					
						
						
							
							[Bugfix]: Fix the incompatibility issue with Structured Outputs when Thinking is disabled (#18879)

						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-05-31 09:20:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1e123529d7 
					 
					
						
						
							
							[Misc] Fix estimated max model len msg (#18966)

						Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
						
						
					 
					
						2025-05-31 16:43:44 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dff80b0e42 
					 
					
						
						
							
							[Frontend] Add rerank support to run_batch endpoint (#16278)

						Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
						
						
					 
					
						2025-05-31 07:40:01 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7782464a17 
					 
					
						
						
							
							create util function for batched arange (#18937)
						
						
						
						
					 
					
						2025-05-31 13:50:38 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0f71e24034 
					 
					
						
						
							
							[Docs] Correct multiprocessing design doc (#18964)

						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
						
						
					 
					
						2025-05-31 01:30:15 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1dab4d5718 
					 
					
						
						
							
							Tool parser regex timeout handling (#18960)

						Signed-off-by: Will Eaton <weaton@redhat.com>
						
						
					 
					
						2025-05-30 21:02:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7f21e8052b 
					 
					
						
						
							
							[Misc] add group_size is -1 in awq quantization (#18910)

						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
						
						
					 
					
						2025-05-30 17:34:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5a8641638a 
					 
					
						
						
							
							[VLM] Add PP support and fix GPTQ inference for Ovis models (#18958)

						Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-05-30 17:11:44 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f49239cb45 
					 
					
						
						
							
							Benchmark script for fp8 vs bf16 gemm (#17126)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-05-30 10:56:11 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2dbe8c0774 
					 
					
						
						
							
							[Perf] API-server scaleout with many-to-many server-engine comms (#17546)
						
						
						
						
					 
					
						2025-05-30 08:17:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						84ec470fca 
					 
					
						
						
							
							Improve "failed to get the hash of the compiled graph" error (#18956)

						Signed-off-by: rzou <zou3519@gmail.com>
						
						
					 
					
						2025-05-30 15:00:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b29ca5c4d5 
					 
					
						
						
							
							[Docs] Update SECURITY.md with link to our security guide (#18961)

						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-05-30 07:37:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ec6833c5e9 
					 
					
						
						
							
							[doc] show the count for fork and watch (#18950)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-30 06:45:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e1fadf1197 
					 
					
						
						
							
							[Feature] minicpm eagle support (#18943)

						Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com>
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>
						
						
					 
					
						2025-05-30 06:45:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						43ff405b90 
					 
					
						
						
							
							[CI/Build] remove regex from build dependencies (#18945)

						Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-05-30 04:02:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fba02e3bd1 
					 
					
						
						
							
							[Bugfix][TPU] Fix tpu model runner testcase failure (#18810)

						Signed-off-by: Carol Zheng <cazheng@google.com>
						
						
					 
					
						2025-05-30 18:04:03 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4577fc9abb 
					 
					
						
						
							
							[Misc]Fix typo (#18947)
						
						
						
						
					 
					
						2025-05-30 02:21:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5f1d0c8118 
					 
					
						
						
							
							[Bugfix][Failing Test] Fix test_vllm_port.py (#18618)

						Signed-off-by: rabi <ramishra@redhat.com>
						
						
					 
					
						2025-05-30 17:13:47 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c3bb9f2331 
					 
					
						
						
							
							[Model] Use in-place adds in SigLIP (#18922)

						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
						
						
					 
					
						2025-05-30 17:12:59 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8f8900cee9 
					 
					
						
						
							
							[doc] add mkdocs doc (#18930)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-30 07:58:44 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6acb7a6285 
					 
					
						
						
							
							[Misc]Fix benchmarks/README.md for speculative decoding (#18897)

						Signed-off-by: rabi <ramishra@redhat.com>
						
						
					 
					
						2025-05-30 07:58:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4f4a6b844a 
					 
					
						
						
							
							[Deprecation] Remove mean pooling default for Qwen2EmbeddingModel (#18913)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-30 06:53:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4d0a1541be 
					 
					
						
						
							
							[Bugfix] Remove NVFP4 scales assertions to fix load_format=dummy (#18861)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-05-30 13:37:36 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						77b6e74fe2 
					 
					
						
						
							
							[ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. (#18938)

						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
						
						
					 
					
						2025-05-29 22:33:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5acf828d99 
					 
					
						
						
							
							[docs] fix: fix markdown syntax (#18927)
						
						
						
						
					 
					
						2025-05-30 05:20:48 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3987e2ae96 
					 
					
						
						
							
							[Model] Use AutoWeightsLoader for mamba2 (#18918)

						Signed-off-by: iLeGend <824040212@qq.com>
						
						
					 
					
						2025-05-30 04:50:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						77164dad5e 
					 
					
						
						
							
							[Bugfix] Consistent ascii handling in tool parsers (#18883)

						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-05-30 04:44:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3de3eadf5b 
					 
					
						
						
							
							improve the robustness of parsing vlms config in AutoRound (#18894)

						Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
						
						
					 
					
						2025-05-29 19:24:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3132290a14 
					 
					
						
						
							
							[TPU][CI/CD] Clean up docker for TPU tests. (#18926)

						Signed-off-by: Carol Zheng <cazheng@google.com>
						
						
					 
					
						2025-05-30 10:24:19 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1aa2f81b43 
					 
					
						
						
							
							[Misc] Update type annotation for rotary embedding base (#18914)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-30 10:17:01 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d54af615d5 
					 
					
						
						
							
							[Bugfix] Fix PP default fallback behavior for V1 (#18915)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-05-30 10:13:17 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a1cc9f33a3 
					 
					
						
						
							
							[TPU] remove transpose ops in moe kernel (#18923)

						Signed-off-by: Chengji Yao <chengjiyao@google.com>
						
						
					 
					
						2025-05-29 23:00:11 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a521ef06e5 
					 
					
						
						
							
							Use standalone_compile by default in torch >= 2.8.0 (#18846)

						Signed-off-by: rzou <zou3519@gmail.com>
						
						
					 
					
						2025-05-30 06:41:58 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						64eaf5fe05 
					 
					
						
						
							
							[P/D] NixlConnector DP fixes (#18903)

						Signed-off-by: Will Eaton <weaton@redhat.com>
						
						
					 
					
						2025-05-29 18:08:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d1d61f3351 
					 
					
						
						
							
							[BugFix] Make DP work with connector-delayed new requests (#18559)

						Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Will Eaton <weaton@redhat.com>
						
						
					 
					
						2025-05-29 18:04:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						32ce3cf7c9 
					 
					
						
						
							
							[V1] Allocate kv_cache with stride order for V1 (#18775)

						Signed-off-by: nicklucche <nlucches@redhat.com>
						
						
					 
					
						2025-05-29 17:54:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d58f9c7f7a 
					 
					
						
						
							
							[Misc] Remove duplicate init for self.vllm_config (#18896)

						Signed-off-by: googs1025 <googs1025@gmail.com>
						
						
					 
					
						2025-05-29 17:26:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c29034037d 
					 
					
						
						
							
							[Deprecation] Disallow pos-args other than model when initializing LLM (#18802)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-29 09:36:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1b7cfd5a36 
					 
					
						
						
							
							[ROCm][V0][Attention] Revert to the previous FA triton kernel (#18226)

						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-05-29 12:13:18 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						da4b69d0b4 
					 
					
						
						
							
							[Attention][V1] Toggle for v1 attention backend (#18275)

						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-05-29 10:48:24 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c9479b2920 
					 
					
						
						
							
							[Bugfix] Fix the failing gte embedding test (#18720)

						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-05-29 07:39:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6f2909405e 
					 
					
						
						
							
							[Doc] Fix codeblocks formatting in LoRA adapters documentation (#18907)

						Signed-off-by: Zerohertz <ohg3417@gmail.com>
						
						
					 
					
						2025-05-29 07:38:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b169d5f7b6 
					 
					
						
						
							
							[Misc][Tools][Benchmark] Add benchmark_serving supports for llama.cpp. (#18692)

						Signed-off-by: Duyi-Wang <duyi.wang@intel.com>
						
						
					 
					
						2025-05-29 20:02:08 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f8977c233f 
					 
					
						
						
							
							Fix an error in dummy weight loading for quantization models (#18855)

						Signed-off-by: Chenyaaang <chenyangli@google.com>
						
						
					 
					
						2025-05-29 03:07:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f274581f44 
					 
					
						
						
							
							[BugFix] Update pydantic to fix error on python 3.10 (#18852)

						Signed-off-by: luka <luka@neuralmagic.com>
						
						
					 
					
						2025-05-29 03:05:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0b1447f890 
					 
					
						
						
							
							[Bugfix] Ensure tensors are contiguous during serialisation (#18860)

						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
						
						
					 
					
						2025-05-29 03:05:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						24d0ef8970 
					 
					
						
						
							
							[Misc] Replace TODO in serving transcription (#18895)

						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-05-29 02:58:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7fcfd954ff 
					 
					
						
						
							
							[Bugfix] Fix misleading information in the documentation (#18845)

						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-05-29 02:54:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e740d07f07 
					 
					
						
						
							
							[doc] add CLI doc (#18871)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-29 09:51:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a652e71dd0 
					 
					
						
						
							
							[Doc] Remove redundant spaces from compatibility_matrix.md (#18891)

						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-05-29 02:51:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						34d6c447c4 
					 
					
						
						
							
							[LoRA] Add LoRA support for InternVL (#18842)

						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-05-29 08:46:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						972eddf7c9 
					 
					
						
						
							
							[Neuron] Add multi-LoRA support for Neuron. (#18284)

						Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
						
						
					 
					
						2025-05-29 16:41:22 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fd7bb88d72 
					 
					
						
						
							
							Fixes a dead link in nightly benchmark readme (#18856)

						Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
						
						
					 
					
						2025-05-29 04:41:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3c49dbdd03 
					 
					
						
						
							
							Skip device and quant Pydantic validation to make plugin device work (#18843)

						Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
						
						
					 
					
						2025-05-28 20:12:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1661a9c28f 
					 
					
						
						
							
							[Doc][Neuron] Update documentation for Neuron (#18868)

						Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
						
						
					 
					
						2025-05-28 19:44:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8e882ffdc0 
					 
					
						
						
							
							[Bugfix][TPU] fix moe custom kernel import (#18853)

						Signed-off-by: Chengji Yao <chengjiyao@google.com>
						
						
					 
					
						2025-05-28 19:34:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						26b4fa45be 
					 
					
						
						
							
							Add ability to use CUDAGraphs with use_inductor=False (#17345)

						Signed-off-by: rzou <zou3519@gmail.com>
						
						
					 
					
						2025-05-29 10:16:52 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						515b413ebf 
					 
					
						
						
							
							Prevent the cross-encoder logic from being applied to classification tasks (#18838)

						Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-05-28 19:16:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						269d901734 
					 
					
						
						
							
							[Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix (#18100)

						Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
						
						
					 
					
						2025-05-29 07:21:46 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7951d78738 
					 
					
						
						
							
							[Core] Enable CUDA graphs for DP + All2All kernels (#18724)

						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
						
						
					 
					
						2025-05-28 22:55:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6dbe5b5c93 
					 
					
						
						
							
							Remove checks for None for fields which should never be None (#17985)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-05-28 21:32:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						643622ba46 
					 
					
						
						
							
							[Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend (#15655)

						Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Signed-off-by: xihajun <junfan@krai.ai>
Signed-off-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk>
Signed-off-by: Jorge de Freitas <jorge@krai.ai>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: xihajun <junfan@krai.ai>
Co-authored-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk>
Co-authored-by: Jorge de Freitas <jorge@krai.ai>
						
						
					 
					
						2025-05-28 19:59:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a09c7ca9f2 
					 
					
						
						
							
							[Chore][Spec Decode] Update check NoneType instead of assigning variables (#18836)

						Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
						
						
					 
					
						2025-05-28 18:57:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0e98964e94 
					 
					
						
						
							
							[V1][Metrics] Remove metrics that were deprecated in 0.8 (#18837)

						Signed-off-by: Mark McLoughlin <markmc@redhat.com>
						
						
					 
					
						2025-05-28 18:54:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c68b5c63eb 
					 
					
						
						
							
							[Misc] fix olmoe model layer can't laod in tp gt 1 (#18828)

						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
						
						
					 
					
						2025-05-28 17:36:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fced756923 
					 
					
						
						
							
							[Chore] update ty configuration (#18839)

						Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
						
						
					 
					
						2025-05-28 08:59:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						321331b8ae 
					 
					
						
						
							
							[Core] Add Lora Support to Beam Search (#18346)

						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
						
						
					 
					
						2025-05-28 08:58:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e4cea1cc5 
					 
					
						
						
							
							decrement server_load on listen for disconnect (#18784)

						Signed-off-by: Daniel Salib <danielsalib@meta.com>
						
						
					 
					
						2025-05-28 22:15:12 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						435fa95444 
					 
					
						
						
							
							[Frontend] add run batch to CLI (#18804)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-28 07:08:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4c2b38ce9e 
					 
					
						
						
							
							Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-05-28 12:46:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d781930f90 
					 
					
						
						
							
							[Platform][Dist] Make torch distributed process group extendable (#18763)

						Signed-off-by: Mengqing Cao <cmq0113@163.com>
						
						
					 
					
						2025-05-28 10:52:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ce75efeecb 
					 
					
						
						
							
							[BugFix] FA2 MLA Accuracy Issue (#18807)

						Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
						
						
					 
					
						2025-05-28 08:59:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aa42561e40 
					 
					
						
						
							
							Fix PiecewiseCompileInterpreter (#17338)

						Signed-off-by: rzou <zou3519@gmail.com>
						
						
					 
					
						2025-05-28 08:40:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						de65fc8e1e 
					 
					
						
						
							
							[CI] improve embed testing (#18747)
						
						
						
						
					 
					
						2025-05-28 00:16:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0c492b7824 
					 
					
						
						
							
							[Deprecation] Remove fallbacks for Embeddings API (#18795)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-28 15:09:04 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0f0926b43f 
					 
					
						
						
							
							[Deprecation] Remove unused sync methods in async_timeout (#18792)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-28 15:08:48 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7f2c1a87e9 
					 
					
						
						
							
							[Deprecation] Require overriding get_dummy_text and get_dummy_mm_data ( #18796 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-28 15:08:35 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b78f844a67 
					 
					
						
						
							
							[Bugfix][FailingTest]Fix test_model_load_with_params.py ( #18758 )  
						
						Signed-off-by: rabi <ramishra@redhat.com>
						
						
					 
					
						2025-05-28 05:42:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5e13c07d00 
					 
					
						
						
							
							[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (2) ( #18781 )  
						
						Signed-off-by: Ronald Xu <ronaldxu@amazon.com>
						
						
					 
					
						2025-05-28 05:09:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						774c5fde30 
					 
					
						
						
							
							[V1] fix torch profiling for V1 offline scenarios ( #18445 )  
						
						Signed-off-by: Divakar Verma <divakar.verma@amd.com>
						
						
					 
					
						2025-05-28 04:16:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9a21e331ff 
					 
					
						
						
							
							[Bugfix]: correctly propagate errors message caught at the chat_templating step to the client ( #18769 )  
						
						Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
						
						
					 
					
						2025-05-28 03:35:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3e9ce609bd 
					 
					
						
						
							
							[Bugfix] Fix nomic max_model_len ( #18755 )  
						
						
						
						
					 
					
						2025-05-27 20:29:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						794ae1f551 
					 
					
						
						
							
							[rocm] Fix wrong attention log ( #18764 )  
						
						Signed-off-by: Felix Marty <felmarty@amd.com>
						
						
					 
					
						2025-05-27 19:45:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d73a9457a5 
					 
					
						
						
							
							[Core] Improve Tensor serialisation ( #18774 )  
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
						
						
					 
					
						2025-05-28 09:46:21 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a3896c7f02 
					 
					
						
						
							
							[Build] Fixes for CMake install ( #18570 )  
						
						
						
						
					 
					
						2025-05-27 20:49:24 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						51e98e4ffd 
					 
					
						
						
							
							[Bugfix] Disable prefix caching by default for benchmark ( #18771 )  
						
						Signed-off-by: cascade812 <cascade812@outlook.com>
						
						
					 
					
						2025-05-28 08:18:09 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e56f44d9ec 
					 
					
						
						
							
							Support datasets in vllm bench serve and sync with benchmark_[serving,datasets].py ( #18566 )  
						
						
						
						
					 
					
						2025-05-27 19:59:48 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e0cbad4e30 
					 
					
						
						
							
							[Neuron] Support quantization on neuron ( #18283 )  
						
						Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
						
						
					 
					
						2025-05-27 22:10:33 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b48d5cca16 
					 
					
						
						
							
							[CI/Build] [TPU] Fix TPU CI exit code ( #18282 )  
						
						Signed-off-by: Carol Zheng <cazheng@google.com>
						
						
					 
					
						2025-05-27 14:54:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5873877241 
					 
					
						
						
							
							[Bugfix] Mistral tool calling when content is list ( #18729 )  
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-05-27 09:05:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						696259ca01 
					 
					
						
						
							
							[Core] Automatically cast multi-modal input dtype ( #18756 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-27 23:45:48 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6b6d496114 
					 
					
						
						
							
							optimize get_kv_cache_torch_dtype ( #18531 )  
						
						Signed-off-by: idellzheng <idellzheng@tencent.com>
						
						
					 
					
						2025-05-27 13:08:44 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aaa4ac1c95 
					 
					
						
						
							
							Disable prefix cache by default for benchmark ( #18639 )  
						
						Signed-off-by: cascade812 <cascade812@outlook.com>
						
						
					 
					
						2025-05-27 20:06:34 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						06a0338015 
					 
					
						
						
							
							[V1][Metrics] Add API for accessing in-memory Prometheus metrics ( #17010 )  
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com>
						
						
					 
					
						2025-05-27 09:37:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4318c0559d 
					 
					
						
						
							
							[CI/Build] Remove imports of built-in re ( #18750 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-27 09:19:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a68e293cb9 
					 
					
						
						
							
							[Doc]  Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking ( #18663 )  
						
						Signed-off-by: Zerohertz <ohg3417@gmail.com>
						
						
					 
					
						2025-05-27 01:44:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6881107948 
					 
					
						
						
							
							[BUG FIX] minicpm ( #18739 )  
						
						Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com>
						Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>
						
						
					 
					
						2025-05-27 01:04:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e0f0ff87b8 
					 
					
						
						
							
							[Build] fix cpu build missing libtbbmalloc.so ( #18744 )  
						
						Signed-off-by: Kebe <mail@kebe7jun.com>
						
						
					 
					
						2025-05-27 01:03:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c24b1572ac 
					 
					
						
						
							
							Minor fix about MooncakeStoreConnector ( #18721 )  
						
						Signed-off-by: baoloongmao <baoloongmao@tencent.com>
						
						
					 
					
						2025-05-27 08:02:28 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4693a3438c 
					 
					
						
						
							
							[Doc] cleanup deprecated flag for doc ( #18715 )  
						
						Signed-off-by: calvin chen <120380290@qq.com>
						
						
					 
					
						2025-05-27 07:12:02 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bbd9a84dc5 
					 
					
						
						
							
							[Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the same name in run-hpu-test.sh ( #18752 )  
						
						Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>
						
						
					 
					
						2025-05-27 00:10:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a547aeb828 
					 
					
						
						
							
							feat(rocm-support): support mamba2 on rocm ( #18565 )  
						
						Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
						Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai>
						
						
					 
					
						2025-05-27 00:07:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fc6d0c290f 
					 
					
						
						
							
							[Misc] improve docs ( #18734 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-27 07:07:01 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						753944fa9b 
					 
					
						
						
							
							[Doc] Update reproducibility doc and example ( #18741 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-27 07:03:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						25a817f202 
					 
					
						
						
							
							[Doc] Update OOT model docs ( #18742 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-27 06:30:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d260f799a9 
					 
					
						
						
							
							[FEAT] [ROCm] Upgrade AITER Fused MoE kernels. ( #18271 )  
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
						
						
					 
					
						2025-05-26 23:14:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b50602d5f0 
					 
					
						
						
							
							[Model][Gemma3] Cast image pixel values already on CPU ( #18732 )  
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
						
						
					 
					
						2025-05-27 05:42:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1f1b1bc03b 
					 
					
						
						
							
							[V1][Quantization] Add CUDA graph compatible v1 GGUF support ( #18646 )  
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-05-27 04:40:28 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1f88dbd2bb 
					 
					
						
						
							
							[Misc] improve web section group title display ( #18684 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-27 04:35:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0eebd74842 
					 
					
						
						
							
							[Model][Gemma3] Simplify image input validation ( #18710 )  
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
						
						
					 
					
						2025-05-27 11:13:37 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						27bebcd897 
					 
					
						
						
							
							Convert examples to ruff-format ( #18400 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-05-26 16:57:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e7523c2e03 
					 
					
						
						
							
							[V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs ( #18608 )  
						
						
						
						
					 
					
						2025-05-26 11:49:36 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a869baca73 
					 
					
						
						
							
							[Bugfix] Fix Llama GGUF initialization ( #18717 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-26 07:49:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						82e2339b06 
					 
					
						
						
							
							[Doc] Move examples and further reorganize user guide ( #18666 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-26 07:38:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9553fdb41e 
					 
					
						
						
							
							[Doc] Improve API docs ( #18713 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-26 07:33:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						243eb9199f 
					 
					
						
						
							
							[Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM ( #18701 )  
						
						
						
						
					 
					
						2025-05-26 07:10:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0665e29998 
					 
					
						
						
							
							[Misc] add AutoGen integration ( #18712 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-05-26 13:56:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e76be06550 
					 
					
						
						
							
							[Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test to HPU CI ( #18709 )  
						
						Signed-off-by: Lukasz Durejko <ldurejko@habana.ai>
						
						
					 
					
						2025-05-26 05:26:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0877750029 
					 
					
						
						
							
							[CI/Build] Split pooling and generation extended language models tests in CI ( #18705 )  
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-05-26 04:00:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6d68030f1c 
					 
					
						
						
							
							[Model] Add support for YARN in NemotronNAS models ( #18427 )  
						
						Signed-off-by: Nave Assaf <nassaf@nvidia.com>
						
						
					 
					
						2025-05-26 10:31:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5a2c76cbe1 
					 
					
						
						
							
							[CI] fix dump_input for str type ( #18697 )  
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-05-26 18:23:35 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						38b13dfe78 
					 
					
						
						
							
							[CI/Build] Replace math.isclose with pytest.approx ( #18703 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-26 02:05:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						61a45e7a72 
					 
					
						
						
							
							[Bugfix] Fix Mistral-format models with sliding window ( #18693 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-26 01:44:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						65523a0995 
					 
					
						
						
							
							[Doc] Fix issue template format ( #18699 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-26 00:45:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4b7740a105 
					 
					
						
						
							
							[GH] Add issue template for reporting CI failures ( #18696 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-26 00:42:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ea62c0ea0 
					 
					
						
						
							
							[CI] add missing argument ( #18694 )  
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-05-26 00:22:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						561b77a0d6 
					 
					
						
						
							
							[Bugfix] Fix the lm_head in gpt_bigcode in lora mode ( #6357 )  
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
						Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
						
						
					 
					
						2025-05-26 14:52:25 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						abd4030d94 
					 
					
						
						
							
							refactor: simplify request handler, use positive condition check for handler assignment ( #18690 )  
						
						Signed-off-by: googs1025 <googs1025@gmail.com>
						
						
					 
					
						2025-05-26 06:32:28 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8820821b59 
					 
					
						
						
							
							[Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example ( #18644 )  
						
						Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com>
						Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
						Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com>
						
						
					 
					
						2025-05-26 13:51:27 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fba0642704 
					 
					
						
						
							
							[CI/Build][Doc] Update gte-Qwen2-1.5B-instruct usage ( #18683 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Co-authored-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-05-25 20:27:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6071e989df 
					 
					
						
						
							
							[Core][Multimodal] Convert PIL Image to array without data copy when hashing ( #18682 )  
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
						
						
					 
					
						2025-05-25 17:33:35 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						57fd13a707 
					 
					
						
						
							
							[Bugfix] Fix profiling dummy data for Pixtral ( #18677 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-25 14:05:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3a886bd58c 
					 
					
						
						
							
							[Misc] small improve ( #18680 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-25 06:05:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						35be8fad62 
					 
					
						
						
							
							[CI/build] fix no regex ( #18676 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-25 10:10:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f2faac745d 
					 
					
						
						
							
							[Bugfix] Fix cpu usage and cache hit stats reporting on cpu environment ( #18674 )  
						
						Signed-off-by: zzzyq <zhangyuqi94@gmail.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-05-25 02:36:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						279f854519 
					 
					
						
						
							
							[doc] improve readability ( #18675 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-25 01:40:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						624b77a2b3 
					 
					
						
						
							
							[doc] fix broken links ( #18671 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-25 01:36:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						503f8487c2 
					 
					
						
						
							
							[Misc] Reduce logs on startup ( #18649 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-24 23:03:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						44073a7ac3 
					 
					
						
						
							
							[BUGFIX] catch subclass first for try...except ( #18672 )  
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-05-25 05:34:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						63934543a0 
					 
					
						
						
							
							Speed up the kernels/quantization/ tests ( #18669 )  
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-05-25 05:02:59 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						75f81750f3 
					 
					
						
						
							
							[VLM] Initialize video input support for InternVL models ( #18499 )  
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-05-25 04:51:25 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6ab681bcbe 
					 
					
						
						
							
							[Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE ( #18655 )  
						
						Signed-off-by: Mengqing Cao <cmq0113@163.com>
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Co-authored-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-05-25 04:51:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cebc22f3b6 
					 
					
						
						
							
							[Misc]Replace cuda hard code with current_platform in Ray ( #14668 )  
						
						Signed-off-by: noemotiovon <757486878@qq.com>
						
						
					 
					
						2025-05-24 20:26:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6c6dcd8611 
					 
					
						
						
							
							[MISC] correct signature for LoaderFunction ( #18670 )  
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-05-24 20:17:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7891fdf0c6 
					 
					
						
						
							
							[V1] Fix _pickle.PicklingError: Can't pickle <class 'transformers_modules.deepseek-ai.DeepSeek-V2-Lite... ( #18640 )  
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com>
						
						
					 
					
						2025-05-24 20:07:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6825d9a998 
					 
					
						
						
							
							[BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding ( #18668 )  
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-05-24 17:33:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b554ab736e 
					 
					
						
						
							
							[CI/Build] fix permission denied issue ( #18645 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-24 16:09:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9ea7f1abf3 
					 
					
						
						
							
							fix(regression): clone from reference items ( #18662 )  
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
						
						
					 
					
						2025-05-24 15:25:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2807271c86 
					 
					
						
						
							
							[CI] enforce import regex instead of re ( #18665 )  
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
						
						
					 
					
						2025-05-24 08:04:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b9018a3f9f 
					 
					
						
						
							
							[BugFix] Fix import error for fused_moe ( #18642 )  
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
						
						
					 
					
						2025-05-24 07:53:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ceafb6299 
					 
					
						
						
							
							[MISC] typo fix and clean import ( #18664 )  
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-05-24 07:52:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2e6705784f 
					 
					
						
						
							
							[CI/Build] chmod +x to cleanup_pr_body.sh ( #18650 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-24 07:26:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1cb194a018 
					 
					
						
						
							
							[Doc] Reorganize user guide ( #18661 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-24 07:25:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2cd4d58df4 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for gpt2 ( #18625 )  
						
						Signed-off-by: zt2370 <ztang2370@gmail.com>
						
						
					 
					
						2025-05-24 13:36:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6d166a8d35 
					 
					
						
						
							
							[Doc] Add community links ( #18657 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-24 06:06:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ef1dd6870f 
					 
					
						
						
							
							[Doc] Fix indentation problems in V0 Paged Attention docs ( #18659 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-24 06:06:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e77dc4bad8 
					 
					
						
						
							
							[MISC][pre-commit] Add pre-commit check for triton import ( #17716 )  
						
						Signed-off-by: Mengqing Cao <cmq0113@163.com>
						
						
					 
					
						2025-05-24 20:09:15 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						07458a51ce 
					 
					
						
						
							
							[Doc] Update README links, mark external links ( #18635 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-24 09:57:15 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c1e4a4052d 
					 
					
						
						
							
							[V1][Spec Decode] Support multi-layer eagle draft model ( #18030 )  
						
						Signed-off-by: qizixi <qizixi@meta.com>
						
						
					 
					
						2025-05-24 09:45:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a859320575 
					 
					
						
						
							
							[Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) ( #18647 )  
						
						
						
						
					 
					
						2025-05-24 09:15:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						441dc63ac7 
					 
					
						
						
							
							[Frontend] improve vllm serve --help display ( #18643 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-24 07:53:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d55e446d13 
					 
					
						
						
							
							[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance ( #18424 )  
						
						Signed-off-by: qizixi <qizixi@meta.com>
						
						
					 
					
						2025-05-24 06:51:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ec82c3e388 
					 
					
						
						
							
							FIX MOE issue in AutoRound format ( #18586 )  
						
						Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
						
						
					 
					
						2025-05-23 22:01:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						45ab403a1f 
					 
					
						
						
							
							config.py: Clarify that only local GGUF checkpoints are supported. ( #18623 )  
						
						... 
						
						
						
						Signed-off-by: Mathieu Bordere <mathieu@letmetweakit.com > 
						
						
					 
					
						2025-05-24 08:46:34 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2b10ba7491 
					 
					
						
						
							
							[Bugfix][Nixl] Fix Preemption Bug ( #18631 )  
						
						... 
						
						
						
						Signed-off-by: rshaw@neuralmagic.com  <robertgshaw2@gmail.com > 
						
						
					 
					
						2025-05-23 23:30:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4fc1bf813a 
					 
					
						
						
							
							[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking ( #18454 )  
						
						... 
						
						
						
						Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com >
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com > 
						
						
					 
					
						2025-05-23 16:16:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f2036734fb 
					 
					
						
						
							
							[ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation ( #18160 )  
						
						... 
						
						
						
						Signed-off-by: Pavani Majety <pmajety@nvidia.com > 
						
						
					 
					
						2025-05-23 15:52:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7d9216495c 
					 
					
						
						
							
							[Doc] Update references to doc files ( #18637 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-23 15:49:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0ddf88e16e 
					 
					
						
						
							
							[CI] Enable test_initialization to run on V1 ( #16736 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-23 15:09:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1645b60196 
					 
					
						
						
							
							Use prebuilt FlashInfer x86_64 PyTorch 2.7 CUDA 12.8 wheel for CI ( #18537 )  
						
						... 
						
						
						
						Signed-off-by: Huy Do <huydhn@gmail.com > 
						
						
					 
					
						2025-05-23 21:17:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2628a69e35 
					 
					
						
						
							
							[V1] Support Deepseek MTP ( #18435 )  
						
						... 
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn >
Co-authored-by: Rui Qiao <ruisearch42@gmail.com > 
						
						
					 
					
						2025-05-23 10:26:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						371f7e4ca2 
					 
					
						
						
							
							[Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar ( #18627 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-23 10:22:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						15b45ffb9a 
					 
					
						
						
							
							[Doc] Avoid documenting dynamic / internal modules ( #18626 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-23 09:58:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						273cb3b4d9 
					 
					
						
						
							
							[Doc] Fix top-level API links/docs ( #18621 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-23 09:46:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8ddd1cf26a 
					 
					
						
						
							
							[Doc] fix list formatting ( #18624 )  
						
						... 
						
						
						
						Signed-off-by: David Xia <david@davidxia.com > 
						
						
					 
					
						2025-05-23 09:41:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6550114c9c 
					 
					
						
						
							
							[v1] Redo "Support multiple KV cache groups in GPU model runner ( #17945 )" ( #18593 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-05-23 09:39:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9520a989df 
					 
					
						
						
							
							[Docs] Change mkdocs to not use directory urls ( #18622 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-23 09:33:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3d28ad343f 
					 
					
						
						
							
							Fix figures in design doc ( #18612 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-23 09:09:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6a7988c55b 
					 
					
						
						
							
							Refactor pplx init logic to make it modular (prepare for deepep) ( #18200 )  
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-05-23 23:43:43 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						022d8abe29 
					 
					
						
						
							
							[Doc] Use a different color for the announcement ( #18616 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-23 08:25:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5221815a00 
					 
					
						
						
							
							[Doc] Fix markdown list indentation for MkDocs rendering ( #18620 )  
						
						... 
						
						
						
						Signed-off-by: Zerohertz <ohg3417@gmail.com > 
						
						
					 
					
						2025-05-23 08:23:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1068556b2c 
					 
					
						
						
							
							[Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORTED_ARCHS ( #18579 )  
						
						
						
						
					 
					
						2025-05-23 07:43:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2cd1fa4556 
					 
					
						
						
							
							[Misc] add Haystack integration ( #18601 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-23 06:21:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d4c2919760 
					 
					
						
						
							
							Include private attributes in API documentation ( #18614 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-23 06:18:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6220f3c6b0 
					 
					
						
						
							
							[Bugfix] Fix transformers model impl ignored for mixtral quant ( #18602 )  
						
						... 
						
						
						
						Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com > 
						
						
					 
					
						2025-05-23 05:54:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						52fb23f47e 
					 
					
						
						
							
							Fix examples with code blocks in docs ( #18609 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-23 05:53:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6dd51c7ef1 
					 
					
						
						
							
							[CI/Build] Fix V1 flag being set in entrypoints tests ( #18598 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-23 05:51:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2edb533af2 
					 
					
						
						
							
							Replace {func} with mkdocs style links ( #18610 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-23 05:51:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						38a95cb4a8 
					 
					
						
						
							
							[Doc] Fix indent of contributing to vllm ( #18611 )  
						
						... 
						
						
						
						Signed-off-by: Zerohertz <ohg3417@gmail.com > 
						
						
					 
					
						2025-05-23 05:50:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cd821ea5d2 
					 
					
						
						
							
							[CI] fix kv_cache_type argument ( #18594 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-05-23 04:49:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7ab056c273 
					 
					
						
						
							
							[Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to requirements/cpu.txt  ( #18542 )  
						
						... 
						
						
						
						Signed-off-by: Kay Yan <kay.yan@daocloud.io > 
						
						
					 
					
						2025-05-23 04:38:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6526e05111 
					 
					
						
						
							
							Add myself as docs code owner ( #18605 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-23 04:08:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e493e48524 
					 
					
						
						
							
							[V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled ( #17731 )  
						
						... 
						
						
						
						Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-05-23 03:38:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ce64e2df4 
					 
					
						
						
							
							[Bugfix][Model] Fix baichuan model loader for tp ( #18597 )  
						
						... 
						
						
						
						Signed-off-by: Mengqing Cao <cmq0113@163.com > 
						
						
					 
					
						2025-05-23 02:39:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fbb13a2c15 
					 
					
						
						
							
							Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal ( #18034 )" ( #18600 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-23 02:18:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a1fe24d961 
					 
					
						
						
							
							Migrate docs from Sphinx to MkDocs ( #18145 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-23 02:09:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d0bc2f810b 
					 
					
						
						
							
							[Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform ( #18430 )  
						
						... 
						
						
						
						Signed-off-by: Yuqi Zhang <yuqizhang@google.com >
Co-authored-by: Yuqi Zhang <yuqizhang@google.com > 
						
						
					 
					
						2025-05-23 01:41:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b046cf792d 
					 
					
						
						
							
							[Feature][V1]: supports cached_tokens in response usage ( #18149 )  
						
						... 
						
						
						
						Co-authored-by: simon-mo <xmo@berkeley.edu > 
						
						
					 
					
						2025-05-23 01:41:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						54af915949 
					 
					
						
						
							
							[Doc] Update quickstart and install for cu128 using --torch-backend=auto ( #18505 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-23 08:36:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						71ea614d4a 
					 
					
						
						
							
							[Feature]Add async tensor parallelism using compilation pass ( #17882 )  
						
						... 
						
						
						
						Signed-off-by: cascade812 <cascade812@outlook.com > 
						
						
					 
					
						2025-05-23 01:03:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4c611348a7 
					 
					
						
						
							
							[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal ( #18034 )  
						
						... 
						
						
						
						Signed-off-by: Ronald Xu <ronaldxu@amazon.com > 
						
						
					 
					
						2025-05-23 00:37:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						60cad94b86 
					 
					
						
						
							
							[Hardware] correct method signatures for HPU,ROCm,XPU ( #18551 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-05-22 22:31:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9c1baa5bc6 
					 
					
						
						
							
							[Misc] Replace cuda hard code with current_platform ( #16983 )  
						
						... 
						
						
						
						Signed-off-by: shen-shanshan <467638484@qq.com > 
						
						
					 
					
						2025-05-23 04:38:50 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4be2255c81 
					 
					
						
						
							
							[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key ( #17291 )  
						
						... 
						
						
						
						Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com > 
						
						
					 
					
						2025-05-23 12:30:47 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed5d408255 
					 
					
						
						
							
							[Neuron] Remove bypass on EAGLEConfig and add a test ( #18514 )  
						
						... 
						
						
						
						Signed-off-by: Elaine Zhao <elaineyz@amazon.com > 
						
						
					 
					
						2025-05-22 21:26:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						583507d130 
					 
					
						
						
							
							[Spec Decode] Make EAGLE3 draft token ID mapping optional ( #18488 )  
						
						... 
						
						
						
						Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-05-22 20:17:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e44d8ce8c7 
					 
					
						
						
							
							[Bugfix] Set KVTransferConfig.engine_id in post_init ( #18576 )  
						
						... 
						
						
						
						Signed-off-by: Linkun Chen <github@lkchen.net > 
						
						
					 
					
						2025-05-23 02:54:42 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						93ecb8139c 
					 
					
						
						
							
							[BugFix] Increase TP execute_model timeout ( #18558 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-05-23 10:22:11 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fae453f8ce 
					 
					
						
						
							
							[Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs ( #18482 )  
						
						... 
						
						
						
						Signed-off-by: googs1025 <googs1025@gmail.com > 
						
						
					 
					
						2025-05-23 10:15:32 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4b0da7b60e 
					 
					
						
						
							
							Enable hybrid attention models for Transformers backend ( #18494 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-23 10:12:08 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c6b636f9fb 
					 
					
						
						
							
							[V1][Spec Decoding] Use model_loader.get_model() to load models ( #18273 )  
						
						... 
						
						
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com > 
						
						
					 
					
						2025-05-23 02:05:44 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						04eb88dc80 
					 
					
						
						
							
							Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. ( #18569 )  
						
						... 
						
						
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com > 
						
						
					 
					
						2025-05-23 01:59:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						46791e1b4b 
					 
					
						
						
							
							[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh ( #18568 )  
						
						... 
						
						
						
						Signed-off-by: Randall Smith <Randall.Smith@amd.com > 
						
						
					 
					
						2025-05-22 18:45:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c32e249a23 
					 
					
						
						
							
							[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization ( #17926 )  
						
						... 
						
						
						
						Signed-off-by: Sanger Steel <sangersteel@gmail.com > 
						
						
					 
					
						2025-05-22 18:44:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c91fe7b1b9 
					 
					
						
						
							
							[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser ( #17917 )  
						
						... 
						
						
						
						Signed-off-by: Kai Wu <kaiwu@meta.com > 
						
						
					 
					
						2025-05-22 16:44:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a04720bc36 
					 
					
						
						
							
							[V1][Spec Decode][Bugfix] Load quantize weights for EAGLE ( #18290 )  
						
						
						
						
					 
					
						2025-05-22 15:17:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7b9d832c80 
					 
					
						
						
							
							[Tool] Add NIXL installation script ( #18172 )  
						
						... 
						
						
						
						Signed-off-by: Linkun <github@lkchen.net > 
						
						
					 
					
						2025-05-22 14:33:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e588da0f4 
					 
					
						
						
							
							[Build/CI] Fix CUDA 11.8 build ( #17679 )  
						
						... 
						
						
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-05-22 12:13:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f8d2cc5f55 
					 
					
						
						
							
							[Compile][Platform] Make PiecewiseBackend pluggable and extendable ( #18076 )  
						
						... 
						
						
						
						Signed-off-by: Mengqing Cao <cmq0113@163.com >
Co-authored-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-05-22 12:11:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						721fb9b181 
					 
					
						
						
							
							[Platform] Move platform check to right place ( #18470 )  
						
						... 
						
						
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com > 
						
						
					 
					
						2025-05-22 12:11:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1f3a1200e4 
					 
					
						
						
							
							[Bugfix] make test_openai_schema.py pass ( #18224 )  
						
						... 
						
						
						
						Signed-off-by: David Xia <david@davidxia.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-22 18:34:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						54631f8262 
					 
					
						
						
							
							[Misc] Call ndarray.tobytes() directly instead of ndarray.data.tobytes() ( #18347 )  
						
						... 
						
						
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com > 
						
						
					 
					
						2025-05-22 09:00:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cb506ecb5a 
					 
					
						
						
							
							[Misc] improve Automatic Prefix Caching example ( #18554 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-22 14:50:46 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						93f71673ce 
					 
					
						
						
							
							[BugFix][CPU] Fix x86 SHM distributed module initialization ( #18536 )  
						
						... 
						
						
						
						Signed-off-by: jiang.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-05-22 07:35:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3f505233fd 
					 
					
						
						
							
							[Doc] Add stream flag for chat completion example ( #18524 )  
						
						... 
						
						
						
						Signed-off-by: calvin chen <120380290@qq.com > 
						
						
					 
					
						2025-05-22 14:07:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4e04eceb58 
					 
					
						
						
							
							[Bugfix] Use random hidden states in dummy sampler run ( #18543 )  
						
						... 
						
						
						
						Signed-off-by: Bowen Wang <abmfy@icloud.com > 
						
						
					 
					
						2025-05-22 06:48:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						71075029f2 
					 
					
						
						
							
							[Doc] Support --stream arg in openai_completion_client.py script ( #18388 )  
						
						... 
						
						
						
						Signed-off-by: googs1025 <googs1025@gmail.com > 
						
						
					 
					
						2025-05-22 13:20:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ca86a7cf6e 
					 
					
						
						
							
							[CI/Build] Update bamba test model location ( #18544 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-22 06:01:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a35a494745 
					 
					
						
						
							
							[Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible ( #18513 )  
						
						... 
						
						
						
						Signed-off-by: Linkun <github@lkchen.net > 
						
						
					 
					
						2025-05-22 05:24:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f6037d1907 
					 
					
						
						
							
							[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text ( #18526 )  
						
						... 
						
						
						
						Co-authored-by: 松灵 <wpf272043@alibaba-inc.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-22 05:22:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fa72f9a812 
					 
					
						
						
							
							Order sequence ids + config update to support specifying custom quantization layers ( #18279 )  
						
						... 
						
						
						
						Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
Co-authored-by: Tailin Pan <tailinpa@amazon.com >
Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com >
Co-authored-by: Yishan McNabb <yishanm@amazon.com >
Co-authored-by: Patrick Lange <patlange@amazon.com >
Co-authored-by: Maxwell Goldberg <mgld@amazon.com >
Co-authored-by: Aakash Shetty <sheaak@amazon.com > 
						
						
					 
					
						2025-05-22 02:20:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ebed81fbf5 
					 
					
						
						
							
							Update default neuron config for speculation ( #18274 )  
						
						... 
						
						
						
						Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com >
Co-authored-by: Aakash Shetty <sheaak@amazon.com > 
						
						
					 
					
						2025-05-22 02:18:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e2d7d31244 
					 
					
						
						
							
							[Neuron] Update Dockerfile.neuron to use latest neuron release (2.23) ( #18512 )  
						
						... 
						
						
						
						Signed-off-by: Satyajith Chilappagari <satchill@amazon.com > 
						
						
					 
					
						2025-05-22 02:17:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						23b67b37b2 
					 
					
						
						
							
							[Doc] Fix invalid JSON in example args ( #18527 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-22 07:11:46 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						db5a29ba19 
					 
					
						
						
							
							[Bugfix] Fix LoRA test ( #18518 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-05-21 21:48:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						51797775c3 
					 
					
						
						
							
							[Bugfix][Model] Make Olmo2Model weight loading return loaded weights ( #18504 )  
						
						... 
						
						
						
						Signed-off-by: Shane A <shanea@allenai.org > 
						
						
					 
					
						2025-05-21 21:17:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cf5984b2fe 
					 
					
						
						
							
							[BugFix][DP] Send DP wave completion only from dp_rank==0 ( #18502 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: kourosh hakhamaneshi <kourosh@anyscale.com > 
						
						
					 
					
						2025-05-21 20:25:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d022115cc6 
					 
					
						
						
							
							[Bugfix] Inconsistent token calculation compared to HF in llava family ( #18479 )  
						
						... 
						
						
						
						Signed-off-by: jaycha <jaycha@ncsoft.com > 
						
						
					 
					
						2025-05-21 20:21:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						acb54ca8e1 
					 
					
						
						
							
							Initialize io_thread_pool attribute in the beginning. ( #18331 )  
						
						... 
						
						
						
						Signed-off-by: rabi <ramishra@redhat.com > 
						
						
					 
					
						2025-05-21 20:21:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e0fd34d3c 
					 
					
						
						
							
							[CI] Fix race condition with StatelessProcessGroup.barrier ( #18506 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-05-21 20:19:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						176d62e4ea 
					 
					
						
						
							
							[MISC] update project urls in pyproject.toml ( #18519 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-05-21 20:17:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						20bd6f4d2e 
					 
					
						
						
							
							[FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) ( #18500 )  
						
						... 
						
						
						
						Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae >
Co-authored-by: younesbelkada <younesbelkada@gmail.com >
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae >
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae > 
						
						
					 
					
						2025-05-21 19:23:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1f079540db 
					 
					
						
						
							
							[Bugfix] Consistent ascii handling in tool parsers ( #17704 )  
						
						... 
						
						
						
						Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com > 
						
						
					 
					
						2025-05-21 20:41:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						94d8ec8d2b 
					 
					
						
						
							
							[FEAT][ROCm] Upgrade AITER MLA v1 backend ( #18338 )  
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com > 
						
						
					 
					
						2025-05-21 10:34:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bb0a311213 
					 
					
						
						
							
							Revert "[v1] Support multiple KV cache groups in GPU model runner ( #17945 ) ( #18459 )  
						
						... 
						
						
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com > 
						
						
					 
					
						2025-05-21 10:25:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dd5fa7e04f 
					 
					
						
						
							
							[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 ( #17004 )  
						
						... 
						
						
						
						Signed-off-by: Hosang Yoon <hosang.yoon@amd.com > 
						
						
					 
					
						2025-05-21 08:35:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2b16104557 
					 
					
						
						
							
							[Misc] Update deprecation message for --enable-reasoning ( #18404 )  
						
						
						
						
					 
					
						2025-05-21 07:33:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						371376f996 
					 
					
						
						
							
							[Build] fix Dockerfile shell ( #18402 )  
						
						
						
						
					 
					
						2025-05-21 07:32:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c6c10ca920 
					 
					
						
						
							
							[Bugfix] Reduce moe_sum test size to avoid OOM ( #18484 )  
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com > 
						
						
					 
					
						2025-05-21 06:46:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c154d89306 
					 
					
						
						
							
							[Doc] fix arg docstring in linear layers ( #18410 )  
						
						... 
						
						
						
						Signed-off-by: giantcroc <1204449533@qq.com > 
						
						
					 
					
						2025-05-21 06:45:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						eca18691d2 
					 
					
						
						
							
							[MODEL] FalconH1 ( #18406 )  
						
						... 
						
						
						
						Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae >
Co-authored-by: younesbelkada <younesbelkada@gmail.com >
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae >
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae > 
						
						
					 
					
						2025-05-21 04:59:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						61acfc45bc 
					 
					
						
						
							
							[Bugfix][Failing Test] Fix test_events.py ( #18460 )  
						
						... 
						
						
						
						Signed-off-by: rabi <ramishra@redhat.com > 
						
						
					 
					
						2025-05-21 04:57:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						107f5fc4cb 
					 
					
						
						
							
							[Misc] refactor disaggregated-prefill-v1 example ( #18474 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-21 11:10:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						907f935de9 
					 
					
						
						
							
							[V1] Fix general plugins not loaded in engine for multiproc ( #18326 )  
						
						... 
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com > 
						
						
					 
					
						2025-05-21 01:21:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5d7f545204 
					 
					
						
						
							
							[Frontend] deprecate --device arg ( #18399 )  
						
						... 
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com > 
						
						
					 
					
						2025-05-21 01:21:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cd8dfc6dfc 
					 
					
						
						
							
							[Misc] MultiConnector._connectors type ( #18423 )  
						
						... 
						
						
						
						Signed-off-by: nicklucche <nlucches@redhat.com > 
						
						
					 
					
						2025-05-20 22:48:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d06dd72ba9 
					 
					
						
						
							
							[Bugfix][Failing Test] Fix nixl connector test when prompt size < block size ( #18429 )  
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com > 
						
						
					 
					
						2025-05-20 22:41:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ad0012a0ac 
					 
					
						
						
							
							Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text ( #18407 )" ( #18456 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-20 22:39:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						92247c522e 
					 
					
						
						
							
							[Bug] Fix moe_sum signature ( #18440 )  
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com > 
						
						
					 
					
						2025-05-20 22:37:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0c15c2e486 
					 
					
						
						
							
							[Bugfix] config.head_dim is now explicitly set to None ( #18432 )  
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-05-20 21:04:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3b17ea26e4 
					 
					
						
						
							
							[TPU] Re-enable the Pallas MoE kernel ( #18025 )  
						
						... 
						
						
						
						Signed-off-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-20 19:52:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						23baa2180b 
					 
					
						
						
							
							fix:Build torch wheel inline rather than picking from nightly ( #18351 )  
						
						... 
						
						
						
						Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com > 
						
						
					 
					
						2025-05-20 22:22:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						980a172474 
					 
					
						
						
							
							[Kernel] update comment for KV shape in unified triton attn ( #18099 )  
						
						... 
						
						
						
						Signed-off-by: haochengxia <xhc_1007@163.com > 
						
						
					 
					
						2025-05-20 11:19:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e1f5a71ed7 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for bloom ( #18300 )  
						
						... 
						
						
						
						Signed-off-by: calvin chen <120380290@qq.com > 
						
						
					 
					
						2025-05-20 09:40:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f4a8a37465 
					 
					
						
						
							
							[Minor] Rename quantization nvfp4 to modelopt_fp4 ( #18356 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-20 09:08:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8f55962a7f 
					 
					
						
						
							
							[Misc] refactor prompt embedding examples ( #18405 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-20 15:26:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						be48360c1f 
					 
					
						
						
							
							[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text ( #18407 )  
						
						... 
						
						
						
						Co-authored-by: 松灵 <wpf272043@alibaba-inc.com > 
						
						
					 
					
						2025-05-20 06:59:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						86847700d7 
					 
					
						
						
							
							[CI] Add mteb testing to test the accuracy of the embedding model ( #17175 )  
						
						
						
						
					 
					
						2025-05-20 06:51:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d6c86d09ae 
					 
					
						
						
							
							Update cpu.txt ( #18398 )  
						
						... 
						
						
						
						Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com > 
						
						
					 
					
						2025-05-20 10:53:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6b35cb10a0 
					 
					
						
						
							
							[Misc] Add LoRA code owner ( #18387 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-05-20 03:27:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1b1e8e05ff 
					 
					
						
						
							
							[doc] update env variable export ( #18391 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-20 08:53:27 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bca55b556f 
					 
					
						
						
							
							[Bugfix] fix adding bias twice in ipex GPTQ quantization ( #18363 )  
						
						... 
						
						
						
						Signed-off-by: rand-fly <randfly@outlook.com > 
						
						
					 
					
						2025-05-20 00:54:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d981396778 
					 
					
						
						
							
							[release] Change dockerhub username for TPU release ( #18389 )  
						
						
						
						
					 
					
						2025-05-19 23:49:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9609327fa4 
					 
					
						
						
							
							[Core] [Bugfix]: tensor parallel with prompt embeds ( #18171 )  
						
						... 
						
						
						
						Signed-off-by: Nan2018 <nan@protopia.ai >
Co-authored-by: Andrew Sansom <andrew@protopia.ai > 
						
						
					 
					
						2025-05-19 20:21:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f07a673eb2 
					 
					
						
						
							
							[Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name ( #18358 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-05-19 20:20:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d565e0976f 
					 
					
						
						
							
							[neuron] fix authorization issue ( #18364 )  
						
						... 
						
						
						
						Signed-off-by: Liangfu Chen <liangfc@amazon.com > 
						
						
					 
					
						2025-05-19 23:30:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						258bf621d5 
					 
					
						
						
							
							fix CUDA_check redefinition in  #17918  ( #18287 )  
						
						... 
						
						
						
						Signed-off-by: Lucia Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com > 
						
						
					 
					
						2025-05-19 13:42:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dc1440cf9f 
					 
					
						
						
							
							Neuron up mistral ( #18222 )  
						
						... 
						
						
						
						Signed-off-by: Satyajith Chilappagari <satchill@amazon.com > 
						
						
					 
					
						2025-05-19 09:54:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8171221834 
					 
					
						
						
							
							[Misc] Fix typo ( #18330 )  
						
						
						
						
					 
					
						2025-05-19 09:51:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7937c2fd52 
					 
					
						
						
							
							Add fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup ( #18337 )  
						
						
						
						
					 
					
						2025-05-19 09:49:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e2ee1e8e9e 
					 
					
						
						
							
							[Feature]Add support for models quantized with AutoRound ( #17850 )  
						
						... 
						
						
						
						Signed-off-by: wenhuach21 <wenhua.cheng@intel.com > 
						
						
					 
					
						2025-05-19 09:38:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						20d8ce81eb 
					 
					
						
						
							
							[Frontend] add --quick option for vllm chat/complete ( #18297 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-19 09:36:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						84ab4feb7e 
					 
					
						
						
							
							[Doc] Fix typo ( #18355 )  
						
						
						
						
					 
					
						2025-05-19 16:05:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6781af5608 
					 
					
						
						
							
							[Quantization] Pool model support bitsandbytes ( #18087 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-05-19 09:03:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1b15df2546 
					 
					
						
						
							
							[BugFix] Fix handling of num_computed_tokens with connector ( #18232 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com > 
						
						
					 
					
						2025-05-19 09:03:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						43b5f61dce 
					 
					
						
						
							
							[Doc] Move input-related docs to Features ( #18353 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-19 15:08:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c5bb0ebdc6 
					 
					
						
						
							
							[Doc] Fix prompt embedding examples ( #18350 )  
						
						... 
						
						
						
						Signed-off-by: wangli <wangli858794774@gmail.com > 
						
						
					 
					
						2025-05-19 06:48:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d637b96099 
					 
					
						
						
							
							[BugFix] [Vul] Add missing usedforsecurity=False in MD5 hashing to enable FIPS ( #18319 )  
						
						... 
						
						
						
						Signed-off-by: cascade812 <cascade812@outlook.com >
Signed-off-by: shaoyuyoung <shaoyuyoung@gmail.com >
Co-authored-by: cascade <cascade812@outlook.com > 
						
						
					 
					
						2025-05-19 01:31:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						275c5daeb0 
					 
					
						
						
							
							fix: Add type specifications for CLI arguments in tensorizer options ( #18314 )  
						
						
						
						
					 
					
						2025-05-18 23:42:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						47fda6d089 
					 
					
						
						
							
							[Build] Supports CUDA 12.6 and 11.8 after Blackwell Update ( #18316 )  
						
						... 
						
						
						
						Signed-off-by: simon-mo <simon.mo@hey.com > 
						
						
					 
					
						2025-05-18 23:19:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						27d0952600 
					 
					
						
						
							
							[Misc] extract parser.parse_args() ( #18323 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-19 04:06:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						221cfc2fea 
					 
					
						
						
							
							Feature/vllm/input embedding completion api ( #17590 )  
						
						... 
						
						
						
						Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: Nan2018 <nan@protopia.ai >
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com >
Co-authored-by: Bryce1010 <bryceyx@gmail.com >
Co-authored-by: Andrew Sansom <andrew@protopia.ai >
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-05-18 20:18:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9da1095daf 
					 
					
						
						
							
							[Spec Decode][V0] Fix spec decode correctness test in V0 eagle/medusa ( #18175 )  
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com > 
						
						
					 
					
						2025-05-18 19:49:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d1211f8794 
					 
					
						
						
							
							[Doc] Add doc to explain the usage of Qwen3 thinking ( #18291 )  
						
						... 
						
						
						
						Signed-off-by: WangErXiao <863579016@qq.com > 
						
						
					 
					
						2025-05-18 23:04:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b6a6e7a529 
					 
					
						
						
							
							[Misc] add litellm integration ( #18320 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-18 15:32:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4fb349f66a 
					 
					
						
						
							
							Fix copy-paste error in phi4mm image processing ( #18315 )  
						
						... 
						
						
						
						Signed-off-by: Lifu Huang <lifu.hlf@gmail.com > 
						
						
					 
					
						2025-05-18 07:00:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						908733aca7 
					 
					
						
						
							
							[Model] Use sigmoid for single-label classification ( #18313 )  
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-05-18 07:00:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1a8f68bb90 
					 
					
						
						
							
							[doc] update reasoning doc ( #18306 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-18 06:59:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9ab2c02ff8 
					 
					
						
						
							
							Support sequence parallelism combined with pipeline parallelism ( #18243 )  
						
						... 
						
						
						
						Signed-off-by: cascade812 <cascade812@outlook.com > 
						
						
					 
					
						2025-05-17 22:47:25 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						66e63e86ec 
					 
					
						
						
							
							[MISC] fix typo ( #18305 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-05-17 10:52:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9214e60631 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for solar ( #18113 )  
						
						
						
						
					 
					
						2025-05-17 00:24:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f880d42582 
					 
					
						
						
							
							Fixed build on ppc64le due to openssl conflicts ( #18262 )  
						
						... 
						
						
						
						Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com > 
						
						
					 
					
						2025-05-17 00:23:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dcfe95234c 
					 
					
						
						
							
							Update Dockerfile to build for Blackwell ( #18095 )  
						
						
						
						
					 
					
						2025-05-17 00:23:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						48ac2bed5b 
					 
					
						
						
							
							[Hardware][TPU] Optionally import for TPU backend ( #18269 )  
						
						... 
						
						
						
						Signed-off-by: Siyuan Liu <lsiyuan@google.com >
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com >
Co-authored-by: Carol Zheng <cazheng@google.com >
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com >
Co-authored-by: Hongmin Fan <fanhongmin@google.com > 
						
						
					 
					
						2025-05-17 15:23:12 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3e0d435027 
					 
					
						
						
							
							[P/D][V1] Support dynamic loading of external KV connector implementations ( #18142 )  
						
						... 
						
						
						
						Signed-off-by: David Ben-David <davidb@pliops.com >
Co-authored-by: David Ben-David <davidb@pliops.com > 
						
						
					 
					
						2025-05-17 06:40:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ee4826ede 
					 
					
						
						
							
							[BugFix] Correct max_model_len derivation from config.json for Mistral format ( #17937 )  
						
						... 
						
						
						
						Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
Co-authored-by: tracelogfb <48808670+tracelogfb@users.noreply.github.com >
Co-authored-by: Stephen Chen <tracelog@meta.com > 
						
						
					 
					
						2025-05-17 04:20:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						60017dc841 
					 
					
						
						
							
							[Misc] reformat the collect-env output ( #18285 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-16 19:46:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						55f1a468d9 
					 
					
						
						
							
							Move cli args docs to its own page ( #18228 ) ( #18264 )  
						
						... 
						
						
						
						Signed-off-by: Trevor Royer <troyer@redhat.com > 
						
						
					 
					
						2025-05-16 19:43:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fd195b194e 
					 
					
						
						
							
							[V1][P/D] Local attention optimization for NIXL ( #18170 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-16 21:16:33 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fabe89bbc4 
					 
					
						
						
							
							[Spec Decode] Don't fall back to V0 when spec decoding is enabled ( #18265 )  
						
						
						
						
					 
					
						2025-05-16 16:10:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e73b7dfd69 
					 
					
						
						
							
							[Bugfix] fix "an illegal memory access was encountered" error in marlin kernel + act_order ( #18245 )  
						
						
						
						
					 
					
						2025-05-16 16:02:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7fdfa01530 
					 
					
						
						
							
							[Sampler] Adapt to FlashInfer 0.2.3 sampler API ( #15777 )  
						
						... 
						
						
						
						Signed-off-by: Bowen Wang <abmfy@icloud.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-16 15:14:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aef94c6d07 
					 
					
						
						
							
							[CI] Assign reviewer to mergify with changes to Tensorizer files ( #18278 )  
						
						
						
						
					 
					
						2025-05-16 12:04:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0ceaebf87b 
					 
					
						
						
							
							[BugFix] Fix ordering of KVConnector finished send/rcv sets ( #18211 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-05-16 09:20:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1db4f47f81 
					 
					
						
						
							
							[BugFix] Fix multi async save in MultiConnector ( #18246 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-05-16 08:13:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d3d91b6f71 
					 
					
						
						
							
							[Misc][MacOS] fix bfloat16 error ( #18249 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-16 15:05:59 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						87d871470d 
					 
					
						
						
							
							[Model] Use autoweightloader for dbrx ( #18251 )  
						
						... 
						
						
						
						Signed-off-by: learner0810 <zhongjun.li@daocloud.io > 
						
						
					 
					
						2025-05-16 07:54:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a5f8c111c2 
					 
					
						
						
							
							[Fix] Fix typo in resolve_hf_chat_template ( #18259 )  
						
						... 
						
						
						
						Signed-off-by: Felix Marty <felmarty@amd.com > 
						
						
					 
					
						2025-05-16 14:52:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e23564cb70 
					 
					
						
						
							
							use ceil_div in cutlass block scaling shape check ( #17918 )  
						
						
						
						
					 
					
						2025-05-16 03:02:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						390ec88905 
					 
					
						
						
							
							[Misc] Consolidate Audio tests into multimodal common generation tests ( #18214 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-05-16 09:18:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						541817670c 
					 
					
						
						
							
							[Misc] Add Ray Prometheus logger to V1 ( #17925 )  
						
						... 
						
						
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com > 
						
						
					 
					
						2025-05-16 01:02:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						67da5720d4 
					 
					
						
						
							
							[PERF] Speed up Qwen2.5-VL model by speeding up rotary position embedding ( #17973 )  
						
						... 
						
						
						
						Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai > 
						
						
					 
					
						2025-05-15 23:31:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5c04bb8b86 
					 
					
						
						
							
							[doc] fix multimodal example script ( #18089 )  
						
						... 
						
						
						
						Signed-off-by: David Xia <david@davidxia.com > 
						
						
					 
					
						2025-05-16 06:05:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3d2779c29a 
					 
					
						
						
							
							[Feature] Support Pipeline Parallelism in torchrun SPMD offline inference for V1 ( #17827 )  
						
						... 
						
						
						
						Signed-off-by: Lucia Fang <fanglu@fb.com > 
						
						
					 
					
						2025-05-15 22:28:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6b31c84aff 
					 
					
						
						
							
							Throw better error for when running into k8s service discovery issue ( #18209 )  
						
						... 
						
						
						
						Signed-off-by: Will Eaton <weaton@redhat.com > 
						
						
					 
					
						2025-05-15 21:07:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b18201fe06 
					 
					
						
						
							
							Allow users to pass arbitrary JSON keys from CLI ( #18208 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-15 21:05:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f4937a51c1 
					 
					
						
						
							
							[Model] vLLM v1 supports Medusa ( #17956 )  
						
						... 
						
						
						
						Signed-off-by: lisiqi23 <lisiqi23@xiaomi.com >
Signed-off-by: skylee-01 <497627264@qq.com >
Co-authored-by: lisiqi23 <lisiqi23@xiaomi.com > 
						
						
					 
					
						2025-05-15 21:05:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ee659e3b60 
					 
					
						
						
							
							[Bugfix][ROCm] Use chunked_prefill_paged_decode as fallback for V1 attention on ROCm ( #18093 )  
						
						... 
						
						
						
						Signed-off-by: kf <kuanfu.liu@embeddedllm.com > 
						
						
					 
					
						2025-05-15 19:30:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4e1c6a0264 
					 
					
						
						
							
							[Bugfix] fix rotary embedding test for _get_padded_tensor_shape ( #18229 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-05-16 01:32:45 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c7852a6d9b 
					 
					
						
						
							
							[Build] Allow shipping PTX on a per-file basis ( #18155 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-05-15 16:41:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8795eb9975 
					 
					
						
						
							
							[Bugfix] Fix test_eagle test ( #18223 )  
						
						... 
						
						
						
						Signed-off-by: Lucia Fang <fanglu@fb.com > 
						
						
					 
					
						2025-05-15 15:59:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0b34593017 
					 
					
						
						
							
							Adding "AMD: Tensorizer Test" to amdproduction. ( #18216 )  
						
						
						
						
					 
					
						2025-05-15 11:01:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e3f3aee6f4 
					 
					
						
						
							
							[Misc] Avoid cuda graph log when sizes still match ( #18202 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-05-15 09:59:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						92540529c0 
					 
					
						
						
							
							[Bugfix] [ROCm]: Remove assertion logic when using AITER fused moe in unquantizedMethod to reenable LLama4 BF16 ( #18205 )  
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-05-15 09:53:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fadb8d5c2d 
					 
					
						
						
							
							[Bugfix]Change the exception thrown by call_hf_processor from RuntimeError to ValueError ( #18181 )  
						
						... 
						
						
						
						Signed-off-by: Abatom <abzhonghua@gmail.com > 
						
						
					 
					
						2025-05-15 09:01:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2aa5470ac5 
					 
					
						
						
							
							[Frontend] Fix chat template content format detection ( #18190 )  
						
						... 
						
						
						
						Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com > 
						
						
					 
					
						2025-05-15 09:00:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						51ff154639 
					 
					
						
						
							
							Improve examples rendering in docs and GitHub ( #18203 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-15 15:57:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						566ec04c3d 
					 
					
						
						
							
							Adding "Basic Models Test" and "Multi-Modal Models Test (Extended) 3" in AMD Pipeline ( #18106 )  
						
						... 
						
						
						
						Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-05-15 08:49:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						01c22335ba 
					 
					
						
						
							
							[Kernel] [V1] Fix performance regression for triton unified attention ( #18161 )  
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-05-15 06:39:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						451da4bcbd 
					 
					
						
						
							
							add tools into TokenizeChatRequest ( #18187 )  
						
						... 
						
						
						
						Signed-off-by: yangxia <yangxiast@gmail.com > 
						
						
					 
					
						2025-05-15 04:01:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						07ad27121f 
					 
					
						
						
							
							Update deprecated type hinting in model_loader ( #18130 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-15 04:00:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a9944aabfa 
					 
					
						
						
							
							fix: typos ( #18151 )  
						
						... 
						
						
						
						Signed-off-by: omahs <73983677+omahs@users.noreply.github.com > 
						
						
					 
					
						2025-05-15 02:16:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a8f5aec20a 
					 
					
						
						
							
							[V1] Update zmq socket creation in nixl connector ( #18148 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-05-14 23:17:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						de71fec81b 
					 
					
						
						
							
							[CI] don't skip fixed test_kv_cache_events() ( #18183 )  
						
						... 
						
						
						
						Signed-off-by: David Xia <david@davidxia.com > 
						
						
					 
					
						2025-05-14 23:17:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						70f8b96724 
					 
					
						
						
							
							[Bugfix] Fix FusedMoEPrepareAndFinalize for cuda-disalike backends ( #18178 )  
						
						... 
						
						
						
						Signed-off-by: Mengqing Cao <cmq0113@163.com > 
						
						
					 
					
						2025-05-14 23:16:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dd2a94596a 
					 
					
						
						
							
							[Model] Allow the use of sliding window in Qwen2 ( #17772 )  
						
						... 
						
						
						
						Signed-off-by: inkcherry <mingzhi.liu@intel.com > 
						
						
					 
					
						2025-05-14 22:29:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						420caf7557 
					 
					
						
						
							
							[UT] Add ut for none hash ( #17892 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-05-15 13:28:11 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4f07a64075 
					 
					
						
						
							
							Support custom implementations of VideoLoader backends. ( #18091 )  
						
						
						
						
					 
					
						2025-05-15 13:26:49 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e6b8e65d2d 
					 
					
						
						
							
							[Bugfix] Fix fp8 tests for triton_unified_attention for Triton 3.3 ( #18013 )  
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-05-15 13:26:34 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						26d0419309 
					 
					
						
						
							
							Update deprecated type hinting in models ( #18132 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-14 22:06:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						83f74c698f 
					 
					
						
						
							
							[Fix][ROCm] Enforce eager for all encoder-decoder models on ROCm ( #18154 )  
						
						... 
						
						
						
						Signed-off-by: Luka Govedič <lgovedic@redhat.com > 
						
						
					 
					
						2025-05-14 22:04:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2dff093574 
					 
					
						
						
							
							[Misc] add lobe-chat support ( #18177 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-15 05:02:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						afe3236e90 
					 
					
						
						
							
							[Chore] astral's ty ( #18116 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-05-15 05:00:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						65334ef3b9 
					 
					
						
						
							
							[V1][Metrics] Remove unused code ( #18158 )  
						
						... 
						
						
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com > 
						
						
					 
					
						2025-05-14 20:13:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e60f550b38 
					 
					
						
						
							
							[v1] Support multiple KV cache groups in GPU model runner ( #17945 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-05-14 18:54:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f25e0d1125 
					 
					
						
						
							
							[Bugfix]: make most of test_openai_schema.py pass ( #17664 )  
						
						
						
						
					 
					
						2025-05-14 17:04:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						09f106a91e 
					 
					
						
						
							
							Upload vllm index for the rc builds ( #18173 )  
						
						
						
						
					 
					
						2025-05-14 16:35:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2142035b51 
					 
					
						
						
							
							[V1] Support multiple kv connectors ( #17564 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-05-14 16:28:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						78aa341d12 
					 
					
						
						
							
							[CI] Fix race condition in test_kv_cache_events test ( #18169 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-05-14 16:27:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7974736740 
					 
					
						
						
							
							Add support for loading torchao models with AOPerModuleConfig ( #17826 )  
						
						... 
						
						
						
						Signed-off-by: Jerry Zhang <jerryzh168@gmail.com > 
						
						
					 
					
						2025-05-14 16:24:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2fc9075b82 
					 
					
						
						
							
							[V1] Structured Outputs + Thinking compatibility ( #16577 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-05-14 15:45:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d93c976a0d 
					 
					
						
						
							
							[Kernel] Have rotary embeddings support tensors ( #18046 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-05-14 15:43:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						749f792553 
					 
					
						
						
							
							[Frontend] decrease import time of vllm.multimodal ( #18031 )  
						
						... 
						
						
						
						Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com > 
						
						
					 
					
						2025-05-14 15:43:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						856865008e 
					 
					
						
						
							
							[CI] Disable Failing Tests ( #18165 )  
						
						
						
						
					 
					
						2025-05-14 13:49:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f9c069c85e 
					 
					
						
						
							
							Modularize fused experts and integrate PPLX kernels ( #15956 )  
						
						
						
						
					 
					
						2025-05-14 13:11:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						418d2f8bfb 
					 
					
						
						
							
							[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model ( #17326 )  
						
						... 
						
						
						
						Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-05-14 12:31:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						964472b966 
					 
					
						
						
							
							[Doc] Update prefix cache metrics to counting tokens ( #18138 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-05-14 15:23:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						59dd311cf5 
					 
					
						
						
							
							[KVConnector] Keep KVTransferParams as a dict ( #18033 )  
						
						
						
						
					 
					
						2025-05-14 08:05:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d066e52013 
					 
					
						
						
							
							[Bugfix] Fix chat utils tests ( #18139 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-14 05:38:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c8ea982d9b 
					 
					
						
						
							
							Update deprecated type hinting in platform, plugins, triton_utils, vllm_flash_attn ( #18129 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-14 05:28:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dc372b9c8a 
					 
					
						
						
							
							Update deprecated type hinting in vllm/device_allocator and vllm/distributed ( #18126 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-14 04:07:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9b5b39b650 
					 
					
						
						
							
							Update deprecated type hinting in vllm/lora ( #18128 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-14 03:57:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9ccc6ded42 
					 
					
						
						
							
							[doc] add missing import ( #18133 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-14 10:57:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d62a076e84 
					 
					
						
						
							
							[Model] GritLM supports other attention backends ( #18109 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-14 03:33:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						259127f8b8 
					 
					
						
						
							
							[Bugfix] Fix LoRA test ( #18123 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-05-14 10:25:47 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						612c2edb4f 
					 
					
						
						
							
							[FEAT] [ROCm]: Add AITER CK 2 Stages MoE support ( #17110 )  
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-05-14 03:03:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						38fe728d60 
					 
					
						
						
							
							[Bugfix] Fix QKVCrossParallelLinear::sync_weight_attrs for PyTorch compile ( #17844 )  
						
						... 
						
						
						
						Signed-off-by: Andrzej Kotłowski <akotlowski@habana.ai > 
						
						
					 
					
						2025-05-14 09:39:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						82e7f9bb03 
					 
					
						
						
							
							[Misc] replace does not exist model ( #18119 )  
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-05-14 02:13:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						63dc3426e0 
					 
					
						
						
							
							[Model] Add packed_modules_mapping for Qwen3-MOE ( #18118 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-05-14 02:13:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8f5dc41481 
					 
					
						
						
							
							[Bugfix] Fix entrypoints audio test failure ( #18111 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-14 09:08:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						63ad622233 
					 
					
						
						
							
							[New Model]: support GTE NewModel ( #17986 )  
						
						
						
						
					 
					
						2025-05-14 01:31:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e7ef61c1f0 
					 
					
						
						
							
							[Bugfix][Example] make lmcache v0 work. ( #18051 )  
						
						... 
						
						
						
						Signed-off-by: Ma, Jianpeng <jianpeng.ma@intel.com > 
						
						
					 
					
						2025-05-13 23:43:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d4154c35a2 
					 
					
						
						
							
							[Bugfix] fix moe marlin topk_weight loading ( #18080 )  
						
						... 
						
						
						
						Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-13 23:31:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6685890d11 
					 
					
						
						
							
							[Fix] Move "model_config" as keyword args in chat_utils.py ( #18098 )  
						
						... 
						
						
						
						Signed-off-by: Linkun <github@lkchen.net > 
						
						
					 
					
						2025-05-13 23:27:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						33011318c2 
					 
					
						
						
							
							Fix broken example: examples/offline_inference/profiling at scheduler_config ( #18117 )  
						
						
						
						
					 
					
						2025-05-13 23:19:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4f8b373225 
					 
					
						
						
							
							[BugFix][AMD] Compatible patch for AITER lib after 04/20 ( #17912 )  
						
						... 
						
						
						
						Signed-off-by: Qiang Li <qiang.li2@amd.com > 
						
						
					 
					
						2025-05-13 23:05:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7b2f28deba 
					 
					
						
						
							
							[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm ( #18082 )  
						
						... 
						
						
						
						Signed-off-by: charlifu <charlifu@amd.com > 
						
						
					 
					
						2025-05-13 22:13:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2d912fb66f 
					 
					
						
						
							
							[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 ( #17955 )  
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-05-13 22:03:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						12e6c0b41c 
					 
					
						
						
							
							[Bugfix][V1] Fix FlashInfer V1 backend using the wrong VllmConfig ( #18086 )  
						
						
						
						
					 
					
						2025-05-13 20:36:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9a2a6357de 
					 
					
						
						
							
							[Bugfix] Fix FP8 Marlin MoE and enable for compressed-tensors models ( #18026 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-13 19:48:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6266c57bae 
					 
					
						
						
							
							[core][distributed] add ep group and all2all interface ( #18077 )  
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-05-14 10:46:49 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						754b699cbe 
					 
					
						
						
							
							[Bug]: Fix S3 model/tokenizer path resolution ( #18083 )  
						
						... 
						
						
						
						Signed-off-by: Jon Gill <jon@yurts.ai > 
						
						
					 
					
						2025-05-13 19:34:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e27c6d86b 
					 
					
						
						
							
							[Misc] Remove unused numpy tensor ( #18084 )  
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me > 
						
						
					 
					
						2025-05-13 19:33:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d5af47a149 
					 
					
						
						
							
							[P/D] Add some more debug logs to NixlConnector ( #18102 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-05-13 19:33:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						65f0f74b66 
					 
					
						
						
							
							[Hardware/NVIDIA/Modelopt] Fix modelopt forward method for v1 torch.compile ( #18101 )  
						
						... 
						
						
						
						Signed-off-by: Pavani Majety <pmajety@nvidia.com > 
						
						
					 
					
						2025-05-13 19:33:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						176a95c670 
					 
					
						
						
							
							[Fix] Support CUDAGraph capture for encoder-decoder on ROCm ( #18104 )  
						
						... 
						
						
						
						Signed-off-by: Luka Govedič <lgovedic@redhat.com > 
						
						
					 
					
						2025-05-13 19:31:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f2ae883b67 
					 
					
						
						
							
							[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager ( #18001 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-05-13 19:09:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						40de1ef455 
					 
					
						
						
							
							[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature ( #14968 )  
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-05-13 19:08:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0189a65a2e 
					 
					
						
						
							
							[Docs] Expand security doc with firewall info ( #18081 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-05-13 19:36:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						55aa7af994 
					 
					
						
						
							
							[V1] DP scale-out (2/N): Decouple engine process management and comms ( #15977 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-05-13 10:48:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0b217da646 
					 
					
						
						
							
							Update deprecated type hinting in vllm/adapter_commons ( #18073 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-13 08:32:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						19324d660c 
					 
					
						
						
							
							Update deprecated type hinting in vllm/compilation ( #18072 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-13 08:32:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fc407a1425 
					 
					
						
						
							
							Give auto-merge label workflow permission to add labels to issues ( #18078 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-13 07:53:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						009d9e7590 
					 
					
						
						
							
							Convert benchmarks to ruff format ( #18068 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-13 13:43:29 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b922c2ebd2 
					 
					
						
						
							
							[Bugfix] Fix entrypoints metrics tests ( #18063 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-13 06:42:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						00b14e0f16 
					 
					
						
						
							
							[CI] set token permissions for pre-commit CI job ( #17729 )  
						
						... 
						
						
						
						Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-05-13 13:38:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						54e467e6f8 
					 
					
						
						
							
							[CI] Add token permissions for add-ready-label CI job ( #17730 )  
						
						... 
						
						
						
						Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-05-13 13:38:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						79a1d25bbd 
					 
					
						
						
							
							[CI] Add workflow permissions for helm CI job ( #17727 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-05-13 12:49:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9944011b30 
					 
					
						
						
							
							[CI] Set token permissions for reminder comment CI job ( #17728 )  
						
						... 
						
						
						
						Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-05-13 12:46:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8c946cecca 
					 
					
						
						
							
							Update deprecated type hinting in vllm/transformers_utils ( #18058 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-13 04:34:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ff334ca1cd 
					 
					
						
						
							
							Update deprecated type hinting in vllm/profiler ( #18057 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-13 04:34:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6223dd8114 
					 
					
						
						
							
							Update deprecated type hinting in model_executor/layers ( #18056 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-13 04:17:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						906f0598fc 
					 
					
						
						
							
							[doc] add download/list/delete HF model CLI usage ( #17940 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-13 11:15:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cb528d0585 
					 
					
						
						
							
							[Fix] check to make sure processor has chat templates ( #18047 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-05-13 03:04:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						98fcba1575 
					 
					
						
						
							
							Convert .buildkite to ruff format ( #17656 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-13 09:28:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						23b3134eb5 
					 
					
						
						
							
							[Benchmarks] Refactor run_structured_output_benchmarks.sh ( #17722 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-05-13 01:47:29 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ea6ae8cb45 
					 
					
						
						
							
							[Bugfix] Fix marlin moe fallback logic for llama4 ( #18042 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-13 07:53:28 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2ff297dce9 
					 
					
						
						
							
							[BugFix] Set default random seed to 0 for V1 ( #17929 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-05-13 07:52:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8dd0671bac 
					 
					
						
						
							
							[Bugfix][V1] Only get input embeddings w/ multi-modal models if first PP ( #17916 )  
						
						... 
						
						
						
						Signed-off-by: Jin Huang <jinhun@amazon.com >
Co-authored-by: Jin Huang <jinhun@amazon.com > 
						
						
					 
					
						2025-05-13 15:10:07 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f0d610a8ae 
					 
					
						
						
							
							[v1][KVCacheManager] Avoid full cache hit by controlling max_length ( #17999 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-05-13 06:50:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e57e4d6e9e 
					 
					
						
						
							
							Fix Broken macro for cutlass moe ( #18049 )  
						
						... 
						
						
						
						Signed-off-by: drisspg <drisspguessous@gmail.com > 
						
						
					 
					
						2025-05-12 23:31:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ee5be834e7 
					 
					
						
						
							
							[BugFix] Fix 4-GPU RLHF tests ( #18007 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-05-12 23:03:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						48545728d8 
					 
					
						
						
							
							cleanup invalid prints ( #18050 )  
						
						... 
						
						
						
						Signed-off-by: calvin chen <120380290@qq.com > 
						
						
					 
					
						2025-05-12 23:01:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dc1a821768 
					 
					
						
						
							
							[Feature][V1] Support tool_choice: required when using Xgrammar as the StructuredOutputBackend. ( #17845 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-05-12 23:01:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						61e0a506a3 
					 
					
						
						
							
							[Bugfix] Avoid repeatedly creating dummy data during engine startup ( #17935 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-12 22:40:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1df491c522 
					 
					
						
						
							
							[Bugfix] Fixes for new marlin moe usage ( #18017 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-13 03:50:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d8487ef557 
					 
					
						
						
							
							[ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 ( #13779 )  
						
						... 
						
						
						
						Signed-off-by: Arjun Kathuria <arjun.kathuria8@gmail.com > 
						
						
					 
					
						2025-05-12 20:36:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c06af9a959 
					 
					
						
						
							
							[Misc] Slight spelling modification ( #18039 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-05-12 20:36:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						60f7624334 
					 
					
						
						
							
							Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support ( #11844 )  
						
						
						
						
					 
					
						2025-05-12 19:52:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f6518b2b48 
					 
					
						
						
							
							[ROCm] Skip tests for quantizations incompatible with ROCm ( #17905 )  
						
						... 
						
						
						
						Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com > 
						
						
					 
					
						2025-05-12 18:39:28 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d67085c2c8 
					 
					
						
						
							
							Remove noisy warnings from SchedulerConfig ( #17995 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-13 00:33:45 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						307939f299 
					 
					
						
						
							
							Use NVFP4 Marlin for CompressedTensorsW4A16Fp4 ( #18000 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Dipika <dipikasikka1@gmail.com >
Co-authored-by: Dipika <dipikasikka1@gmail.com > 
						
						
					 
					
						2025-05-12 18:07:34 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9d7ea9dbbf 
					 
					
						
						
							
							Update some more deprecated type hinting ( #17998 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-12 23:49:33 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						acee8f48aa 
					 
					
						
						
							
							[Model] Support MiMo-7B inference with MTP ( #17433 )  
						
						... 
						
						
						
						Signed-off-by: wp-alpha <wangpeng66@xiaomi.com >
Co-authored-by: wangpeng66 <wangpeng66@xiaomi.com > 
						
						
					 
					
						2025-05-12 23:25:33 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f065de4e88 
					 
					
						
						
							
							Fix FBGEMM integration ( #18002 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-12 23:02:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dc9905368d 
					 
					
						
						
							
							[V1][Spec Decode] Eagle unit tests ( #17350 )  
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com > 
						
						
					 
					
						2025-05-12 23:01:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ebab1ac37c 
					 
					
						
						
							
							[CI] Make JSON output tests less likely to fail ( #17859 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-05-12 22:31:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2b0db9b0e2 
					 
					
						
						
							
							Enable standard language model for torhc nightly ( #18004 )  
						
						... 
						
						
						
						Signed-off-by: Yang Wang <elainewy@meta.com > 
						
						
					 
					
						2025-05-12 14:00:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						195adb47c0 
					 
					
						
						
							
							[Chore] Remove unused method ( #18024 )  
						
						... 
						
						
						
						Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com > 
						
						
					 
					
						2025-05-12 13:59:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						302f3aca7e 
					 
					
						
						
							
							[v1][KVCacheManager] Change prefix caching metric from counting blocks to counting tokens ( #18003 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-05-12 13:46:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e9c730c9bd 
					 
					
						
						
							
							Enabling "Weight Loading Multiple GPU Test - Large Models" ( #18020 )  
						
						
						
						
					 
					
						2025-05-12 13:05:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						289199feb6 
					 
					
						
						
							
							[Core] Use platform-agnostic device control for DP engine core ( #17245 )  
						
						... 
						
						
						
						Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com > 
						
						
					 
					
						2025-05-12 12:09:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b9fd0d7a69 
					 
					
						
						
							
							[CI/Build] Fix TPU V1 Test mixed use of & and && across tests ( #17968 )  
						
						
						
						
					 
					
						2025-05-12 12:06:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						72a3f6b898 
					 
					
						
						
							
							Construct KVTransferConfig properly from Python instead of using JSON blobs without CLI ( #17994 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-12 11:25:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						98ea35601c 
					 
					
						
						
							
							[Lora][Frontend]Add default local directory LoRA resolver plugin. ( #16855 )  
						
						... 
						
						
						
						Signed-off-by: jberkhahn <jaberkha@us.ibm.com > 
						
						
					 
					
						2025-05-12 10:39:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d19110204c 
					 
					
						
						
							
							[P/D] NIXL Integration ( #17751 )  
						
						... 
						
						
						
						Signed-off-by: ApostaC <yihua98@uchicago.edu >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: rshaw@neuralmagic.com  <robertgshaw2@gmail.com >
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: Brent Salisbury <bsalisbu@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: ApostaC <yihua98@uchicago.edu >
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Brent Salisbury <bsalisbu@redhat.com > 
						
						
					 
					
						2025-05-12 09:46:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						05a4324f8e 
					 
					
						
						
							
							Initialize the delta tool call fields explicitly ( #17340 )  
						
						... 
						
						
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Co-authored-by: igmainc <igmainc@icloud.com > 
						
						
					 
					
						2025-05-12 13:28:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7ea6cb28b2 
					 
					
						
						
							
							[Misc] Improve modelscope import error ( #17983 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-05-12 10:46:45 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9fbf2bfbd5 
					 
					
						
						
							
							Correcting testcases in builkite job for IBM Power ( #17675 )  
						
						... 
						
						
						
						Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com > 
						
						
					 
					
						2025-05-12 08:11:55 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3a5ea75129 
					 
					
						
						
							
							[Feature] Support DeepSeekV3 Function Call ( #17784 )  
						
						... 
						
						
						
						Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com >
Signed-off-by: Xu Wenqing <xuwq1993@qq.com > 
						
						
					 
					
						2025-05-12 00:45:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						891b9d33de 
					 
					
						
						
							
							[Fix] Benchmark "EngineClient" has no attribute "model_config" ( #17976 )  
						
						... 
						
						
						
						Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca > 
						
						
					 
					
						2025-05-11 22:55:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						430783018c 
					 
					
						
						
							
							[Bugfix][TPU] Use np array when updating cache slot_mapping ( #17971 )  
						
						... 
						
						
						
						Signed-off-by: Siyuan Liu <lsiyuan@google.com > 
						
						
					 
					
						2025-05-12 12:58:33 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						19a3c78d1f 
					 
					
						
						
							
							[Bugfix] Fix pydantic.errors.PydanticUserError ( #17962 )  
						
						... 
						
						
						
						Signed-off-by: wangli <wangli858794774@gmail.com > 
						
						
					 
					
						2025-05-12 12:58:23 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ada50aa295 
					 
					
						
						
							
							[bugfix] fix the wrong parser ( #17958 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-12 04:58:02 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						08bf784078 
					 
					
						
						
							
							[Bugfix] validate grammar and throw 400 error instead of crashing the engine when xgrammar validation fails ( #17623 )  
						
						... 
						
						
						
						Signed-off-by: Jason Cheng <jasoncky96@gmail.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-05-12 09:06:10 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d45fe333fb 
					 
					
						
						
							
							[misc] add instructions on how to install nvshmem/pplx/deepep ( #17964 )  
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-05-11 18:02:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						021c16c7ca 
					 
					
						
						
							
							[Model] Broadcast Ovis2 implementation to fit Ovis1.6 ( #17861 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-05-11 17:56:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7de18d541b 
					 
					
						
						
							
							[BUG] [ROCm] [MLA] Fix variable name bug due to change in variable name in PR #17483 ( #17961 )  
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-05-11 09:14:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a810b5b088 
					 
					
						
						
							
							[BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm ( #17857 )  
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-05-11 04:17:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						009b3d5382 
					 
					
						
						
							
							[Misc] not show --model in vllm serve --help ( #16691 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-11 08:47:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e4b8713380 
					 
					
						
						
							
							[New Model]: nomic-embed-text-v2-moe ( #17785 )  
						
						
						
						
					 
					
						2025-05-11 00:59:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						06c0922a69 
					 
					
						
						
							
							[FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 ( #17870 )  
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-05-11 15:58:45 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cd3edfc908 
					 
					
						
						
							
							[Misc] Add compressed-tensors NVFP4A16 emulation support ( #17914 )  
						
						... 
						
						
						
						Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Signed-off-by: Dipika <dipikasikka1@gmail.com > 
						
						
					 
					
						2025-05-11 15:58:38 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9cea90eab4 
					 
					
						
						
							
							[Frontend] Add /classify endpoint ( #17032 )  
						
						... 
						
						
						
						Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com > 
						
						
					 
					
						2025-05-11 07:57:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d1110f5b5a 
					 
					
						
						
							
							[doc] update lora doc ( #17936 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-11 15:56:21 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8132365b74 
					 
					
						
						
							
							[Bugfix]: v1 engine - consider lora adapters in allowed_token_ids ( #17855 )  
						
						... 
						
						
						
						Signed-off-by: Ben Browning <bbrownin@redhat.com > 
						
						
					 
					
						2025-05-11 00:53:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						eea22a56ab 
					 
					
						
						
							
							fix amd triton mla path ( #17871 )  
						
						
						
						
					 
					
						2025-05-11 07:53:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9112155283 
					 
					
						
						
							
							[Perf] Use small max_num_batched_tokens for A100 ( #17885 )  
						
						... 
						
						
						
						Signed-off-by: KuntaiDu <kuntai@uchicago.edu > 
						
						
					 
					
						2025-05-11 07:53:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						90d0a74b60 
					 
					
						
						
							
							[Bugfix] Add revision to transformers.Auto*.from_pretrained processors ( #17948 )  
						
						... 
						
						
						
						Signed-off-by: Xin Li <xin@centml.ai > 
						
						
					 
					
						2025-05-11 07:52:44 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d74e5f37bc 
					 
					
						
						
							
							[Kernel] fp4 marlin kernel ( #17687 )  
						
						... 
						
						
						
						Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com > 
						
						
					 
					
						2025-05-10 19:58:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ca66a1674c 
					 
					
						
						
							
							[v1] Rename specialized_manager.py to single_type_kv_cache_manager.py ( #17946 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-05-10 16:14:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						950751a987 
					 
					
						
						
							
							[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders ( #17483 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-05-10 16:12:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4c31218f80 
					 
					
						
						
							
							[Misc] remove --model from vllm serve usage ( #17944 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-10 13:23:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						68311891f5 
					 
					
						
						
							
							Don't default construct ModelConfig when default constructing VllmConfig ( #17943 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-10 13:23:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fc4441a4ee 
					 
					
						
						
							
							Add missing content type headers to /ping and /health ( #17036 ) ( #17786 )  
						
						... 
						
						
						
						Signed-off-by: Ximo Guanter <ximo.guanter@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-10 07:13:32 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						246e3e0a36 
					 
					
						
						
							
							fix broken test vllm:test_kernels - test_attention_selector.py::test_flash_attn ( #17873 )  
						
						... 
						
						
						
						Co-authored-by: Stephen Chen <tracelog@meta.com > 
						
						
					 
					
						2025-05-10 10:46:54 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7042cc96b0 
					 
					
						
						
							
							[V1][Spec Decoding] Log accumulated metrics after system goes idle ( #17913 )  
						
						... 
						
						
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com > 
						
						
					 
					
						2025-05-09 18:23:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0c0fdae84f 
					 
					
						
						
							
							[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model ( #16362 )  
						
						
						
						
					 
					
						2025-05-09 16:24:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3b602cdea7 
					 
					
						
						
							
							AMD conditional all test execution // new test groups ( #17556 )  
						
						... 
						
						
						
						Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu > 
						
						
					 
					
						2025-05-09 15:35:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4b2ed7926a 
					 
					
						
						
							
							Improve configs - the rest! ( #17562 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-09 15:18:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7e3571134f 
					 
					
						
						
							
							[V1][Spec Decoding] Include bonus tokens in mean acceptance length ( #17908 )  
						
						... 
						
						
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com > 
						
						
					 
					
						2025-05-09 13:32:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ea2236bf95 
					 
					
						
						
							
							Add option to use torch._inductor.standalone_compile ( #17057 )  
						
						... 
						
						
						
						Signed-off-by: rzou <zou3519@gmail.com > 
						
						
					 
					
						2025-05-09 12:59:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7d4aedae7c 
					 
					
						
						
							
							Handle error when str passed to /v1/audio/transcriptions ( #17909 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-09 19:23:59 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						22481fbfa3 
					 
					
						
						
							
							Update CT WNA16MarlinMoE integration ( #16666 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-09 13:19:45 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5c4c08f6f1 
					 
					
						
						
							
							[Misc] Auto fallback to float16 for pre-Ampere GPUs when detected bfloat16 config ( #17265 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-05-09 17:16:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c44c384b1c 
					 
					
						
						
							
							[Misc] Add references in ray_serve_deepseek example ( #17907 )  
						
						... 
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com > 
						
						
					 
					
						2025-05-09 16:59:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						85b72cb7b1 
					 
					
						
						
							
							Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" ( #17910 )  
						
						
						
						
					 
					
						2025-05-09 08:58:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e5595ca39 
					 
					
						
						
							
							[CI/Build] Automatically retry flaky tests ( #17856 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-09 09:55:17 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						200da9a517 
					 
					
						
						
							
							[v1] Move block management logic from KVCacheManager to SpecializedManager ( #17474 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-05-09 15:25:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9f64e93415 
					 
					
						
						
							
							[BugFix][AMD] Compatible patch for latest AITER(05/07/2025) ( #17864 )  
						
						... 
						
						
						
						Signed-off-by: Qiang Li <qiang.li2@amd.com > 
						
						
					 
					
						2025-05-09 08:59:36 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ec61ea20a8 
					 
					
						
						
							
							[Misc] add dify integration ( #17895 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-09 03:42:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c6798baa9c 
					 
					
						
						
							
							Change top_k to be disabled with 0 (still accept -1 for now) ( #17773 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-09 10:01:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5b2dcbf0b8 
					 
					
						
						
							
							Fix Whisper crash caused by invalid ``max_num_batched_tokens`` config ( #17853 )  
						
						... 
						
						
						
						Signed-off-by: inkcherry <mingzhi.liu@intel.com > 
						
						
					 
					
						2025-05-09 09:16:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e4a93e3f7 
					 
					
						
						
							
							[Bugfix][CPU] Fix broken AVX2 CPU TP support ( #17252 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-05-09 08:55:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						217db4baa6 
					 
					
						
						
							
							[Bugfix][ROCm] Fix AITER MLA V1 ( #17880 )  
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com > 
						
						
					 
					
						2025-05-09 08:38:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ff8c400502 
					 
					
						
						
							
							[Doc] remove visible token in doc ( #17884 )  
						
						... 
						
						
						
						Signed-off-by: yan <yanma1@habana.ai > 
						
						
					 
					
						2025-05-09 01:21:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						89a0315f4c 
					 
					
						
						
							
							[Doc] Update several links in reasoning_outputs.md ( #17846 )  
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-05-09 01:20:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3d1e387652 
					 
					
						
						
							
							[Docs] Add Slides from NYC Meetup ( #17879 )  
						
						... 
						
						
						
						Signed-off-by: simon-mo <simon.mo@hey.com > 
						
						
					 
					
						2025-05-08 21:46:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d310e6de98 
					 
					
						
						
							
							[BUGFIX]: return fast when request requires prompt logprobs ( #17251 )  
						
						
						
						
					 
					
						2025-05-08 21:25:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5e6f939484 
					 
					
						
						
							
							[Attention] MLA move rotary embedding to cuda-graph region ( #17668 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-05-09 11:14:42 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						760e3ecc8f 
					 
					
						
						
							
							[V1][Structured Output] Update llguidance (>= 0.7.11) to avoid AttributeError (no StructTag) ( #17839 )  
						
						... 
						
						
						
						Signed-off-by: shen-shanshan <467638484@qq.com > 
						
						
					 
					
						2025-05-08 20:14:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3c9396a64f 
					 
					
						
						
							
							[FEAT][ROCm]: Support AITER MLA on V1 Engine ( #17523 )  
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: qli88 <qiang.li2@amd.com >
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com > 
						
						
					 
					
						2025-05-09 10:42:05 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						376786fac1 
					 
					
						
						
							
							Add cutlass support for blackwell fp8 blockwise gemm ( #14383 )  
						
						... 
						
						
						
						Signed-off-by: Shu Wang <shuw@nvidia.com > 
						
						
					 
					
						2025-05-08 15:09:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4f605a6de5 
					 
					
						
						
							
							Fix noisy warning for uncalibrated q_scale/p_scale ( #17414 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-08 15:56:59 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8342e3abd1 
					 
					
						
						
							
							[CI] Prune down lm-eval small tests ( #17012 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-08 19:00:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a83a0f92b5 
					 
					
						
						
							
							[Test] Attempt all TPU V1 tests, even if some of them fail. ( #17334 )  
						
						... 
						
						
						
						Signed-off-by: Yarong Mu <ymu@google.com > 
						
						
					 
					
						2025-05-08 17:20:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						226a4272cf 
					 
					
						
						
							
							[V1] Improve VLLM_ALLOW_INSECURE_SERIALIZATION logging ( #17860 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-05-08 16:57:35 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ec54d73c31 
					 
					
						
						
							
							[CI] Fix test_collective_rpc ( #17858 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-05-08 16:47:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a944f8ede7 
					 
					
						
						
							
							[Misc] Delete LoRA-related redundancy code ( #17841 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-05-08 06:02:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						015815fe01 
					 
					
						
						
							
							[Bugfix] use_fast failing to be propagated to Qwen2-VL image processor ( #17838 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-08 05:39:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e4ca6e3a99 
					 
					
						
						
							
							Fix transient dependency error in docs build ( #17848 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-08 03:42:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						53d0cb7423 
					 
					
						
						
							
							[Misc] add chatbox integration ( #17828 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-08 10:05:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f50dcb7c21 
					 
					
						
						
							
							[Easy] Eliminate c10::optional usage in vllm/csrc ( #17819 )  
						
						
						
						
					 
					
						2025-05-08 03:05:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a1e19b635d 
					 
					
						
						
							
							[Doc] Fix a typo in the file name ( #17836 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-08 18:04:18 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bb239a730f 
					 
					
						
						
							
							[Bugfix] Fix quark fp8 format loading on AMD GPUs ( #12612 )  
						
						... 
						
						
						
						Signed-off-by: Felix Marty <felmarty@amd.com >
Signed-off-by: kewang2 <kewang2@amd.com >
Co-authored-by: kewang2 <kewang2@amd.com > 
						
						
					 
					
						2025-05-08 02:53:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a463555dee 
					 
					
						
						
							
							[TPU] Fix the test_sampler ( #17820 )  
						
						
						
						
					 
					
						2025-05-08 05:51:33 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ca04b97c93 
					 
					
						
						
							
							[Bugfix] Fix tool call template validation for Mistral models ( #17644 )  
						
						... 
						
						
						
						Signed-off-by: Rick Yuan <yuan821120@gmail.com >
Signed-off-by: RIck Yuan <yuan821120@gmail.com >
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com > 
						
						
					 
					
						2025-05-08 09:47:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0a9bbaa104 
					 
					
						
						
							
							[Misc] support model prefix & add deepseek vl2 tiny fused moe config ( #17763 )  
						
						... 
						
						
						
						Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com >
Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com > 
						
						
					 
					
						2025-05-08 07:50:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						39956efb3f 
					 
					
						
						
							
							[Bugfix] Fix bad words for Mistral models ( #17753 )  
						
						... 
						
						
						
						Signed-off-by: Qiong Zhou Huang <qiong@phonic.co > 
						
						
					 
					
						2025-05-07 23:32:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						597051e56f 
					 
					
						
						
							
							[Qwen3]add qwen3-235b-bf16 fused moe config on A100 ( #17715 )  
						
						
						
						
					 
					
						2025-05-07 23:09:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						96722aa81d 
					 
					
						
						
							
							[Frontend] Chat template fallbacks for multimodal models ( #17805 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-07 23:05:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						843b222723 
					 
					
						
						
							
							[Hardware][Intel-Gaudi] Support Automatic Prefix Caching on HPU ( #17648 )  
						
						... 
						
						
						
						Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai > 
						
						
					 
					
						2025-05-07 22:37:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e515668edf 
					 
					
						
						
							
							[Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER ( #17153 )  
						
						... 
						
						
						
						Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-07 22:35:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5a499e70d5 
					 
					
						
						
							
							[Kernel][Hardware][AMD] Bf16 mfma opt for ROCm skinny GEMMs ( #17071 )  
						
						... 
						
						
						
						Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
Signed-off-by: charlifu <charlifu@amd.com >
Co-authored-by: charlifu <charlifu@amd.com > 
						
						
					 
					
						2025-05-07 22:34:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6930a41116 
					 
					
						
						
							
							[V1] Add VLLM_ALLOW_INSECURE_SERIALIZATION env var ( #17490 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-05-08 13:34:02 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						998eea4a0e 
					 
					
						
						
							
							Only log non-default CLI args for online serving ( #17803 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-07 22:33:29 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c747d84576 
					 
					
						
						
							
							[Installation] OpenTelemetry version update ( #17771 )  
						
						... 
						
						
						
						Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com > 
						
						
					 
					
						2025-05-07 22:32:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b2da14a05a 
					 
					
						
						
							
							Improve exception reporting in MP engine ( #17800 )  
						
						... 
						
						
						
						Signed-off-by: Vadim Markovtsev <vadim@poolside.ai > 
						
						
					 
					
						2025-05-08 05:32:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7ea2adb802 
					 
					
						
						
							
							[Core] Support full cuda graph in v1 ( #16072 )  
						
						... 
						
						
						
						Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com >
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com > 
						
						
					 
					
						2025-05-07 22:30:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3d13ca0e24 
					 
					
						
						
							
							[BugFix] Fix --disable-log-stats in V1 server mode ( #17600 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-05-08 04:08:15 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						66ab3b13c9 
					 
					
						
						
							
							Don't call the venv vllm ( #17810 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-08 04:06:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a8238bbdb0 
					 
					
						
						
							
							[Chore][Doc] uses model id determined from OpenAI client ( #17815 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-05-08 01:48:57 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d43f914d42 
					 
					
						
						
							
							[Core][Feature] Input metadata dump on crash ( #13407 )  
						
						... 
						
						
						
						Signed-off-by: Wallas Santos <wallashss@ibm.com > 
						
						
					 
					
						2025-05-07 22:15:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed5272cf21 
					 
					
						
						
							
							[BugFix] Avoid secondary missing MultiprocExecutor.workers error ( #17811 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-05-07 21:55:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c20ef40fd0 
					 
					
						
						
							
							[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend ( #14238 )  
						
						... 
						
						
						
						Signed-off-by: Akshat Tripathi <akshat@krai.ai >
Signed-off-by: Chengji Yao <chengjiyao@google.com >
Co-authored-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-05-07 16:28:47 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						db593aa67f 
					 
					
						
						
							
							[Quantization] Quark MXFP4 format loading  ( #16943 )  
						
						
						
						
					 
					
						2025-05-07 15:05:05 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f98e307588 
					 
					
						
						
							
							[Bugfix] Fix missing lora name mapping for lora without prefix ( #17793 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-05-07 16:17:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						646a31e51e 
					 
					
						
						
							
							Fix and simplify deprecated=True CLI kwarg ( #17781 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-07 16:51:06 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						be8ff88e66 
					 
					
						
						
							
							[Bugfix] Fix Video IO error for short video ( #17791 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-05-07 15:36:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1a6af1453d 
					 
					
						
						
							
							Only depend on importlib-metadata for Python < 3.10 ( #17776 )  
						
						... 
						
						
						
						Signed-off-by: Christian Heimes <christian@python.org > 
						
						
					 
					
						2025-05-07 07:51:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						32aa74c09c 
					 
					
						
						
							
							[ROCm][FP8][Kernel] FP8 quantization fused into Custom Paged Attention ( #17139 )  
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-05-07 07:12:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7377dd0307 
					 
					
						
						
							
							[doc] update the issue link ( #17782 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-07 20:29:05 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						98c89e16ff 
					 
					
						
						
							
							Make key optional for rotary embedding ( #17566 )  
						
						... 
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com > 
						
						
					 
					
						2025-05-07 00:11:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						324a3119b0 
					 
					
						
						
							
							Fix test_memory_usage_no_spec ( #17754 )  
						
						... 
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com > 
						
						
					 
					
						2025-05-07 00:10:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8a15c2603a 
					 
					
						
						
							
							[Frontend] Add missing chat templates for various MLLMs ( #17758 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-07 00:10:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						043e4c4955 
					 
					
						
						
							
							Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling ( #16357 )  
						
						... 
						
						
						
						Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
Co-authored-by: Aaron Dou <yzdou@amazon.com >
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com >
Co-authored-by: Chongming Ni <chongmni@amazon.com >
Co-authored-by: Amulya Ballakur <amulyaab@amazon.com >
Co-authored-by: Patrick Lange <patlange@amazon.com >
Co-authored-by: Elaine Zhao <elaineyz@amazon.com >
Co-authored-by: Lin Lin Pan <tailinpa@amazon.com >
Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com >
Co-authored-by: Yishan McNabb <yishanm@amazon.com >
Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com > 
						
						
					 
					
						2025-05-07 00:07:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ba7703e659 
					 
					
						
						
							
							[Misc] Remove  qlora_adapter_name_or_path ( #17699 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-05-06 23:10:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f80ae5bdcf 
					 
					
						
						
							
							[Kernel] Use fused rmsnorm for some models like qwen3 series ( #17735 )  
						
						... 
						
						
						
						Signed-off-by: evian <eviantai@u.nus.edu >
Co-authored-by: evian <eviantai@u.nus.edu > 
						
						
					 
					
						2025-05-06 23:10:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1a45a61387 
					 
					
						
						
							
							[Kernel] GGUF MoeVec kernel ( #16780 )  
						
						... 
						
						
						
						Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com >
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-05-06 23:07:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c3e9d5060e 
					 
					
						
						
							
							[Misc] Use apply_rotary_emb from vllm_flash_attn for Qwen2-VL vision RoPE ( #17726 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-05-07 04:51:33 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						822de7fb94 
					 
					
						
						
							
							[Misc] Split model loader ( #17712 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-05-07 12:42:26 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8d84d836d1 
					 
					
						
						
							
							[BugFix][Spec Decode] Fix hidden size mismatch between target and eagle head ( #17740 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-05-06 19:51:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						950b71186f 
					 
					
						
						
							
							Replace lm-eval bash script with pytest and use enforce_eager for faster CI ( #17717 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-06 18:00:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e50a1f1a9c 
					 
					
						
						
							
							[TPU] Add kernel test for moe_pallas ( #17496 )  
						
						... 
						
						
						
						Signed-off-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-06 17:59:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a17cef70ea 
					 
					
						
						
							
							Removed unused marlin cuda code ( #17684 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-06 17:59:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						18dd5e01f2 
					 
					
						
						
							
							[Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels ( #17146 )  
						
						... 
						
						
						
						Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com > 
						
						
					 
					
						2025-05-06 17:59:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6de3e13413 
					 
					
						
						
							
							Add logging for torch nightly version ( #17669 )  
						
						... 
						
						
						
						Signed-off-by: Yang Wang <elainewy@meta.com > 
						
						
					 
					
						2025-05-07 00:45:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed3a1d2106 
					 
					
						
						
							
							[ROCm] fix num_stages for default moe config to avoid triton OutOfResource error ( #17744 )  
						
						... 
						
						
						
						Signed-off-by: Hongxia Yang <hongxia.yang@amd.com > 
						
						
					 
					
						2025-05-07 00:39:48 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						022afbeb4e 
					 
					
						
						
							
							Fix doc build performance ( #17748 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-07 00:36:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2f925e5777 
					 
					
						
						
							
							[Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode ( #16828 )  
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-05-06 18:21:48 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						de906b95f9 
					 
					
						
						
							
							[Bugfix] Fix for the condition to accept empty encoder inputs for mllama ( #17732 )  
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-05-06 19:59:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d456aea71f 
					 
					
						
						
							
							[Misc] Add Next Edit Prediction (NEP) datasets support in benchmark_serving.py ( #16839 )  
						
						... 
						
						
						
						Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> 
						
						
					 
					
						2025-05-06 15:38:45 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						621ca2c0ab 
					 
					
						
						
							
							[TPU] Increase block size and reset block shapes ( #16458 )  
						
						
						
						
					 
					
						2025-05-06 13:55:04 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6115b11582 
					 
					
						
						
							
							Make right sidebar more readable in "Supported Models" ( #17723 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-06 16:48:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5b8c390747 
					 
					
						
						
							
							[Bugfix] Fix modality limits in vision language example ( #17721 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-06 16:12:28 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7525d5f3d5 
					 
					
						
						
							
							[doc] Add RAG Integration example ( #17692 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-05-06 16:10:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aabcd2cae3 
					 
					
						
						
							
							[v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager ( #17479 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-05-06 08:50:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0d115460a7 
					 
					
						
						
							
							[Docs] Use gh-file to add links to tool_calling.md ( #17709 )  
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-05-06 15:27:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						175bda67a1 
					 
					
						
						
							
							[Feat] Add deprecated=True to CLI args ( #17426 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-05-06 08:11:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cba31c47c4 
					 
					
						
						
							
							[v1] AttentionMetadata for each layer ( #17394 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-05-06 07:58:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a6fed02068 
					 
					
						
						
							
							[V1][PP] Support PP for MultiprocExecutor ( #14219 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: jiang.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-05-06 07:58:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d419aa5dc4 
					 
					
						
						
							
							[V1] Enable TPU V1 backend by default ( #17673 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-06 06:49:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f9bc5a0693 
					 
					
						
						
							
							[Bugfix] Fix triton import with local TritonPlaceholder ( #17446 )  
						
						... 
						
						
						
						Signed-off-by: Mengqing Cao <cmq0113@163.com > 
						
						
					 
					
						2025-05-06 17:53:09 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						05e1f96419 
					 
					
						
						
							
							Fix dockerfilegraph pre-commit hook ( #17698 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-05-06 08:56:48 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6eae34533a 
					 
					
						
						
							
							[Misc] Fix ScalarType float4 naming  ( #17690 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-05-06 01:07:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						63ced7b43f 
					 
					
						
						
							
							[Doc] Update notes for H2O-VL and Gemma3 ( #17219 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-05-06 07:51:02 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dc47ba32f8 
					 
					
						
						
							
							[Bugfix] Fixed prompt length for random dataset ( #17408 )  
						
						... 
						
						
						
						Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com > 
						
						
					 
					
						2025-05-06 07:00:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						edbf2d609e 
					 
					
						
						
							
							[easy] Fix logspam on PiecewiseBackend errors ( #17138 )  
						
						... 
						
						
						
						Signed-off-by: rzou <zou3519@gmail.com > 
						
						
					 
					
						2025-05-05 23:46:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						999328be0d 
					 
					
						
						
							
							[Model] Add GraniteMoeHybrid 4.0 model ( #17497 )  
						
						... 
						
						
						
						Signed-off-by: Thomas Ortner <boh@zurich.ibm.com >
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com >
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com > 
						
						
					 
					
						2025-05-06 12:00:31 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						98834fefaa 
					 
					
						
						
							
							Update nm to rht in doc links + refine fp8 doc ( #17678 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-05-06 00:41:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						90bd2ae172 
					 
					
						
						
							
							[Bugfix] LoRA - Retire unused maxnreg LoRA kernel argument ( #17677 )  
						
						
						
						
					 
					
						2025-05-05 17:34:29 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5941e0b7ea 
					 
					
						
						
							
							[TPU][V1] Add support for top-logprobs ( #17072 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-05-05 14:20:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9765940824 
					 
					
						
						
							
							[TPU] Enable gemma3-27b with TP>1 on multi-chips. ( #17335 )  
						
						... 
						
						
						
						Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com > 
						
						
					 
					
						2025-05-05 14:19:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5ea5c514da 
					 
					
						
						
							
							[BugFix] Increase timeout for startup failure test ( #17642 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-05-05 20:53:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d3efde8176 
					 
					
						
						
							
							[Benchmarks] Remove invalid option under V1 engine ( #17651 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-05-05 16:30:22 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aea302be6c 
					 
					
						
						
							
							Use git-path commit in hook ( #17616 )  
						
						... 
						
						
						
						Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com > 
						
						
					 
					
						2025-05-05 17:55:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cc05b90d86 
					 
					
						
						
							
							[Doc] Fix broken cuda installation doc rendering ( #17654 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-05-05 17:52:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1d0c9d6b2d 
					 
					
						
						
							
							[Kernel] some optimizations for dense marlin and moe marlin ( #16850 )  
						
						... 
						
						
						
						Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com > 
						
						
					 
					
						2025-05-05 09:39:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f62cad6431 
					 
					
						
						
							
							[Build/CI] Upgrade CUTLASS to 3.9.2 ( #17641 )  
						
						... 
						
						
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-05-04 19:23:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5394ad7387 
					 
					
						
						
							
							[Bugfix] fix KeyError on top logprobs are special tokens ( #17637 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-05-04 19:22:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						68e1ee0072 
					 
					
						
						
							
							[Bugfix][Easy] Fix whitespace in shm_broadcast.py logging (#17635)

						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
						
						
					 
					
						2025-05-04 19:20:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2858830c39 
					 
					
						
						
							
							[Bugfix] Prioritize dtype in root config before checking text config (#17629)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-04 12:43:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d6484ef3c3 
					 
					
						
						
							
							Add full API docs and improve the UX of navigating them (#17485)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-05-03 19:42:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						46fae69cf0 
					 
					
						
						
							
							[Misc] V0 fallback for --enable-prompt-embeds (#17615)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-03 22:59:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f66f1e0fa3 
					 
					
						
						
							
							[Bugfix] Fix broken Qwen2.5-omni tests (#17613)

						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-05-03 17:08:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						887d7af882 
					 
					
						
						
							
							[Core] Gate prompt_embeds behind a feature flag (#17607)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-04 00:19:20 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a92842454c 
					 
					
						
						
							
							[Bugfix][ROCm] Using device_type because on ROCm the API is still torch.cuda (#17601)

						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-05-02 22:25:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c8386fa61d 
					 
					
						
						
							
							[Build/CI] Upgrade CUTLASS to 3.9.1 (#17602)

						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
						
						
					 
					
						2025-05-02 22:25:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						87baebebd8 
					 
					
						
						
							
							[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name (#17508)

						Signed-off-by: Chenyaaang <chenyangli@google.com>
						
						
					 
					
						2025-05-02 21:42:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e3d0a1d190 
					 
					
						
						
							
							[Quantizaton] [AMD] Add support for running DeepSeek int8 w8a8 MoE on ROCm (#17558)

						Signed-off-by: Randall Smith <Randall.Smith@amd.com>
						
						
					 
					
						2025-05-02 21:41:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d47b605eca 
					 
					
						
						
							
							Update test requirements to CUDA 12.8 (#17576)

						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-05-02 21:40:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						22c6f6397f 
					 
					
						
						
							
							[Neuron][Build] Require setuptools >= 77.0.3 for PEP 639 (#17603)

						Signed-off-by: Liangfu Chen <liangfc@amazon.com>
						
						
					 
					
						2025-05-03 02:41:59 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3ec97e2cc5 
					 
					
						
						
							
							[release] Add command to clean up Docker containers/images in TPU release machine (#17606)
						
						
						
						
					 
					
						2025-05-02 18:54:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9b103a1d76 
					 
					
						
						
							
							fix typo in logging (#17605)
						
						
						
						
					 
					
						2025-05-02 18:04:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b90b0852e9 
					 
					
						
						
							
							[easy] Print number of needed GPUs in skip message (#17594)

						Signed-off-by: rzou <zou3519@gmail.com>
						
						
					 
					
						2025-05-02 15:27:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9352cdb56d 
					 
					
						
						
							
							[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263)

						Signed-off-by: Lu Fang <lufang@fb.com>
						Co-authored-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-05-02 19:44:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						182f40ea8b 
					 
					
						
						
							
							Add NVIDIA TensorRT Model Optimizer in vLLM documentation (#17561)
						
						
						
						
					 
					
						2025-05-02 11:36:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3e887d2e0c 
					 
					
						
						
							
							permute/unpermute kernel for moe optimization (#14568)

						Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
						
						
					 
					
						2025-05-02 11:31:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0f87d8f7b2 
					 
					
						
						
							
							[BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (#17574)

						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
						
						
					 
					
						2025-05-02 11:01:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4c33d67321 
					 
					
						
						
							
							[Bugfix] fix tmp_out and exp_sums dimensions (#17438)

						Signed-off-by: Hui Liu <96135754+hliuca@users.noreply.github.com>
						
						
					 
					
						2025-05-02 16:44:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cb234955df 
					 
					
						
						
							
							[Misc] Clean up input processing (#17582)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-02 08:11:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3a500cd0b6 
					 
					
						
						
							
							[doc] miss result (#17589)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-02 07:04:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						868c546da4 
					 
					
						
						
							
							Support W8A8 INT8 MoE for compressed-tensors (#16745)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-05-02 10:03:32 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						99404f53c7 
					 
					
						
						
							
							[Security] Fix image hash collision (#17378)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-02 08:36:39 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						785d75a03b 
					 
					
						
						
							
							Automatically tell users that dict args must be valid JSON in CLI (#17577)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-05-02 05:24:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6d1479ca4b 
					 
					
						
						
							
							[doc] add the print result (#17584)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-02 05:24:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b8b0859b5c 
					 
					
						
						
							
							add more pytorch related tests for torch nightly (#17422)

						Signed-off-by: Yang Wang <elainewy@meta.com>
						
						
					 
					
						2025-05-02 03:29:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d7543862bd 
					 
					
						
						
							
							[Misc] Rename assets for testing (#17575)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-02 03:29:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c777df79f7 
					 
					
						
						
							
							[BugFix] Fix Memory Leak (#17567)

						Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
						
						
					 
					
						2025-05-02 01:07:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cc2a77d7f1 
					 
					
						
						
							
							[Core] [Bugfix] Add Input Embeddings (#15428)

						Signed-off-by: Andrew Sansom <andrew@protopia.ai>
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						Co-authored-by: 临景 <linjing.yx@alibaba-inc.com>
						Co-authored-by: Bryce1010 <bryceyx@gmail.com>
						Co-authored-by: Nan2018 <nan@protopia.ai>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-02 01:06:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9e2de9b9e9 
					 
					
						
						
							
							[Bugifx] Remove TritonPlaceholder from sys.modules (#17317)

						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-05-02 00:45:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						109e15a335 
					 
					
						
						
							
							Add pt_load_map_location to allow loading to cuda (#16869)

						Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
						
						
					 
					
						2025-05-01 23:23:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f192ca90e6 
					 
					
						
						
							
							Fix PixtralHF missing spatial_merge_size (#17571)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-05-01 22:14:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f89d0e11bf 
					 
					
						
						
							
							[Misc] Continue refactoring model tests (#17573)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-01 22:06:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b4003d11fc 
					 
					
						
						
							
							Check if bitblas is installed during support check (#17572)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-05-02 04:32:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						292fc59d61 
					 
					
						
						
							
							[CI] Actually run tests/kv_transfer/test_disagg.py in CI (#17555)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-05-02 04:05:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						afcb3f8863 
					 
					
						
						
							
							[Attention] MLA move o_proj q_proj into cuda-graph region (#17484)

						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
						
						
					 
					
						2025-05-02 03:16:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						afb12e4294 
					 
					
						
						
							
							[Doc] note that not all unit tests pass on CPU platforms (#17554)

						Signed-off-by: David Xia <david@davidxia.com>
						
						
					 
					
						2025-05-02 02:57:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						24aebae177 
					 
					
						
						
							
							[Bugfix] Disable gptq_bitblas for <SM80 to fix GPTQ on V100/T4 (#17541)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-05-01 17:59:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						39c0813a7f 
					 
					
						
						
							
							[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE3 (#17504)

						Signed-off-by: qizixi <qizixi@meta.com>
						
						
					 
					
						2025-05-01 16:19:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9b70e2b4c1 
					 
					
						
						
							
							[Misc][Tools][Benchmark] Publish script to auto tune server parameters (#17207)

						Signed-off-by: Chenyaaang <chenyangli@google.com>
						
						
					 
					
						2025-05-01 19:53:03 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						173daac19d 
					 
					
						
						
							
							[Bug]change the position of cuda_graph_sizes in dataclasses (#17548)

						Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
						
						
					 
					
						2025-05-01 11:52:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						04f2cfc894 
					 
					
						
						
							
							Remove duplicate code from dbrx.py (#17550)
						
						
						
						
					 
					
						2025-05-01 11:51:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						811a6c0972 
					 
					
						
						
							
							[ROCM] Add gfx950 to the custom attention archs (#16034)

						Signed-off-by: jpvillam <Juan.Villamizar@amd.com>
						Signed-off-by: seungrokjung <seungrok.jung@amd.com>
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						Co-authored-by: seungrokjung <seungrok.jung@amd.com>
						Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-05-01 11:18:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9b1769dd9a 
					 
					
						
						
							
							[Bugfix] Fix lint error (#17547)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-01 11:12:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						61c299f81f 
					 
					
						
						
							
							[Misc]add configurable cuda graph size (#17201)

						Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-05-01 11:04:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4acfa3354a 
					 
					
						
						
							
							[ROCm] update installation guide to include build aiter from source instructions (#17542)

						Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-05-01 11:01:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						88c8304104 
					 
					
						
						
							
							[Model] Refactor Ovis2 to support original tokenizer (#17537)

						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-05-01 11:00:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6768ff4a22 
					 
					
						
						
							
							Move the last arguments in arg_utils.py to be in their final groups (#17531)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-05-01 10:31:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f2e7af9b86 
					 
					
						
						
							
							[CI/Build] Remove awscli dependency (#17532)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-01 09:20:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7423cf0a9b 
					 
					
						
						
							
							[Misc] refactor example - cpu_offload_lmcache (#17460)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-01 15:05:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						460a2b1100 
					 
					
						
						
							
							[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations (#10867)

						Signed-off-by: Sage Moore <sage@neuralmagic.com>
						
						
					 
					
						2025-05-01 07:59:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						28566d73b3 
					 
					
						
						
							
							[ROCm] remove unsupported archs from rocm triton flash-attention supported list (#17536)

						Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
						
						
					 
					
						2025-05-01 07:54:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						98060b001d 
					 
					
						
						
							
							[Feature][Frontend]: Deprecate --enable-reasoning (#17452)

						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-05-01 06:46:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f5a3c655b2 
					 
					
						
						
							
							[FEAT] [ROCm]: Add Qwen/Qwen3-235B-A22B-FP8 TP4 triton fused moe config (#17535)

						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
						
						
					 
					
						2025-05-01 06:37:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7169f87ad0 
					 
					
						
						
							
							[doc] add streamlit integration (#17522)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-05-01 13:34:02 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b74d888c63 
					 
					
						
						
							
							Fix more broken speculative decode tests (#17450)

						Signed-off-by: Huy Do <huydhn@gmail.com>
						
						
					 
					
						2025-05-01 06:05:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2007d4d54f 
					 
					
						
						
							
							[FEAT] [ROCm]: Add Qwen/Qwen3-30B-A3B-FP8 fused moe config for MI300X (#17530)

						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
						
						
					 
					
						2025-05-01 06:03:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						48e925fab5 
					 
					
						
						
							
							[Misc] Clean up test docstrings and names (#17521)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-01 05:19:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1903c0b8a3 
					 
					
						
						
							
							[Frontend] Show progress bar for adding requests (#17525)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-05-01 05:15:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						86a1f67a3b 
					 
					
						
						
							
							[Bugfix][Benchmarks] Allow benchmark of deepspeed-mii backend to select a model (#17285)

						Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>
						
						
					 
					
						2025-05-01 11:54:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a257d9bccc 
					 
					
						
						
							
							Improve configs - ObservabilityConfig (#17453)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-05-01 03:52:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						015069b017 
					 
					
						
						
							
							[Misc] Optimize the Qwen3_ReasoningParser extract_reasoning_content (#17515)

						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-05-01 03:29:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fbefc8a78d 
					 
					
						
						
							
							[Core] Enable IPv6 with vllm.utils.make_zmq_socket() (#16506)

						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-05-01 09:38:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						26bc4bbcd8 
					 
					
						
						
							
							Avoid overwriting vllm_compile_cache.py (#17418)

						Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
						
						
					 
					
						2025-05-01 07:30:57 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3c3d767201 
					 
					
						
						
							
							[BugFix] Fix mla cpu - missing 3 required positional arguments (#17494)

						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
						
						
					 
					
						2025-05-01 14:36:52 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						13cf6b6236 
					 
					
						
						
							
							[BugFix] fix speculative decoding memory leak when speculation is disabled (#15506)

						Signed-off-by: Noah Yoshida <noahcy117@gmail.com>
						
						
					 
					
						2025-04-30 23:28:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						90d0a54c4d 
					 
					
						
						
							
							[ROCm] Effort to reduce the number of environment variables in command line (#17229)

						Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
						
						
					 
					
						2025-04-30 23:27:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7a0a146c54 
					 
					
						
						
							
							[Build] Require setuptools >= 77.0.3 for PEP 639 (#17389)

						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-04-30 23:25:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7ab643e425 
					 
					
						
						
							
							FIxing the AMD test failures caused by PR#16457 (#17511)

						Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
						
						
					 
					
						2025-04-30 23:23:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						afb4429b4f 
					 
					
						
						
							
							[CI/Build] Reorganize models tests (#17459)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-04-30 23:03:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aa4502e7f3 
					 
					
						
						
							
							[CI][Bugfix] Fix failing V1 Test due to missing 'cache_salt' arg (#17500)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-04-30 21:03:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						17b4d85f63 
					 
					
						
						
							
							[CI][TPU] Skip structured outputs+spec decode tests on TPU (#17510)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-04-30 20:36:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1144a8efe7 
					 
					
						
						
							
							[Bugfix] Temporarily disable gptq_bitblas on ROCm (#17411)

						Signed-off-by: Yan Cangang <nalanzeyu@gmail.com>
						
						
					 
					
						2025-04-30 19:51:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						08fb5587b4 
					 
					
						
						
							
							[Bugfix][ROCm] Fix import error on ROCm (#17495)

						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-04-30 19:51:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dbc18e7816 
					 
					
						
						
							
							[CI][TPU] Skip Multimodal test (#17488)

						Signed-off-by: Siyuan Liu <lsiyuan@google.com>
						
						
					 
					
						2025-04-30 19:51:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						02bd654846 
					 
					
						
						
							
							[Misc] Rename Audios -> Audio in Qwen2audio Processing (#17507)

						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
						
						
					 
					
						2025-04-30 19:51:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						200bbf92e8 
					 
					
						
						
							
							Bump Compressed Tensors version to 0.9.4 (#17478)

						Signed-off-by: Rahul Tuli <rtuli@redhat.com>
						Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-04-30 15:24:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						81ecf425f0 
					 
					
						
						
							
							[v1][Spec Decode] Make sliding window compatible with eagle prefix caching (#17398)

						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-04-30 18:25:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						42d9a2c4c7 
					 
					
						
						
							
							doc: fix bug report Github template formatting (#17486)

						Signed-off-by: David Xia <david@davidxia.com>
						
						
					 
					
						2025-04-30 10:03:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2ac74d098e 
					 
					
						
						
							
							[doc] add install tips (#17373)

						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-04-30 17:02:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						584f5fb4c6 
					 
					
						
						
							
							[Bugfix][ROCm] Restrict ray version due to a breaking release (#17480)

						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-04-30 09:59:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d586ddc691 
					 
					
						
						
							
							[BugFix] Fix authorization of openai_transcription_client.py (#17321)

						Signed-off-by: zh Wang <rekind133@outlook.com>
						
						
					 
					
						2025-04-30 09:51:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0b7e701dd4 
					 
					
						
						
							
							[Docs] Update optimization.md doc (#17482)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-04-30 09:34:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						947f2f5375 
					 
					
						
						
							
							[V1] Allow turning off pickle fallback in vllm.v1.serial_utils (#17427)

						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-04-30 16:10:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						739e03b344 
					 
					
						
						
							
							[Bugfix] Fixed mistral tokenizer path when pointing to file (#17457)

						Signed-off-by: Pete Savage <psavage@redhat.com>
						
						
					 
					
						2025-04-30 08:08:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						da4e7687b5 
					 
					
						
						
							
							[Fix] Support passing args to logger ( #17425 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-04-30 08:06:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						39317cf42b 
					 
					
						
						
							
							[Docs] Add command for running mypy tests from CI ( #17475 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-30 08:06:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2990cee95b 
					 
					
						
						
							
							[Feature] The Qwen3 reasoning parser supports guided decoding ( #17466 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-04-30 07:48:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0be6d05b5e 
					 
					
						
						
							
							[V1][Metrics] add support for kv event publishing ( #16750 )  
						
						... 
						
						
						
						Signed-off-by: alec-flowers <aflowers@nvidia.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com > 
						
						
					 
					
						2025-04-30 07:44:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						77073c77bc 
					 
					
						
						
							
							[Core] Prevent side-channel attacks via cache salting ( #17045 )  
						
						... 
						
						
						
						Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com > 
						
						
					 
					
						2025-04-30 20:27:21 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a7d5b016bd 
					 
					
						
						
							
							[TPU][V1][CI] Update regression test baseline for v6 CI ( #17064 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-04-30 04:03:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d803786731 
					 
					
						
						
							
							[V1][Bugfix]: vllm v1 verison metric num_gpu_blocks is None ( #15755 )  
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-04-30 18:20:39 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1534d389af 
					 
					
						
						
							
							[Misc] Remove deprecated files ( #17447 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-04-30 01:52:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ece5a8b0b6 
					 
					
						
						
							
							Make the _apply_rotary_emb compatible with dynamo ( #17435 )  
						
						
						
						
					 
					
						2025-04-30 07:52:48 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						54072f315f 
					 
					
						
						
							
							[MODEL ADDITION] Ovis2 Model Addition ( #15826 )  
						
						... 
						
						
						
						Signed-off-by: Marco <121761685+mlinmg@users.noreply.github.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-04-30 07:33:29 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						be633fba0f 
					 
					
						
						
							
							[Bugfix] Fix AttributeError: 'State' object has no attribute 'engine_client' ( #17434 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-04-30 00:11:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed6cfb90c8 
					 
					
						
						
							
							[Hardware][Intel GPU] Upgrade to torch 2.7 ( #17444 )  
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com > 
						
						
					 
					
						2025-04-30 00:03:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6ed9f6047e 
					 
					
						
						
							
							[Intel GPU] [CI]Fix XPU ci, setuptools >=80.0 have build issue ( #17298 )  
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-04-29 22:54:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a44c4f1d2f 
					 
					
						
						
							
							Support LoRA for Mistral3 ( #17428 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-29 21:10:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						88fcf00dda 
					 
					
						
						
							
							Fix some speculative decode tests with tl.dot ( #17371 )  
						
						... 
						
						
						
						Signed-off-by: Huy Do <huydhn@gmail.com > 
						
						
					 
					
						2025-04-29 19:41:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d1f569b1b9 
					 
					
						
						
							
							Fix call to logger.info_once ( #17416 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-29 19:39:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						13698db634 
					 
					
						
						
							
							Improve configs - ModelConfig ( #17130 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-30 10:38:22 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2c4f59afc3 
					 
					
						
						
							
							Update PyTorch to 2.7.0 ( #16859 )  
						
						
						
						
					 
					
						2025-04-29 19:08:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1c2bc7ead0 
					 
					
						
						
							
							Truncation control for embedding models ( #14776 )  
						
						... 
						
						
						
						Signed-off-by: Gabriel Marinho <gmarinho@ibm.com >
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Co-authored-by: Max de Bayser <mbayser@br.ibm.com > 
						
						
					 
					
						2025-04-30 09:24:57 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4055130a85 
					 
					
						
						
							
							[release] Always git fetch all to get latest tag on TPU release ( #17322 )  
						
						
						
						
					 
					
						2025-04-29 17:52:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						34120f5acd 
					 
					
						
						
							
							[V1][Feature] Enable Speculative Decoding with Structured Outputs ( #14702 )  
						
						... 
						
						
						
						Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com > 
						
						
					 
					
						2025-04-30 00:02:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7489ec0bab 
					 
					
						
						
							
							Remove Bamba 9B from CI ( #17407 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-29 21:10:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						70788bdbdc 
					 
					
						
						
							
							[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE ( #17211 )  
						
						... 
						
						
						
						Signed-off-by: Bryan Lu <yuzhelu@amazon.com > 
						
						
					 
					
						2025-04-29 21:10:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c9c1b59e59 
					 
					
						
						
							
							Fix: Python package installation for opentelmetry ( #17049 )  
						
						... 
						
						
						
						Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com > 
						
						
					 
					
						2025-04-29 20:20:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0350809f3a 
					 
					
						
						
							
							Remove Falcon3 2x7B from CI ( #17404 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-29 19:52:25 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a6977dbd15 
					 
					
						
						
							
							Simplify (and fix) passing of guided decoding backend options ( #17008 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-29 19:02:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2fa2a50bf9 
					 
					
						
						
							
							[Bugfix] Fix Minicpm-O-int4 GPTQ model inference ( #17397 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-04-29 18:21:42 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						08e15defa9 
					 
					
						
						
							
							[CI/Build] Add retry mechanism for add-apt-repository ( #17107 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-29 10:40:52 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b37685afbb 
					 
					
						
						
							
							[CI] Uses Python 3.11 for TPU ( #17359 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-04-29 17:39:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						792595b59d 
					 
					
						
						
							
							[TPU][V1][CI] Replace python3 setup.py develop with standard pip install --e on TPU ( #17374 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-04-29 10:36:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0c1c788312 
					 
					
						
						
							
							[Doc][Typo] Fixing label in new model requests link in overview.md ( #17400 )  
						
						
						
						
					 
					
						2025-04-29 10:29:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						56d64fbe30 
					 
					
						
						
							
							[Docs] Propose a deprecation policy for the project ( #17063 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-04-29 10:29:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						608968b7c5 
					 
					
						
						
							
							Enabling multi-group kernel tests. ( #17115 )  
						
						... 
						
						
						
						Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com > 
						
						
					 
					
						2025-04-29 10:27:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						06ffc7e1d3 
					 
					
						
						
							
							[Misc][ROCm] Exclude cutlass_mla_decode for ROCm build ( #17289 )  
						
						... 
						
						
						
						Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com > 
						
						
					 
					
						2025-04-29 10:26:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d3cf61b89b 
					 
					
						
						
							
							fix gemma3 results all zero ( #17364 )  
						
						... 
						
						
						
						Signed-off-by: mayuyuace <qiming1.zhang@intel.com > 
						
						
					 
					
						2025-04-29 09:40:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a39203f99e 
					 
					
						
						
							
							[Bugfix] add qwen3 reasoning-parser fix content is None when disable … ( #17369 )  
						
						... 
						
						
						
						Signed-off-by: mofanke <mofanke@gmail.com > 
						
						
					 
					
						2025-04-29 16:32:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						24e6ad3f16 
					 
					
						
						
							
							[V1] Remove num_input_tokens from attn_metadata ( #17193 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-04-29 09:28:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2ef5d106bb 
					 
					
						
						
							
							Improve literal dataclass field conversion to argparse argument ( #17391 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-29 16:25:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0ed27ef66c 
					 
					
						
						
							
							Fix: Spelling of inference ( #17387 )  
						
						
						
						
					 
					
						2025-04-29 09:23:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						900edfa8d4 
					 
					
						
						
							
							Transformers backend tweaks ( #17365 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-29 09:08:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						88ad9ec6b2 
					 
					
						
						
							
							[Frontend] Support chat_template_kwargs in LLM.chat ( #17356 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-29 22:03:35 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						40896bdf3f 
					 
					
						
						
							
							pre-commit autoupdate ( #17380 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-29 06:46:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						00ee37efa2 
					 
					
						
						
							
							[Bugfix] Clean up MiniMax-VL and fix processing ( #17354 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-29 20:42:16 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						890f104cdf 
					 
					
						
						
							
							[Doc] Fix QWen3MOE info ( #17381 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-04-29 12:38:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4a5e13149a 
					 
					
						
						
							
							Update docs requirements ( #17379 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-29 11:35:47 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						97cc8729f0 
					 
					
						
						
							
							[Model] Ignore rotary embed load for Cohere model ( #17319 )  
						
						
						
						
					 
					
						2025-04-29 00:30:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4464109219 
					 
					
						
						
							
							[Build][Bugfix] Restrict setuptools version to <80 ( #17320 )  
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-04-29 00:17:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						193e78e35d 
					 
					
						
						
							
							[Fix] Documentation spacing in compilation config help text ( #17342 )  
						
						... 
						
						
						
						Signed-off-by: Zerohertz <ohg3417@gmail.com > 
						
						
					 
					
						2025-04-29 00:16:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bdb2cddafc 
					 
					
						
						
							
							[Misc]Use a platform independent interface to obtain the device attributes ( #17100 )  
						
						
						
						
					 
					
						2025-04-29 06:59:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ebb3930d28 
					 
					
						
						
							
							[Misc] Move config fields to MultiModalConfig ( #17343 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-29 06:37:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cde384cd92 
					 
					
						
						
							
							[Model] support MiniMax-VL-01 model ( #16328 )  
						
						... 
						
						
						
						Signed-off-by: qingjun <qingjun@minimaxi.com > 
						
						
					 
					
						2025-04-29 12:05:50 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						96e06e3cb7 
					 
					
						
						
							
							[Misc] Add a Jinja template to support Mistral3 function calling ( #17195 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-04-28 19:53:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						17eb306fcc 
					 
					
						
						
							
							[Bugfix] Add contiguous call inside rope kernel wrapper ( #17091 )  
						
						... 
						
						
						
						Signed-off-by: 苏政渊 <suzhengyuan@moonshot.cn >
Co-authored-by: 苏政渊 <suzhengyuan@moonshot.cn > 
						
						
					 
					
						2025-04-28 19:24:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						165cb56329 
					 
					
						
						
							
							Ignore '<string>' filepath ( #17330 )  
						
						... 
						
						
						
						Signed-off-by: rzou <zou3519@gmail.com > 
						
						
					 
					
						2025-04-28 19:23:29 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d6da8a8ff2 
					 
					
						
						
							
							[Bugfix] Fix numel() downcast in fused_layernorm_dynamic_per_token_quant.cu ( #17316 )  
						
						
						
						
					 
					
						2025-04-28 19:23:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b4ac4fa04d 
					 
					
						
						
							
							[model] make llama4 compatible with pure dense layers ( #17315 )  
						
						... 
						
						
						
						Signed-off-by: Lucia Fang <fanglu@fb.com > 
						
						
					 
					
						2025-04-29 10:22:22 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e136000595 
					 
					
						
						
							
							[V1][Spec Decode] Make Eagle model arch config driven ( #17323 )  
						
						
						
						
					 
					
						2025-04-29 10:22:02 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						86d9fc29cb 
					 
					
						
						
							
							implement Structural Tag with Guidance backend ( #17333 )  
						
						... 
						
						
						
						Signed-off-by: Michal Moskal <michal@moskal.me > 
						
						
					 
					
						2025-04-29 02:21:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						506475de5f 
					 
					
						
						
							
							[Optim] Compute multimodal hash only once per item ( #17314 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-29 09:40:35 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cfe4532093 
					 
					
						
						
							
							[Benchmark] Add single turn MTBench to Serving Bench ( #17202 )  
						
						
						
						
					 
					
						2025-04-28 16:46:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8fc88d63f1 
					 
					
						
						
							
							[Model] Add tuned triton fused_moe configs for Qwen3Moe ( #17328 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-28 15:20:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6e74fd4945 
					 
					
						
						
							
							Support loading transformers models with named parameters ( #16868 )  
						
						... 
						
						
						
						Signed-off-by: Alex <alexwu@character.ai > 
						
						
					 
					
						2025-04-28 23:15:58 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dcbac4cb4b 
					 
					
						
						
							
							[Model] Qwen3 Dense FP8 Compat Fixes ( #17318 )  
						
						... 
						
						
						
						Signed-off-by: simon-mo <xmo@berkeley.edu > 
						
						
					 
					
						2025-04-28 14:12:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed2462030f 
					 
					
						
						
							
							[Bugfix] Fix moe weight losing all extra attrs after process_weights_after_loading. ( #16854 )  
						
						... 
						
						
						
						Signed-off-by: charlifu <charlifu@amd.com > 
						
						
					 
					
						2025-04-28 21:05:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cc5befbced 
					 
					
						
						
							
							[BugFix] Fix cascade attention - RuntimeError: scheduler_metadata must have shape (metadata_size) ( #17283 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-04-28 13:55:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2c89cd96a8 
					 
					
						
						
							
							[Chore] cleanup license indicators in light of SPDX ( #17259 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-28 19:43:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a0304dc504 
					 
					
						
						
							
							[Security] Don't bind tcp zmq socket to all interfaces ( #17197 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-28 10:08:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c7941cca18 
					 
					
						
						
							
							Explicitly explain quant method override ordering and ensure all overrides are ordered ( #17256 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-28 16:55:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b6dd32aa07 
					 
					
						
						
							
							Make name of compressed-tensors quant method consistent across vLLM ( #17255 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-28 16:28:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f94886946e 
					 
					
						
						
							
							Improve conversion from dataclass configs to argparse arguments ( #17303 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-28 16:22:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						72dfe4c74f 
					 
					
						
						
							
							[Docs] Add a security guide ( #17230 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-04-28 15:12:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8b464d9660 
					 
					
						
						
							
							[Misc] Clean up Qwen2.5-Omni code ( #17301 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-28 06:20:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						889ebb2638 
					 
					
						
						
							
							[Misc] Minor typo/grammar in platforms/interface.py ( #17307 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-04-28 05:45:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3ad986c28b 
					 
					
						
						
							
							[doc] update wrong model id ( #17287 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-28 04:20:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						344e193b7d 
					 
					
						
						
							
							[Bugfix] Add missing get_language_model to new MLLMs ( #17300 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-28 04:09:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fb1c933ade 
					 
					
						
						
							
							Add missing class docstring for PromptAdapterConfig ( #17302 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-28 04:06:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						72c5b97231 
					 
					
						
						
							
							Update tpu_worker.py's typo ( #17288 )  
						
						
						
						
					 
					
						2025-04-28 04:01:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fa93cd9f60 
					 
					
						
						
							
							[Model] Add Granite Speech Support ( #16246 )  
						
						... 
						
						
						
						Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com >
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com > 
						
						
					 
					
						2025-04-28 10:05:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aec9674dbe 
					 
					
						
						
							
							[Core] Remove legacy input mapper/processor from V0 ( #15686 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-28 15:38:48 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7fcc4223dc 
					 
					
						
						
							
							[Minor][Models] Pass partial_rotary_factor parameter to rope ( #17266 )  
						
						... 
						
						
						
						Signed-off-by: evian <eviantai@u.nus.edu >
Co-authored-by: evian <eviantai@u.nus.edu > 
						
						
					 
					
						2025-04-28 04:28:59 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8262a3e23b 
					 
					
						
						
							
							[Misc] Validate stop_token_ids contents ( #17268 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-28 03:54:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f211331c48 
					 
					
						
						
							
							[Doc] small fix ( #17277 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-28 03:53:35 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9053d0b134 
					 
					
						
						
							
							[Doc] Fix wrong github link in LMCache examples ( #17274 )  
						
						... 
						
						
						
						Signed-off-by: KuntaiDu <kuntai@uchicago.edu > 
						
						
					 
					
						2025-04-28 03:09:11 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cb3f2d8d10 
					 
					
						
						
							
							[Bugfix] Fix Mistral3 spatial merge error ( #17270 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-27 19:40:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c12df53b60 
					 
					
						
						
							
							[Bugfix] Fix cutlass dispatch for fp8/int8 to properly invoke M<=16 c… ( #16751 )  
						
						... 
						
						
						
						Signed-off-by: Ther-LF <2639852836@qq.com > 
						
						
					 
					
						2025-04-27 19:38:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d1aeea7553 
					 
					
						
						
							
							[Bugfix] Fix missing ARG in Dockerfile for arm64 platforms ( #17261 )  
						
						... 
						
						
						
						Signed-off-by: lkm-schulz <44176356+lkm-schulz@users.noreply.github.com > 
						
						
					 
					
						2025-04-27 19:38:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d8bccde686 
					 
					
						
						
							
							[BugFix] Fix vllm_flash_attn install issues ( #17267 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-04-27 17:27:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						20e489eaa1 
					 
					
						
						
							
							[V1][Spec Decode] Make eagle compatible with prefix caching. ( #17137 )  
						
						... 
						
						
						
						Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com > 
						
						
					 
					
						2025-04-27 09:29:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4213475ec7 
					 
					
						
						
							
							[Metrics] Fix minor inconsistencies in bucket progression ( #17262 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-27 16:19:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d92879baf6 
					 
					
						
						
							
							[doc] Add feature status legend ( #17257 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
						Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-27 08:17:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						690fe019f0 
					 
					
						
						
							
							[Feature] support sequence parallelism using compilation pass ( #16155 )  
						
						... 
						
						
						
						Signed-off-by: cascade812 <cascade812@outlook.com >
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
						Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-04-27 06:29:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed7a29d9f8 
					 
					
						
						
							
							[NVIDIA] Support Cutlass MLA for Blackwell GPUs ( #16032 )  
						
						... 
						
						
						
						Signed-off-by: kaixih <kaixih@nvidia.com > 
						
						
					 
					
						2025-04-27 06:29:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						756848e79e 
					 
					
						
						
							
							[Bugfix] Fix Lora Name Parsing ( #17196 )  
						
						... 
						
						
						
						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
						Co-authored-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-04-27 20:33:09 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						18445edd0f 
					 
					
						
						
							
							[Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens ( #17033 )  
						
						... 
						
						
						
						Signed-off-by: sfc-gh-zhwang <flex.wang@snowflake.com > 
						
						
					 
					
						2025-04-27 12:30:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						30215ca61f 
					 
					
						
						
							
							[MISC] Use string annotation types for class definitions ( #17244 )  
						
						... 
						
						
						
						Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com > 
						
						
					 
					
						2025-04-27 08:39:57 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						838cedade7 
					 
					
						
						
							
							[Bugfix] Get a specific type of layer from forward context ( #17222 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-04-27 00:58:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4283a28c2f 
					 
					
						
						
							
							[Bugfix] Fix QWen2 VL multimodal mapping ( #17240 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-04-27 05:53:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						93a126fbc7 
					 
					
						
						
							
							[Misc] Make cached tokenizer pickle-compatible ( #17048 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-27 13:05:00 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8e4b351a0c 
					 
					
						
						
							
							[Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel ( #12591 )  
						
						... 
						
						
						
						Signed-off-by: Randall Smith <Randall.Smith@amd.com > 
						
						
					 
					
						2025-04-27 00:35:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9869453c42 
					 
					
						
						
							
							Update test_flash_attn.py ( #17102 )  
						
						... 
						
						
						
						Signed-off-by: ShuaibinLi <lishuaibin@live.cn > 
						
						
					 
					
						2025-04-26 22:17:35 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3642c59aa8 
					 
					
						
						
							
							[CI/Build] remove -t for run-lm-eval-gsm-hf-baseline.sh ( #16271 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
						Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-26 18:25:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						43eea2953b 
					 
					
						
						
							
							[Minor] Fix lint error in main branch ( #17233 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-26 11:10:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						de7eb10ce4 
					 
					
						
						
							
							[Bugfix] Fix Qwen2.5-Omni M-RoPE position ids generation ( #16878 )  
						
						... 
						
						
						
						Signed-off-by: imkero <kerorek@outlook.com > 
						
						
					 
					
						2025-04-26 10:41:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fd11a325b8 
					 
					
						
						
							
							[MISC] rename interval to max_recent_requests ( #14285 )  
						
						
						
						
					 
					
						2025-04-26 16:59:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4d17e20310 
					 
					
						
						
							
							Disable the torch.compile cache checks when VLLM_DISABLE_COMPILE_CACHE=1 ( #16573 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com > 
						
						
					 
					
						2025-04-26 09:17:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						10fd1d7380 
					 
					
						
						
							
							[Bugfix] fix error due to an uninitialized tokenizer when using skip_tokenizer_init with num_scheduler_steps ( #9276 )  
						
						... 
						
						
						
						Signed-off-by: changjun.lee <pord7457@gmail.com > 
						
						
					 
					
						2025-04-26 11:51:17 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						52b4f4a8d7 
					 
					
						
						
							
							[Docs] Update structured output doc for V1 ( #17135 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-26 15:12:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e782e0a170 
					 
					
						
						
							
							[Chore] added stubs for vllm_flash_attn during development mode ( #17228 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-04-26 07:45:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dc2ceca5c5 
					 
					
						
						
							
							[BUGFIX] use random for NONE_HASH only when PYTHONHASHSEED not set ( #17088 )  
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-04-26 14:34:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f8acd01ff7 
					 
					
						
						
							
							[V1] Add structural_tag support using xgrammar ( #17085 )  
						
						
						
						
					 
					
						2025-04-26 14:06:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c48334d405 
					 
					
						
						
							
							[Hardware][Intel-Gaudi] Update hpu-extension and update bucketing system for HPU device ( #17186 )  
						
						... 
						
						
						
						Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai > 
						
						
					 
					
						2025-04-26 05:55:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						909fdaf152 
					 
					
						
						
							
							[Bugfix] Fix standard models tests ( #17217 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-26 02:26:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8c1c926d00 
					 
					
						
						
							
							[Bugfix] Fix missing int type for -n in multi-image example ( #17223 )  
						
						
						
						
					 
					
						2025-04-26 08:49:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						df6f3ce883 
					 
					
						
						
							
							[Core] Remove prompt string from engine core data structures ( #17214 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-25 23:41:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						513f074766 
					 
					
						
						
							
							[CI/test] Fix Eagle Correctness Test ( #17209 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-25 23:40:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b07bf83c7d 
					 
					
						
						
							
							[BugFix] Avoid race conditions in zero-copy tensor transmission ( #17203 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-26 06:00:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						53e8cf53a4 
					 
					
						
						
							
							[V1][Metrics] Allow V1 AsyncLLM to use custom logger ( #14661 )  
						
						... 
						
						
						
						Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
						Signed-off-by: Mark McLoughlin <markmc@redhat.com >
						Signed-off-by: Nick Hill <nhill@redhat.com >
						Co-authored-by: Mark McLoughlin <markmc@redhat.com >
						Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-25 22:05:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						54271bb766 
					 
					
						
						
							
							[ROCm][Misc] Follow-ups for Skinny Gemms on ROCm. ( #17011 )  
						
						... 
						
						
						
						Signed-off-by: charlifu <charlifu@amd.com > 
						
						
					 
					
						2025-04-25 22:05:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9e96f56efb 
					 
					
						
						
							
							Allocate kv_cache with stride order ( #16605 )  
						
						... 
						
						
						
						Signed-off-by: shuw <shuw@nvidia.com > 
						
						
					 
					
						2025-04-25 22:03:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b278911229 
					 
					
						
						
							
							[Minor][Models] Fix Return Types of Llama & Eagle ( #17220 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-25 21:54:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7bd0c7745c 
					 
					
						
						
							
							[Doc] Minor fix for the vLLM TPU setup page ( #17206 )  
						
						... 
						
						
						
						Signed-off-by: Yarong Mu <ymu@google.com > 
						
						
					 
					
						2025-04-26 04:39:56 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1cf0719ebd 
					 
					
						
						
							
							[Minor][Spec Decode] Add use_eagle to SpeculativeConfig ( #17213 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-25 21:08:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						537d5ee025 
					 
					
						
						
							
							[doc] add Anything LLM integration ( #17216 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
						Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-25 21:03:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c8e5be35f7 
					 
					
						
						
							
							[MISC][AMD] Add unused annotation to rocm kernel file ( #17097 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com > 
						
						
					 
					
						2025-04-25 20:33:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a6e72e1e4f 
					 
					
						
						
							
							[Bugfix] [pytorch] Patch AOTAutogradCache._get_shape_env ( #17142 )  
						
						... 
						
						
						
						Signed-off-by: James Wu <jjwu@meta.com > 
						
						
					 
					
						2025-04-26 11:28:20 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5e83a7277f 
					 
					
						
						
							
							[v1] [P/D] Adding LMCache KV connector for v1 ( #16625 )  
						
						
						
						
					 
					
						2025-04-26 03:03:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						68af5f6c5c 
					 
					
						
						
							
							[AMD][FP8][BugFix] Remove V1 check in arg_utils.py for FP8 since it is not necessary ( #17215 )  
						
						... 
						
						
						
						Signed-off-by: Randall Smith <Randall.Smith@amd.com > 
						
						
					 
					
						2025-04-25 19:55:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8de2901fea 
					 
					
						
						
							
							[Bugfix] gemma[2,3] interleaved attention when sliding window is disabled ( #17180 )  
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-04-25 19:53:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c53e0730cb 
					 
					
						
						
							
							[Misc] Refine ray_serve_deepseek example ( #17204 )  
						
						... 
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com > 
						
						
					 
					
						2025-04-25 16:06:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a0e619e62a 
					 
					
						
						
							
							[V1][Spec Decode] EAGLE-3 Support ( #16937 )  
						
						... 
						
						
						
						Signed-off-by: Bryan Lu <yuzhelu@amazon.com >
						Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
						Co-authored-by: Bryan Lu <yuzhelu@amazon.com > 
						
						
					 
					
						2025-04-25 15:43:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						70116459c3 
					 
					
						
						
							
							[BugFix][Frontend] Fix LLM.chat() tokenization ( #16081 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-25 22:20:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						65e262b93b 
					 
					
						
						
							
							Fix Python packaging edge cases ( #17159 )  
						
						... 
						
						
						
						Signed-off-by: Christian Heimes <christian@python.org > 
						
						
					 
					
						2025-04-26 06:15:07 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						43faa0461a 
					 
					
						
						
							
							[Bugfix] Fix hybrid model tests ( #17182 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-25 15:14:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						48cb2109b6 
					 
					
						
						
							
							[V1] Move usage stats to worker and start logging TPU hardware ( #16211 )  
						
						
						
						
					 
					
						2025-04-25 14:06:01 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a5450f11c9 
					 
					
						
						
							
							[Security] Use safe serialization and fix zmq setup for mooncake pipe ( #17192 )  
						
						... 
						
						
						
						Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
						Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com > 
						
						
					 
					
						2025-04-25 16:53:23 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9d98ab5ec6 
					 
					
						
						
							
							[Misc] Inline Molmo requirements ( #17190 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-25 16:41:44 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						df5c879527 
					 
					
						
						
							
							[doc] update wrong hf model links ( #17184 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
						Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-25 16:40:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						423e9f1cbe 
					 
					
						
						
							
							Use Transformers helper get_text_config() instead of checking for text_config ( #17105 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-25 08:47:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0bd7f8fca5 
					 
					
						
						
							
							Bump Transformers to 4.51.3 ( #17116 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-25 08:34:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d5615af9ae 
					 
					
						
						
							
							[Bugfix] Fix Mistral ChatCompletionRequest Body Exception ( #16769 )  
						
						... 
						
						
						
						Signed-off-by: Jasmond Loh <Jasmond.Loh@hotmail.com >
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-04-25 07:26:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						19dcc02a72 
					 
					
						
						
							
							[Bugfix] Fix mistral model tests ( #17181 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-25 06:03:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7feae92c1f 
					 
					
						
						
							
							[Doc] Move todo out of beam search docstring ( #17183 )  
						
						... 
						
						
						
						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com > 
						
						
					 
					
						2025-04-25 04:44:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f851b84266 
					 
					
						
						
							
							[Doc] Add two links to disagg_prefill.md ( #17168 )  
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-04-25 10:23:57 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fc966e9cc6 
					 
					
						
						
							
							Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 ( #17158 )  
						
						
						
						
					 
					
						2025-04-25 17:10:32 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ef19e67d2c 
					 
					
						
						
							
							[Doc] Add headings to improve gptqmodel.md ( #17164 )  
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-04-25 01:13:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a41351f363 
					 
					
						
						
							
							[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization ( #15734 )  
						
						... 
						
						
						
						Signed-off-by: Randall Smith <Randall.Smith@amd.com >
						Signed-off-by: Luka Govedič <lgovedic@redhat.com >
						Co-authored-by: Luka Govedič <lgovedic@redhat.com > 
						
						
					 
					
						2025-04-25 00:45:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6aae216b4e 
					 
					
						
						
							
							[Bugfix] remove fallback in guided_json (int range, patterns) ( #16725 )  
						
						... 
						
						
						
						Signed-off-by: csy1204 <josang1204@gmail.com >
						Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com > 
						
						
					 
					
						2025-04-25 06:54:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b22980a1dc 
					 
					
						
						
							
							[Perf] Optimize rotary_emb implementation to use Triton operator for improved inference performance ( #16457 )  
						
						... 
						
						
						
						Signed-off-by: cynthieye <yexin93@qq.com >
						Co-authored-by: MagnetoWang <magnetowang@outlook.com > 
						
						
					 
					
						2025-04-25 14:52:28 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						881f735827 
					 
					
						
						
							
							[Misc] Benchmark Serving Script Support Appending Results ( #17028 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-04-24 22:53:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2f54045508 
					 
					
						
						
							
							[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton ( #15099 )  
						
						... 
						
						
						
						Signed-off-by: Mengqing Cao <cmq0113@163.com > 
						
						
					 
					
						2025-04-24 22:51:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5aa6efb9a5 
					 
					
						
						
							
							[Misc] Clean up redundant code in uniproc_executor.py ( #16762 )  
						
						... 
						
						
						
						Signed-off-by: Lifu Huang <lifu.hlf@gmail.com > 
						
						
					 
					
						2025-04-24 22:49:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6ca0234478 
					 
					
						
						
							
							Move missed SchedulerConfig args into scheduler config group in EngineArgs ( #17131 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-24 22:48:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						649818995f 
					 
					
						
						
							
							[Docs] Fix True->true in supported_models.md ( #17141 )  
						
						
						
						
					 
					
						2025-04-25 04:20:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7a0a9da72b 
					 
					
						
						
							
							[Doc] V1 : Update LoRA status ( #17133 )  
						
						... 
						
						
						
						Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com >
						Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-04-24 20:17:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						69bff9bc89 
					 
					
						
						
							
							fix float16 support for kimi-vl ( #17156 )  
						
						... 
						
						
						
						Co-authored-by: zhouzaida <zhouzaida@msh.team > 
						
						
					 
					
						2025-04-24 20:16:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						41ca7eb491 
					 
					
						
						
							
							[Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 ( #16864 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-04-24 20:12:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						eef364723c 
					 
					
						
						
							
							[FEAT] [ROCm]: AITER Fused MOE V1 Support ( #16752 )  
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
						Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-04-25 11:06:50 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0d6e187e88 
					 
					
						
						
							
							Use custom address for listening socket ( #15988 )  
						
						... 
						
						
						
						Signed-off-by: Jens Glaser <glaserj@ornl.gov > 
						
						
					 
					
						2025-04-25 01:57:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9420a1fc30 
					 
					
						
						
							
							Better error message for missing mistral params.json ( #17132 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-24 23:43:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						583e900996 
					 
					
						
						
							
							[Misc] Add example to run DeepSeek with Ray Serve LLM ( #17134 )  
						
						... 
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com > 
						
						
					 
					
						2025-04-24 22:25:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						05e1fbfc52 
					 
					
						
						
							
							Add chat template for Llama 4 models ( #16428 )  
						
						... 
						
						
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com > 
						
						
					 
					
						2025-04-24 20:19:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fe92176321 
					 
					
						
						
							
							Add collective_rpc to llm engine ( #16999 )  
						
						... 
						
						
						
						Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai > 
						
						
					 
					
						2025-04-24 20:16:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6d0df0ebeb 
					 
					
						
						
							
							[Docs] Generate correct github links for decorated functions ( #17125 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-24 10:39:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0fa939e2d1 
					 
					
						
						
							
							Improve configs - LoRAConfig + PromptAdapterConfig ( #16980 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-24 10:29:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0422ce109f 
					 
					
						
						
							
							Add :markdownhelp: to EngineArgs docs so markdown docstrings render properly ( #17124 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-24 10:28:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						47bdee409c 
					 
					
						
						
							
							Molmo Requirements ( #17026 )  
						
						... 
						
						
						
						Signed-off-by: Eyshika Agarwal <eyshikaengineer@gmail.com >
						Signed-off-by: eyshika <eyshikaengineer@gmail.com > 
						
						
					 
					
						2025-04-24 10:08:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						49f189439d 
					 
					
						
						
							
							existing torch installation pip command fix for docs ( #17059 )  
						
						
						
						
					 
					
						2025-04-24 10:07:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5adf6f6b7f 
					 
					
						
						
							
							Updating Buildkite job for IBM Power ( #17111 )  
						
						... 
						
						
						
						Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com > 
						
						
					 
					
						2025-04-24 10:06:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4115f19958 
					 
					
						
						
							
							[CI] Add automation for the tool-calling github label ( #17118 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-24 09:22:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						340d7b1b21 
					 
					
						
						
							
							[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics ( #16665 )  
						
						... 
						
						
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com > 
						
						
					 
					
						2025-04-24 08:57:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1bcbcbf574 
					 
					
						
						
							
							[Misc] refactor example series - structured outputs ( #17040 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
						Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-24 07:49:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						82e43b2d7e 
					 
					
						
						
							
							Add missing rocm_skinny_gemms kernel test to CI ( #17060 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-24 07:49:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						67309a1cb5 
					 
					
						
						
							
							[Frontend] Using matryoshka_dimensions to control the allowed output dimensions. ( #16970 )  
						
						
						
						
					 
					
						2025-04-24 07:06:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b724afe343 
					 
					
						
						
							
							[V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning ( #16954 )  
						
						... 
						
						
						
						Signed-off-by: shen-shanshan <467638484@qq.com > 
						
						
					 
					
						2025-04-24 06:15:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						21f4f1c9a4 
					 
					
						
						
							
							Improve static type checking in LoRAModelRunnerMixin ( #17104 )  
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-24 06:14:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b0c1f6202d 
					 
					
						
						
							
							[Misc] Remove OLMo2 config copy ( #17066 )  
						
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-04-24 06:14:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c0dfd97519 
					 
					
						
						
							
							[V1][PP] Optimization: continue scheduling prefill chunks ( #17080 )  
						
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com > 
						
						
					 
					
						2025-04-24 05:27:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a9138e85b1 
					 
					
						
						
							
							Fix OOT registration test ( #17099 )  
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-24 04:44:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0a05ed57e6 
					 
					
						
						
							
							Simplify TokenizerGroup ( #16790 )  
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-24 04:43:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						14288d1332 
					 
					
						
						
							
							Disable enforce_eager for V1 TPU sampler and structured output tests ( #17016 )  
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-24 02:50:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b411418ff0 
					 
					
						
						
							
							[Chore] Remove Sampler from Model Code ( #17084 )  
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-24 02:49:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2bc0f72ae5 
					 
					
						
						
							
							Add docs for runai_streamer_sharded ( #17093 )  
						
						
						
						
						Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-04-24 01:03:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9c1244de57 
					 
					
						
						
							
							[doc] update to hyperlink ( #17096 )  
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-24 00:58:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						db2f8d915c 
					 
					
						
						
							
							[V1] Update structured output ( #16812 )  
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-23 23:57:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6167c0e5d2 
					 
					
						
						
							
							[Bugfix][Core] add seq_id_to_seq_group clearing to avoid memory leak when s… ( #16472 )  
						
						
						
						
						Signed-off-by: 开哲 <kaizhe.zy@alibaba-inc.com >
Co-authored-by: 开哲 <kaizhe.zy@alibaba-inc.com > 
						
						
					 
					
						2025-04-24 11:25:37 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed2e464653 
					 
					
						
						
							
							Addendum Fix to support FIPS enabled machines with MD5 hashing ( #17043 )  
						
						
						
						
						Signed-off-by: sydarb <areebsyed237@gmail.com > 
						
						
					 
					
						2025-04-23 19:55:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2c8ed8ee48 
					 
					
						
						
							
							More informative error when using Transformers backend ( #16988 )  
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-23 19:54:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed50f46641 
					 
					
						
						
							
							[Bugfix] Enable V1 usage stats ( #16986 )  
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-23 19:54:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						46e678bcff 
					 
					
						
						
							
							[Minor] Use larger batch sizes for A100/B100/B200/MI300x ( #17073 )  
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-23 19:18:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6b2427f995 
					 
					
						
						
							
							[Quantization]add prefix for commandA quantized model ( #17017 )  
						
						
						
						
					 
					
						2025-04-23 17:32:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b07d741661 
					 
					
						
						
							
							[CI/Build] workaround for CI build failure ( #17070 )  
						
						
						
						
						Signed-off-by: csy1204 <josang1204@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-23 16:14:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						41fb013d29 
					 
					
						
						
							
							[V1][Spec Decode] Always use argmax for sampling draft tokens  ( #16899 )  
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-23 14:57:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						32d4b669d0 
					 
					
						
						
							
							[BugFix][V1] Fix int32 token index overflow when preparing input ids ( #16806 )  
						
						
						
						
					 
					
						2025-04-23 12:12:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3cde34a4a4 
					 
					
						
						
							
							[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar ( #15949 )  
						
						
						
						
						Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com > 
						
						
					 
					
						2025-04-23 18:34:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bdb3660312 
					 
					
						
						
							
							Use @property and private field for data_parallel_rank_local ( #17053 )  
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-23 08:50:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f3a21e9c68 
					 
					
						
						
							
							CacheConfig.block_size should always be int when used ( #17052 )  
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-23 08:50:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8e630d680e 
					 
					
						
						
							
							Improve Transformers backend model loading QoL ( #17039 )  
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-23 07:33:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						af869f6dff 
					 
					
						
						
							
							[CI] Update structured-output label automation ( #17055 )  
						
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-23 07:33:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						53c0fa1e25 
					 
					
						
						
							
							Ensure that pid passed to kill_process_tree is int for mypy ( #17051 )  
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-23 07:32:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f7912cba3d 
					 
					
						
						
							
							[Doc] Add top anchor and a note to quantization/bitblas.md ( #17042 )  
						
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-04-23 07:32:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6317a5174a 
					 
					
						
						
							
							Categorize tests/kernels/ based on kernel type ( #16799 )  
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-23 09:21:07 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aa72d9a4ea 
					 
					
						
						
							
							Mistral-format support for compressed-tensors ( #16803 )  
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-23 08:46:23 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ce17db8085 
					 
					
						
						
							
							[CI] Run v1/test_serial_utils.py in CI ( #16996 )  
						
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-23 01:13:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8c87a9ad46 
					 
					
						
						
							
							[Bugfix] Fix AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers ( #16964 )  
						
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-04-23 07:24:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ec69124eb4 
					 
					
						
						
							
							[Misc] Improve readability of get_open_port function. ( #17024 )  
						
						
						
						
						Signed-off-by: gitover22 <qidizou88@gmail.com > 
						
						
					 
					
						2025-04-23 06:16:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d0da99fb70 
					 
					
						
						
							
							[BugFix] llama4 fa3 fix - RuntimeError: scheduler_metadata must have shape (metadata_size) ( #16998 )  
						
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-04-22 21:49:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b2f195c429 
					 
					
						
						
							
							[V1] Avoid socket errors during shutdown when requests are in in-flight ( #16807 )  
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-23 12:36:29 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						047797ef90 
					 
					
						
						
							
							[Bugfix] Triton FA function takes no keyword arguments ( #16902 )  
						
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com > 
						
						
					 
					
						2025-04-22 21:35:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						eb8ef4224d 
					 
					
						
						
							
							[doc] add download path tips ( #17013 )  
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-23 04:06:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						56a735261c 
					 
					
						
						
							
							[INTEL-HPU][v0] Port delayed sampling to upstream ( #16949 )  
						
						
						
						
						Signed-off-by: Michal Adamczyk <michal.adamczyk@intel.com >
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai > 
						
						
					 
					
						2025-04-22 20:14:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e1cf90e099 
					 
					
						
						
							
							[misc] tune some env vars for GB200 ( #16992 )  
						
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-04-23 10:59:48 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6bc1e30ef9 
					 
					
						
						
							
							Revert "[Misc] Add S3 environment variables for better support of MinIO." ( #17021 )  
						
						
						
						
					 
					
						2025-04-22 19:22:29 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7e081ba7ca 
					 
					
						
						
							
							[BugFix] Revert ROCm Custom Paged Attention Env Flag Check ( #17022 )  
						
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com > 
						
						
					 
					
						2025-04-22 19:17:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1e013fa388 
					 
					
						
						
							
							[V1][DP] More robust DP/EP dummy request coordination ( #16277 )  
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-22 19:12:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bc7c4d206b 
					 
					
						
						
							
							[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 ( #13305 )  
						
						
						
						
						Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu >
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com >
Signed-off-by: maleksan85 <maleksan@amd.com >
Signed-off-by: <>
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: qli88 <qiang.li2@amd.com >
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com > 
						
						
					 
					
						2025-04-22 19:11:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f67e9e9f22 
					 
					
						
						
							
							add Dockerfile build vllm against torch nightly ( #16936 )  
						
						
						
						
						Signed-off-by: Yang Wang <elainewy@meta.com > 
						
						
					 
					
						2025-04-22 19:08:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						36fe78769f 
					 
					
						
						
							
							[Bugfix] validate urls object for multimodal content parts ( #16990 )  
						
						
						
						
						Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com > 
						
						
					 
					
						2025-04-23 09:43:06 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						83d933718c 
					 
					
						
						
							
							[Core][V1][TPU] Enable structured decoding on TPU V1 ( #16499 )  
						
						
						
						
						Signed-off-by: Chenyaaang <chenyangli@google.com > 
						
						
					 
					
						2025-04-22 18:05:23 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5175b884f7 
					 
					
						
						
							
							[BugFix] Remove default multiproc executor collective_rpc timeout ( #17000 )  
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-22 23:27:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5536b30a4c 
					 
					
						
						
							
							Fencing Kernels Tests for enabling on AMD ( #16929 )  
						
						
						
						
						Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com > 
						
						
					 
					
						2025-04-22 09:32:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7f58fb9718 
					 
					
						
						
							
							Add assertion for no objects while hashing hf_config ( #16930 )  
						
						
						
						
						Signed-off-by: rzou <zou3519@gmail.com > 
						
						
					 
					
						2025-04-22 09:32:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						30bc3e0f66 
					 
					
						
						
							
							[FEAT][ROCm]: Support AITER MLA ( #15893 )  
						
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: qli88 <qiang.li2@amd.com > 
						
						
					 
					
						2025-04-22 09:31:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f34410715f 
					 
					
						
						
							
							[frontend] enhance tool_calls type check ( #16882 )  
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-22 15:40:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						68d4c33202 
					 
					
						
						
							
							[Misc] Add S3 environment variables for better support of MinIO. ( #16977 )  
						
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-04-22 14:27:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f961d7f6ef 
					 
					
						
						
							
							[BugFix] Pass in correct VLLM config in FlashInfer backend ( #13207 ) ( #16973 )  
						
						
						
						
						Signed-off-by: 苏政渊 <suzhengyuan@moonshot.cn >
Co-authored-by: 苏政渊 <suzhengyuan@moonshot.cn > 
						
						
					 
					
						2025-04-22 06:44:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d059110498 
					 
					
						
						
							
							Improve configs - SpeculativeConfig ( #16971 )  
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-22 12:55:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						571e8dd65e 
					 
					
						
						
							
							[Bugfix] Fix distributed bug again in Qwen2.5-VL & Qwen2.5-Omni ( #16974 )  
						
						
						
						
						Signed-off-by: fyabc <suyang.fy@alibaba-inc.com > 
						
						
					 
					
						2025-04-22 12:23:17 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4b91c927f6 
					 
					
						
						
							
							[Misc] refactor example series ( #16972 )  
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-22 11:44:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0e237f0035 
					 
					
						
						
							
							[FEAT][ROCm] Integrate Paged Attention Kernel from AITER ( #15001 )  
						
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-04-22 02:46:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8f7bace7c3 
					 
					
						
						
							
							[Doc] Improve documentation for multimodal CLI args ( #16960 )  
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-22 08:35:35 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e4d6144232 
					 
					
						
						
							
							[BugFix] Fix incremental detokenization perf issue ( #16963 )  
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-22 08:16:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8d32dc603d 
					 
					
						
						
							
							[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS ( #6036 )  
						
						
						
						
						Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com >
Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com > 
						
						
					 
					
						2025-04-22 09:01:36 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c4ab9f3e71 
					 
					
						
						
							
							[V1] Remove pre-allocation for KV cache ( #16941 )  
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-22 00:52:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2689d5c027 
					 
					
						
						
							
							[Model] Use autoweightloader for mamba ( #16950 )  
						
						
						
						
						Signed-off-by: sfeng33 <4florafeng@gmail.com > 
						
						
					 
					
						2025-04-22 07:48:15 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						acba33a0f1 
					 
					
						
						
							
							[Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams ( #16767 )  
						
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-22 06:02:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a114bf20a3 
					 
					
						
						
							
							[Perf] Optimize _update_states for GPU model runner ( #16910 )  
						
						
						
						
						Signed-off-by: snowcharm <snowcharmqq@gmail.com > 
						
						
					 
					
						2025-04-22 14:01:54 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3097ce3a32 
					 
					
						
						
							
							[Doc] Update ai_accelerator/hpu-gaudi.inc.md ( #16956 )  
						
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-04-22 05:33:27 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d6da9322c8 
					 
					
						
						
							
							[Bugfix] Fix f-string for Python 3.9-3.11 ( #16962 )  
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-21 21:45:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						71ce44047f 
					 
					
						
						
							
							Support S3 Sharded loading with RunAI Model Streamer ( #16317 )  
						
						
						
						
						Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-04-21 21:21:49 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						188b7f9b8c 
					 
					
						
						
							
							[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm ( #15830 )  
						
						
						
						
						Signed-off-by: charlifu <charlifu@amd.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com > 
						
						
					 
					
						2025-04-21 20:46:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b9b4746950 
					 
					
						
						
							
							[V1] Remove additional_config check ( #16710 )  
						
						
						
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com > 
						
						
					 
					
						2025-04-21 20:45:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7b8a2ab76f 
					 
					
						
						
							
							[Kernel] Add expert_map support to Cutlass FP8 MOE ( #16861 )  
						
						
						
						
						Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com >
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-04-21 20:44:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c9acbf1141 
					 
					
						
						
							
							[Misc] Remove the chunked prefill warning for LoRA  ( #16925 )  
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-04-21 20:44:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5b794cae8d 
					 
					
						
						
							
							[ROCm] Add aiter tkw1 kernel for Llama4 fp8 ( #16727 )  
						
						
						
						
						Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com > 
						
						
					 
					
						2025-04-21 20:42:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0e4254492f 
					 
					
						
						
							
							[Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other ( #16863 )  
						
						
						
						
						Signed-off-by: Jeffrey Li <jeffrey.dot.li@gmail.com > 
						
						
					 
					
						2025-04-22 11:40:19 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1311913f55 
					 
					
						
						
							
							[BugFix][Spec Decode] No in-place update to draft probs ( #16952 )  
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-21 19:54:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						29f395c97c 
					 
					
						
						
							
							[Doc] Remove unnecessary V1 flag ( #16924 )  
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-21 21:04:38 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fa3bba2a53 
					 
					
						
						
							
							[TPU][V1] Enable Top-P ( #16843 )  
						
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-22 00:46:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						986537f1c3 
					 
					
						
						
							
							[V1] V1 FlashInfer Attention ( #16684 )  
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Aurick Qiao <qiao@aurick.net > 
						
						
					 
					
						2025-04-22 00:38:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						210207525e 
					 
					
						
						
							
							[TPU][V1] Capture multimodal encoder during model compilation ( #15051 )  
						
						
						
						
						Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Siyuan Liu <lsiyuan@google.com > 
						
						
					 
					
						2025-04-21 18:36:59 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						71eda0bb76 
					 
					
						
						
							
							Update Qwen1.5-MoE-W4A16-compressed-tensors.yaml ( #16946 )  
						
						
						
						
					 
					
						2025-04-21 18:35:32 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						471fe65630 
					 
					
						
						
							
							[TPU][V1] Implicitly adjust page size when there's SMEM OOM ( #16871 )  
						
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-04-21 15:43:13 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3a0fba5cf4 
					 
					
						
						
							
							[V1][Spec Decode] Handle draft tokens beyond max_model_len ( #16087 )  
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-21 12:38:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						299ebb62b2 
					 
					
						
						
							
							[Core] Speed up decode by remove synchronizing operation in sampler ( #16436 )  
						
						
						
						
						Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com >
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com > 
						
						
					 
					
						2025-04-21 18:18:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f728ab8e35 
					 
					
						
						
							
							[Doc] mention how to install in CPU editable mode ( #16923 )  
						
						
						
						
						Signed-off-by: David Xia <david@davidxia.com > 
						
						
					 
					
						2025-04-21 17:45:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						63e26fff78 
					 
					
						
						
							
							[doc] install required python3-dev apt package ( #16888 )  
						
						
						
						
						Signed-off-by: David Xia <david@davidxia.com > 
						
						
					 
					
						2025-04-21 16:15:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fe3462c774 
					 
					
						
						
							
							[XPU][Bugfix] minor fix for XPU ( #15591 )  
						
						
						
						
						Signed-off-by: yan ma <yan.ma@intel.com > 
						
						
					 
					
						2025-04-22 00:02:57 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3b34fd5273 
					 
					
						
						
							
							Raise error for data-parallel with benchmark_throughput ( #16737 )  
						
						... 
						
						
						
						Signed-off-by: Kartik Ramesh <kartikx2000@gmail.com >
Co-authored-by: Simon Mo <simon.mo@hey.com > 
						
						
					 
					
						2025-04-21 23:51:43 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						55d6d3fdb8 
					 
					
						
						
							
							[Bugfix] Fix GLM rotary_dim issue and support v1 ( #16912 )  
						
						... 
						
						
						
						Signed-off-by: isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-04-21 14:26:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7272bfae77 
					 
					
						
						
							
							[Misc] Refactor platform to get device specific stream and event ( #14411 )  
						
						... 
						
						
						
						Signed-off-by: shen-shanshan <467638484@qq.com > 
						
						
					 
					
						2025-04-21 21:25:49 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d9ac9e3dc5 
					 
					
						
						
							
							[Misc] fix collect_env version parse ( #15267 )  
						
						... 
						
						
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com > 
						
						
					 
					
						2025-04-21 20:29:40 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d41faaf9df 
					 
					
						
						
							
							Restore buffers when wake up from level 2 sleep ( #16564 ) ( #16889 )  
						
						... 
						
						
						
						Signed-off-by: Han <zh950713@gmail.com > 
						
						
					 
					
						2025-04-21 20:18:28 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b34f33438a 
					 
					
						
						
							
							[Doc] Split dummy_processor_inputs() in Multimodal Docs ( #16915 )  
						
						... 
						
						
						
						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com > 
						
						
					 
					
						2025-04-21 11:10:01 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						26c0406555 
					 
					
						
						
							
							[Bugfix] Fix distributed bug in Qwen2.5-VL & Qwen2.5-Omni ( #16907 )  
						
						
						
						
					 
					
						2025-04-21 10:25:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4c41278b77 
					 
					
						
						
							
							[CI/CD][V1] Add spec decode tests to CI ( #16900 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-20 22:37:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bb3605db85 
					 
					
						
						
							
							[Bugfix] Fix v1/spec_decode/test_ngram.py ( #16895 )  
						
						... 
						
						
						
						Signed-off-by: qizixi <qizixi@meta.com > 
						
						
					 
					
						2025-04-20 20:54:29 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fe742aef5a 
					 
					
						
						
							
							[easy] Pass compile_fx only the config patches ( #16845 )  
						
						... 
						
						
						
						Signed-off-by: rzou <zou3519@gmail.com > 
						
						
					 
					
						2025-04-20 12:25:19 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4b07d36891 
					 
					
						
						
							
							Improve configs - CacheConfig ( #16835 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-20 12:25:04 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						87aaadef73 
					 
					
						
						
							
							Serialize tensors using int8 views ( #16866 )  
						
						... 
						
						
						
						Signed-off-by: Staszek Pasko <staszek@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-19 10:28:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						682e0b6d2f 
					 
					
						
						
							
							Log how much time loading a compiled artifact takes ( #16848 )  
						
						... 
						
						
						
						Signed-off-by: rzou <zou3519@gmail.com > 
						
						
					 
					
						2025-04-19 16:50:46 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d6195a748b 
					 
					
						
						
							
							[doc] update hyperlink ( #16877 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-19 16:40:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						205d84aaa9 
					 
					
						
						
							
							[VLM] Clean up models ( #16873 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-19 12:13:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5124f5bf51 
					 
					
						
						
							
							[Model] Qwen2.5-Omni Cleanup  ( #16872 )  
						
						
						
						
					 
					
						2025-04-19 09:37:02 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						83f3c3bd91 
					 
					
						
						
							
							[Model] Refactor Phi-4-multimodal to use merged processor and support V1 ( #15477 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-19 02:26:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d9737ca1c6 
					 
					
						
						
							
							[V1][Misc] stop update prefix cache stats when logs_stats is disabled ( #16460 )  
						
						... 
						
						
						
						Signed-off-by: vie-serendipity <2733147505@qq.com > 
						
						
					 
					
						2025-04-19 02:25:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9d4ca19d50 
					 
					
						
						
							
							[Misc] Benchmarks for audio models ( #16505 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-04-19 02:24:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2ef0dc53b8 
					 
					
						
						
							
							[Frontend] Add sampling params to v1/audio/transcriptions endpoint ( #16591 )  
						
						... 
						
						
						
						Signed-off-by: Jannis Schönleber <joennlae@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Jannis Schönleber <joennlae@gmail.com > 
						
						
					 
					
						2025-04-19 07:03:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1d4680fad2 
					 
					
						
						
							
							[rocm][MI300] llama4 maverick fp8 moe config tp8 ( #16847 )  
						
						... 
						
						
						
						Signed-off-by: Divakar Verma <divakar.verma@amd.com > 
						
						
					 
					
						2025-04-19 06:21:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2c1bd848a6 
					 
					
						
						
							
							[Model][VLM] Add Qwen2.5-Omni model support (thinker only) ( #15130 )  
						
						... 
						
						
						
						Signed-off-by: fyabc <suyang.fy@alibaba-inc.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Xiong Wang <wangxiongts@163.com > 
						
						
					 
					
						2025-04-18 23:14:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5c9121203c 
					 
					
						
						
							
							[release] Publish neuron docker image ( #16733 )  
						
						... 
						
						
						
						Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com > 
						
						
					 
					
						2025-04-18 17:11:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						490b1698a5 
					 
					
						
						
							
							[Doc] Updated Llama section in tool calling docs to have llama 3.2 config info ( #16857 )  
						
						... 
						
						
						
						Signed-off-by: jmho <jaylenho734@gmail.com > 
						
						
					 
					
						2025-04-18 23:28:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5a5e29de88 
					 
					
						
						
							
							[Misc] refactor examples series - Chat Completion Client With Tools ( #16829 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-18 23:24:42 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3d3ab3689f 
					 
					
						
						
							
							[New Model]: Snowflake Arctic Embed (Family)  ( #16649 )  
						
						
						
						
					 
					
						2025-04-18 08:11:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						686623c5e7 
					 
					
						
						
							
							Fix nullable_kvs fallback ( #16837 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-18 05:58:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aadb656562 
					 
					
						
						
							
							[Misc] Clean up Kimi-VL ( #16833 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-18 05:15:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						87e067de41 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for BigCode, GPT-J ( #16823 )  
						
						... 
						
						
						
						Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com > 
						
						
					 
					
						2025-04-18 10:42:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						26507f8973 
					 
					
						
						
							
							[Docs] Fix a link and grammar issue in production-stack.md ( #16809 )  
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-04-18 06:42:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9c1d5b456d 
					 
					
						
						
							
							[Doc] add podman setup instructions for official image ( #16796 )  
						
						... 
						
						
						
						Signed-off-by: Nathan Weinberg <nweinber@redhat.com > 
						
						
					 
					
						2025-04-18 06:10:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e31045f95c 
					 
					
						
						
							
							[Bugfix] fix pp for llama4 ( #16746 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <fanglu@fb.com > 
						
						
					 
					
						2025-04-18 13:51:30 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aaec845f8e 
					 
					
						
						
							
							[ROCm] [Attention] Cleanup ROCm output passing ( #16431 )  
						
						... 
						
						
						
						Signed-off-by: Luka Govedič <lgovedic@redhat.com > 
						
						
					 
					
						2025-04-18 05:46:45 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7bdfd29a35 
					 
					
						
						
							
							[Misc] add collect_env to cli and docker image ( #16759 )  
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-04-17 22:13:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e78587a64c 
					 
					
						
						
							
							Improve-mm-and-pooler-and-decoding-configs ( #16789 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-17 22:13:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7eb4255628 
					 
					
						
						
							
							[BugFix] Accuracy fix for llama4 int4 - improperly casted scales ( #16801 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-04-17 22:13:29 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6a0f547561 
					 
					
						
						
							
							Add hardware print to TPU V1 test ( #16792 )  
						
						
						
						
					 
					
						2025-04-17 22:13:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						30ed81b7ca 
					 
					
						
						
							
							[V1][Structured Output] Minor modification to _validate_structured_output() ( #16748 )  
						
						... 
						
						
						
						Signed-off-by: shen-shanshan <467638484@qq.com > 
						
						
					 
					
						2025-04-18 13:12:54 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7a4a5de729 
					 
					
						
						
							
							[Misc] Update outdated note: LMCache now supports chunked prefill ( #16697 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-04-18 05:12:42 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c16fb5dae8 
					 
					
						
						
							
							[Doc] Improve help examples for --compilation-config ( #16729 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-17 21:22:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e37073efd7 
					 
					
						
						
							
							Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema ( #16721 )  
						
						... 
						
						
						
						Signed-off-by: Tarun Kumar <takumar@redhat.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-17 21:08:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						183dad7a85 
					 
					
						
						
							
							[Attention] Update to latest FA3 code ( #13111 )  
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-04-17 15:14:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3408e47159 
					 
					
						
						
							
							[P/D][V1] KV Connector API V1 ( #15960 )  
						
						... 
						
						
						
						Signed-off-by: ApostaC <yihua98@uchicago.edu >
Signed-off-by: rshaw@neuralmagic.com  <robertgshaw2@gmail.com >
Signed-off-by: remi <remi@mistral.ai >
Co-authored-by: rshaw@neuralmagic.com  <robertgshaw2@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com > 
						
						
					 
					
						2025-04-17 13:22:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0377b8310b 
					 
					
						
						
							
							[MLA] Simplification to batch P/D reordering ( #16673 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-17 16:12:09 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e4755f7fac 
					 
					
						
						
							
							[V1][Metrics] Fix http metrics middleware ( #15894 )  
						
						
						
						
					 
					
						2025-04-17 19:52:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						92edf35826 
					 
					
						
						
							
							[ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints ( #16674 )  
						
						
						
						
					 
					
						2025-04-17 11:44:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						eb5819b2d9 
					 
					
						
						
							
							[V1][TPU] Enable Top K ( #15489 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com >
Co-authored-by: Hyesoo Yang <hyeygit@gmail.com > 
						
						
					 
					
						2025-04-17 18:18:11 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5989f4684d 
					 
					
						
						
							
							[TPU][V1] Fix padding recompilation when max-num-batched-tokens is not even ( #16726 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-04-17 18:09:57 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5125d72f02 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for olmoe,opt,orion,persimmon,phi3_small ( #16548 )  
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-04-17 17:48:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a018e555fd 
					 
					
						
						
							
							[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 ( #16753 )  
						
						... 
						
						
						
						Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com >
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com > 
						
						
					 
					
						2025-04-18 00:01:30 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6211b92273 
					 
					
						
						
							
							[Bugfix] Fix index out of range error in api server log ( #16787 )  
						
						... 
						
						
						
						Signed-off-by: WangErXiao <863579016@qq.com > 
						
						
					 
					
						2025-04-17 09:01:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						05fcd1b430 
					 
					
						
						
							
							[V1][Perf] Faster incremental detokenization ( #15137 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-17 07:45:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7c02d6a137 
					 
					
						
						
							
							[Doc] Changed explanation of generation_tokens_total and prompt_tokens_total counter type metrics to avoid confusion ( #16784 )  
						
						... 
						
						
						
						Signed-off-by: insukim1994 <insu.kim@moreh.io > 
						
						
					 
					
						2025-04-17 14:10:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						11c3b98491 
					 
					
						
						
							
							[Doc] Document Matryoshka Representation Learning support ( #16770 )  
						
						
						
						
					 
					
						2025-04-17 13:37:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dbe7f07001 
					 
					
						
						
							
							[Doc] Make sure to update vLLM when installing latest code ( #16781 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-17 06:53:31 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c69bf4ee06 
					 
					
						
						
							
							fix: hyperlink ( #16778 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-17 11:34:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d27ea94034 
					 
					
						
						
							
							Improve configs - TokenizerPoolConfig + DeviceConfig ( #16603 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-17 11:19:42 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						99ed526101 
					 
					
						
						
							
							[Misc] refactor examples series - lmcache ( #16758 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-17 11:02:35 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						207da28186 
					 
					
						
						
							
							[Doc] Fix a 404 link in installation/cpu.md ( #16773 )  
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-04-17 10:46:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5b1aca2ae3 
					 
					
						
						
							
							[Bugfix] Fix GLM4 model ( #16618 )  
						
						... 
						
						
						
						Signed-off-by: intervitens <intervitens@tutanota.com > 
						
						
					 
					
						2025-04-17 03:35:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d8e557b5e5 
					 
					
						
						
							
							[doc] add open-webui example ( #16747 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-17 18:27:32 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						61a44a0b22 
					 
					
						
						
							
							[Doc] Add more tips to avoid OOM ( #16765 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-17 09:54:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a6481525b8 
					 
					
						
						
							
							[misc] ignore marlin_moe_wna16 local gen codes ( #16760 )  
						
						... 
						
						
						
						Signed-off-by: DefTruth <qiustudent_r@163.com > 
						
						
					 
					
						2025-04-17 17:15:14 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8cac35ba43 
					 
					
						
						
							
							[Ray] Improve documentation on batch inference ( #16609 )  
						
						... 
						
						
						
						Signed-off-by: Richard Liaw <rliaw@berkeley.edu > 
						
						
					 
					
						2025-04-16 22:19:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9dbf7a2dc1 
					 
					
						
						
							
							[V1] Remove log noise when idle ( #16735 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-16 21:34:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						607029e515 
					 
					
						
						
							
							[Bugfix] Revert max_prompt_len validation for decoder-only models. ( #16741 )  
						
						... 
						
						
						
						Signed-off-by: David Heineman <david@davidheineman.com > 
						
						
					 
					
						2025-04-16 21:33:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cb072ce93b 
					 
					
						
						
							
							[Bugfix] Update Florence-2 tokenizer to make grounding tasks work ( #16734 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-04-17 04:17:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						95aca283b4 
					 
					
						
						
							
							[rocm][V0] fix selection logic for custom PA in V0 ( #16426 )  
						
						... 
						
						
						
						Signed-off-by: Divakar Verma <divakar.verma@amd.com > 
						
						
					 
					
						2025-04-16 19:52:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2b05b8ce69 
					 
					
						
						
							
							[V1][Frontend] Improve Shutdown And Logs ( #11737 )  
						
						... 
						
						
						
						Signed-off-by: rshaw@neuralmagic.com  <rshaw@neuralmagic.com >
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: rshaw@neuralmagic.com  <rshaw@neuralmagic.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com >
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-16 19:48:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3c776dcefb 
					 
					
						
						
							
							Adding vllm buildkite job for IBM Power ( #16679 )  
						
						... 
						
						
						
						Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com > 
						
						
					 
					
						2025-04-17 10:47:47 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2cbd4d2999 
					 
					
						
						
							
							[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification ( #16636 )  
						
						... 
						
						
						
						Signed-off-by: Bryan Lu <yuzhelu@amazon.com > 
						
						
					 
					
						2025-04-16 19:47:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3092375e27 
					 
					
						
						
							
							[V1][Performance] Implement custom serialization for MultiModalKwargs [Rebased] ( #16432 )  
						
						... 
						
						
						
						Signed-off-by: Staszek Pasko <staszek@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-16 19:28:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3cd91dc955 
					 
					
						
						
							
							Help user create custom model for Transformers backend remote code models ( #16719 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-17 01:05:59 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8a7368e069 
					 
					
						
						
							
							[Misc] Remove redundant comment ( #16703 )  
						
						... 
						
						
						
						Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com > 
						
						
					 
					
						2025-04-17 00:44:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						93e561ec4d 
					 
					
						
						
							
							Improve error for structured output backend selection ( #16717 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-17 00:35:35 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e1b004839a 
					 
					
						
						
							
							[Hardware] Add processor inputs to platform validation ( #16680 )  
						
						... 
						
						
						
						Signed-off-by: Joe Runde <Joseph.Runde@ibm.com > 
						
						
					 
					
						2025-04-16 09:28:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ee378f3d49 
					 
					
						
						
							
							[Model] support modernbert  ( #16648 )  
						
						... 
						
						
						
						Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com >
Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com > 
						
						
					 
					
						2025-04-16 05:30:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e82ee40de3 
					 
					
						
						
							
							[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel ( #16693 )  
						
						... 
						
						
						
						Signed-off-by: DefTruth <qiustudent_r@163.com > 
						
						
					 
					
						2025-04-16 03:31:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						facbe2a114 
					 
					
						
						
							
							[Doc] Improve OOM troubleshooting ( #16704 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-16 18:29:48 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7168920491 
					 
					
						
						
							
							[Misc] refactor examples series ( #16708 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-16 10:16:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						21378a2323 
					 
					
						
						
							
							[CI] Cleanup additional_dependencies: [toml] for pre-commit yapf hook ( #16405 )  
						
						... 
						
						
						
						Signed-off-by: Kay Yan <kay.yan@daocloud.io > 
						
						
					 
					
						2025-04-16 10:05:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						976711d9db 
					 
					
						
						
							
							[V1][Structured Output] Move xgrammar related utils to backend_xgrammar.py ( #16578 )  
						
						... 
						
						
						
						Signed-off-by: shen-shanshan <467638484@qq.com > 
						
						
					 
					
						2025-04-16 17:01:36 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						44fa4d556c 
					 
					
						
						
							
							[ROCM] Bind triton version to 3.2 in requirements-built.txt  ( #16664 )  
						
						... 
						
						
						
						Signed-off-by: Sage Moore <sage@neuralmagic.com > 
						
						
					 
					
						2025-04-16 14:05:28 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3ac98edcb1 
					 
					
						
						
							
							[Feature] add model aware kv ops helper ( #16020 )  
						
						... 
						
						
						
						Signed-off-by: billishyahao <bill.he@amd.com > 
						
						
					 
					
						2025-04-15 23:00:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						966c742ed2 
					 
					
						
						
							
							Disable remote caching when calling compile_fx ( #16611 )  
						
						... 
						
						
						
						Signed-off-by: rzou <zou3519@gmail.com > 
						
						
					 
					
						2025-04-15 22:18:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0d7d05f4b6 
					 
					
						
						
							
							[Misc] Modify LRUCache touch (#16689)
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-04-16 04:51:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						96bb8aa68b 
					 
					
						
						
							
							[Bugfix] fix gpu docker image mis benchmarks dir (#16628)
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
						
						
					 
					
						2025-04-15 21:21:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3badb0213b 
					 
					
						
						
							
							[Model] Add PLaMo2 (#14323)
						Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
						Signed-off-by: shemmi <shemmi@preferred.jp>
						Co-authored-by: Kento Nozawa <nzw0301@preferred.jp>
						Co-authored-by: Hiroaki Mikami <mhiroaki@preferred.jp>
						Co-authored-by: Calvin Metzger <metzger@preferred.jp>
						
						
					 
					
						2025-04-15 19:31:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fdcb850f14 
					 
					
						
						
							
							[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546)
						Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
						Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
						
						
					 
					
						2025-04-15 22:31:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						54a66e5fee 
					 
					
						
						
							
							[Misc] Update compressed-tensors WNA16 to support zero-points (#14211)
						
						
						
						
					 
					
						2025-04-15 07:33:51 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						280d62b8a2 
					 
					
						
						
							
							[Kernel] Remove redundant Exp calculations (#16123)
						Signed-off-by: DefTruth <qiustudent_r@163.com>
						
						
					 
					
						2025-04-15 12:58:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1666e66443 
					 
					
						
						
							
							Add "/server_info" endpoint in api_server to retrieve the vllm_config. (#16572)
						Signed-off-by: Xihui Cang <xihuicang@gmail.com>
						
						
					 
					
						2025-04-15 11:50:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1575c1701a 
					 
					
						
						
							
							[CI/Build] Fix LoRA OOM (#16624)
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-04-15 16:38:19 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6ae996a873 
					 
					
						
						
							
							[Misc] refactor argument parsing in examples (#16635)
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-04-15 08:05:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b590adfdc1 
					 
					
						
						
							
							Fix vLLM x torch.compile config caching (#16491)
						Signed-off-by: rzou <zou3519@gmail.com>
						
						
					 
					
						2025-04-14 23:11:11 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b4fe16c75b 
					 
					
						
						
							
							Add vllm bench [latency, throughput] CLI commands (#16508)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-04-14 23:10:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bc5dd4f669 
					 
					
						
						
							
							[Bugfix] Fix broken GritLM model and tests (missing pooling_metadata) (#16631)
						Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
						
						
					 
					
						2025-04-14 23:09:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dbb036cf61 
					 
					
						
						
							
							[Bugfix] Fix tests/kernels/test_mamba_ssm_ssd.py (#16623)
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
						
						
					 
					
						2025-04-15 05:35:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						70e7ed841d 
					 
					
						
						
							
							[BugFix]: Update minimum pyzmq version (#16549)
						Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
						Co-authored-by: mgoin <michael@neuralmagic.com>
						
						
					 
					
						2025-04-14 20:06:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d06ba4ed3f 
					 
					
						
						
							
							[Kernel] moe wna16 marlin kernel (#14447)
						Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
						Co-authored-by: Michael Goin <michael@neuralmagic.com>
						Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-04-14 20:05:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6b40996ae8 
					 
					
						
						
							
							[Core][Bugfix] Fix Offline MM Beam Search (#16390)
						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-04-15 10:33:02 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d2020acac7 
					 
					
						
						
							
							config check sleep mode support oot platforms (#16562)
						
						
						
						
					 
					
						2025-04-14 16:31:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1eb3c2ed48 
					 
					
						
						
							
							[DOC][TPU] Add core idea about avoiding recompilation after warmup (#16614)
						Signed-off-by: Chengji Yao <chengjiyao@google.com>
						
						
					 
					
						2025-04-14 21:56:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c64ee87267 
					 
					
						
						
							
							[Hardware][TPU] Add torchvision to tpu dependency file (#16616)
						Signed-off-by: Siyuan Liu <lsiyuan@google.com>
						
						
					 
					
						2025-04-14 17:50:46 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b1308b84a3 
					 
					
						
						
							
							[Model][VLM] Add Kimi-VL model support (#16387)
						Signed-off-by: courage17340 <courage17340@163.com>
						
						
					 
					
						2025-04-14 21:41:48 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7b5ecf79bd 
					 
					
						
						
							
							s390x: Fix PyArrow build and add CPU test script for Buildkite CI (#16036)
						Signed-off-by: Nishan Acharya <Nishan.Acharya@ibm.com>
						
						
					 
					
						2025-04-14 10:55:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9883a18859 
					 
					
						
						
							
							Fix triton install condition on CPU (#16600)
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-04-14 17:06:01 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b3f2fddd17 
					 
					
						
						
							
							[TPU][V1] Fix exponential padding when max-num-batched-tokens is not a power of 2 (#16596)
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-04-14 17:01:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aa29841ede 
					 
					
						
						
							
							[Bugfix] Multi-modal caches not acting like LRU caches (#16593)
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-04-14 09:24:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6bf27affb6 
					 
					
						
						
							
							[fix]: Dockerfile.ppc64le fixes for opencv-python and hf-xet (#16048)
						Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
						
						
					 
					
						2025-04-14 17:08:39 +01:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1dd23386ec 
					 
					
						
						
							
							[Misc] Update usage with mooncake lib for kv transfer (#16523)
						Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
						
						
					 
					
						2025-04-14 11:31:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7cbfc10943 
					 
					
						
						
							
							[Misc] refactor examples (#16563)
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-04-14 09:59:15 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ce4ddd2d1a 
					 
					
						
						
							
							[Misc] remove warning if triton>=3.2.0 (#16553)
						Signed-off-by: DefTruth <qiustudent_r@163.com>
						
						
					 
					
						2025-04-14 02:39:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e51929ebca 
					 
					
						
						
							
							Improve configs - SchedulerConfig (#16533)
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-04-14 17:24:16 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dc1b4a6f13 
					 
					
						
						
							
							[Core][V0] Enable regex support with xgrammar (#13228)
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-04-14 10:13:38 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						63d2705edb 
					 
					
						
						
							
							[Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py (#16556)
						
						
						
						
					 
					
						2025-04-13 17:20:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d085a44082 
					 
					
						
						
							
							Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) (#16537)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-04-13 14:55:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f49e5aff11 
					 
					
						
						
							
							[V1][Spec Decode] KV cache slots for eagle heads (#16370)
						Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
						
						
					 
					
						2025-04-12 19:42:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6c11ecf8d3 
					 
					
						
						
							
							[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (#16529)
						Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>
						
						
					 
					
						2025-04-12 20:19:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						93e5f3c5fb 
					 
					
						
						
							
							[Perf] Optimize Preparing Inputs for GPU Model Runner (#16484)
						Signed-off-by: snowcharm <snowcharmqq@gmail.com>
						Co-authored-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-04-12 22:54:37 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						70363bccfa 
					 
					
						
						
							
							Fix syntaxWarning: invalid escape sequence '\s' (#16532)
						Signed-off-by: Jie Fu <jiefu@tencent.com>
						
						
					 
					
						2025-04-12 14:39:42 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3cdc57669f 
					 
					
						
						
							
							[Misc] Delete redundant code (#16530)
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-04-12 11:21:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						68bb122eb4 
					 
					
						
						
							
							[MISC] Make GroupCoordinator compatible with out-of-tree devices (#16464)
						Signed-off-by: hzji210@gmail.com <hzji210@gmail.com>
						
						
					 
					
						2025-04-12 09:20:25 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d9fc8cd9da 
					 
					
						
						
							
							[V1] Enable multi-input by default (#15799)
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-04-12 08:52:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f069f3ea74 
					 
					
						
						
							
							[Misc] Openai transcription client example use same Whisper model (#16487)
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-04-12 07:27:03 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c5bc0e7fcc 
					 
					
						
						
							
							[Misc] Update chat utils tests (#16520)
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-04-12 06:48:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4a3a518722 
					 
					
						
						
							
							fix: spelling (#16466)
						Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>
						
						
					 
					
						2025-04-11 23:24:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fbf722c6e6 
					 
					
						
						
							
							[Frontend] support matryoshka representation / support embedding API dimensions (#16331)
						
						
						
						
					 
					
						2025-04-11 23:23:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e92d7085bf 
					 
					
						
						
							
							[Feature][V1] Add xgrammar to support minLength, maxLength with test (#16516)
						Signed-off-by: Leon Seidel <leon.seidel@fau.de>
						
						
					 
					
						2025-04-11 23:22:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bd6028d6b0 
					 
					
						
						
							
							Optimized topk for topk=1 (Llama-4) (#16512)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-04-12 14:21:08 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						802329dee9 
					 
					
						
						
							
							[Doc] Update Llama4 Model Names in Supported Models (#16509)
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
						
						
					 
					
						2025-04-12 02:53:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						41cc883c29 
					 
					
						
						
							
							[BugFix] Handle non-contiguous tensors properly when serializing (#16492)
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-04-11 17:54:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						57504a4bcf 
					 
					
						
						
							
							[CI][Bugfix] Add mistral_tool_use to Ci (#16517)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-04-11 17:52:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed4792c990 
					 
					
						
						
							
							[Doc] Fix link to vLLM blog (#16519)
						Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
						
						
					 
					
						2025-04-11 17:39:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						87b836ba77 
					 
					
						
						
							
							Bugfix for PixtralHF models without spatial_merge_size (#16513)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-04-11 23:32:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						56c76c2e0e 
					 
					
						
						
							
							[Bugfix] clean up duplicated code (#16485)
						Signed-off-by: Gogs <gogs@fake.local>
						Co-authored-by: Gogs <gogs@fake.local>
						
						
					 
					
						2025-04-11 23:19:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c09632a66c 
					 
					
						
						
							
							Update openai_compatible_server.md (#16507)
						Signed-off-by: Christian Sears <csears@redhat.com>
						
						
					 
					
						2025-04-11 22:54:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a3bf8d4a2b 
					 
					
						
						
							
							[Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (#16488)
						
						
						
						
					 
					
						2025-04-12 06:26:55 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						16eda8c43a 
					 
					
						
						
							
							[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463)
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
						Co-authored-by: Kai Wu <kaiwu@meta.com>
						
						
					 
					
						2025-04-12 06:26:17 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cd77382ac1 
					 
					
						
						
							
							Improve configs - LoadConfig (#16422)
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-04-11 20:27:27 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						71b9cde010 
					 
					
						
						
							
							[Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784)
						Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
						
						
					 
					
						2025-04-11 19:59:50 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5285589f37 
					 
					
						
						
							
							[Doc] Document InternVL3 support (#16495)
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-04-11 19:41:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f41647ee6b 
					 
					
						
						
							
							[Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-04-11 17:54:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4d022cbc75 
					 
					
						
						
							
							[TPU][V1] Make --disable_chunked_mm_input mandatory for serving MM models (#16483)
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-04-11 17:06:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						70de35a881 
					 
					
						
						
							
							Fix erroneous "model doesn't support compile" warning (#16486)
						Signed-off-by: rzou <zou3519@gmail.com>
						
						
					 
					
						2025-04-11 16:24:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						34b2cf3b33 
					 
					
						
						
							
							[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779)
						Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>
						
						
					 
					
						2025-04-11 07:38:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9e90c9f73f 
					 
					
						
						
							
							[Bugfix] Fix bugs of running Quark quantized models (#16236)
						Signed-off-by: chaow <chaow@amd.com>
						
						
					 
					
						2025-04-11 10:18:32 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e9528f6dc6 
					 
					
						
						
							
							[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173)
						Signed-off-by: DefTruth <qiustudent_r@163.com>
						
						
					 
					
						2025-04-11 06:50:50 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						51baa9c333 
					 
					
						
						
							
							Don't install triton on ppc64le platform (#16470)
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-04-11 10:11:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						35e076b3a8 
					 
					
						
						
							
							[Misc] update api_client example (#16459)
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-04-11 10:05:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a26f59ccbc 
					 
					
						
						
							
							[Misc] Raise error for V1 not supporting Long LoRA. (#16415)
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-04-11 01:51:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aa3b3d76e0 
					 
					
						
						
							
							Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-04-11 08:09:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f7030df3be 
					 
					
						
						
							
							[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990)
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-04-11 15:32:37 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						905e91e9ac 
					 
					
						
						
							
							Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453)
						
						
						
						
					 
					
						2025-04-11 06:44:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f8f9c0ba62 
					 
					
						
						
							
							[Bugfix] Don't set an upper bound on repetition penalty (#16403)
						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
						Co-authored-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-04-11 14:19:40 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dda811021a 
					 
					
						
						
							
							[CPU][Bugfix] Fix CPU docker issues (#16454)
						Signed-off-by: jiang.li <jiang1.li@intel.com>
						
						
					 
					
						2025-04-11 14:19:07 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						93195146ea 
					 
					
						
						
							
							[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424)
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-04-11 04:57:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed37599544 
					 
					
						
						
							
							Update supported_hardware.md for TPU INT8 (#16437)
						
						
						
						
					 
					
						2025-04-11 12:28:07 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						99ef59cf7f 
					 
					
						
						
							
							[Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439)
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
						Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
						
						
					 
					
						2025-04-10 21:26:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d544d141ec 
					 
					
						
						
							
							update benchmark_serving_structured_output to include auto backend (#16438)
						Signed-off-by: Chenyaaang <chenyangli@google.com>
						
						
					 
					
						2025-04-11 12:25:52 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3e397a9484 
					 
					
						
						
							
							check input length of sonnet samples (#16423)
						Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>
						
						
					 
					
						2025-04-11 10:15:06 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						268c325078 
					 
					
						
						
							
							Fix range_ratio Bug in RandomDataset (#16126)
						Signed-off-by: jadewang21 <jadewangcn@outlook.com>
						
						
					 
					
						2025-04-10 15:31:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3cc9af88ff 
					 
					
						
						
							
							[TPU][V1] Disable per-request seed/Generator (#16172)
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-04-10 17:05:44 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7cd0bd7212 
					 
					
						
						
							
							[Bugfix] Fix output token length check logic (#16419)
						Signed-off-by: look <eeslook@163.com>
						
						
					 
					
						2025-04-10 20:16:48 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						56d4aefa33 
					 
					
						
						
							
							[VLM] Avoid unnecessary dummy multimodal data during processing (#16416)
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-04-10 19:32:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dd143ef541 
					 
					
						
						
							
							[V1] Zero-copy tensor/ndarray serialization/transmission (#13790)
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-04-10 19:23:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						daefed052c 
					 
					
						
						
							
							[Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (#15423)
						Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
						Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
						
						
					 
					
						2025-04-10 19:07:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5fbab20e02 
					 
					
						
						
							
							[Bugfix] Fix bug when dataset is json (#15899)
						Signed-off-by: Chenyaaang <chenyangli@google.com>
						
						
					 
					
						2025-04-10 18:35:41 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e8224f3dca 
					 
					
						
						
							
							[V1][Spec Decode] Eagle Model loading (#16035)
						Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
						
						
					 
					
						2025-04-10 11:21:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9665313c39 
					 
					
						
						
							
							[V1] Set structured output backend to auto by default (#15724)
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-04-10 17:53:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0c54fc7273 
					 
					
						
						
							
							Improve configs - ParallelConfig (#16332)
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-04-10 17:34:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c1b57855ec 
					 
					
						
						
							
							[TPU][V1] Use language_model interface for getting text backbone in MM (#16410)
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-04-10 17:32:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						83b824c8b4 
					 
					
						
						
							
							[VLM] Remove BaseProcessingInfo.get_mm_max_tokens_per_item ( #16408 )  
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-10 09:06:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7678fcd5b6 
					 
					
						
						
							
							Fix the torch version parsing logic ( #15857 )  
						
						
						
						
					 
					
						2025-04-10 07:37:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8661c0241d 
					 
					
						
						
							
							[CI] Add auto update workflow for Dockerfile graph ( #11879 )  
						
						
						
						
						Signed-off-by: wineandchord <guoqizhou19@gmail.com > 
						
						
					 
					
						2025-04-10 13:43:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ce8d6b75fc 
					 
					
						
						
							
							[doc] update the wrong link ( #16401 )  
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-10 21:02:37 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						61de3ef74b 
					 
					
						
						
							
							[Model] Remove image mm limit for LLaMa4  ( #16365 )  
						
						
						
						
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com > 
						
						
					 
					
						2025-04-10 09:36:27 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ec1f9c8c91 
					 
					
						
						
							
							Update Numba to 0.61.2 ( #16376 )  
						
						
						
						
						Signed-off-by: cyy <cyyever@outlook.com > 
						
						
					 
					
						2025-04-10 07:59:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						65e09094c4 
					 
					
						
						
							
							[doc] add download model tips ( #16389 )  
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-10 07:45:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c70cf0fe06 
					 
					
						
						
							
							[Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models ( #16038 )  
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-10 15:08:47 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a5d11a54dc 
					 
					
						
						
							
							[Bugfix] Fix validation error for text-only Mllama 3.2 ( #16377 )  
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-10 14:19:42 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3d4c87758e 
					 
					
						
						
							
							[Misc] Update transformers version limits of multi-modal tests ( #16381 )  
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-09 23:03:33 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a9bd832fc5 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for deepseek_v2, internlm2 ( #16383 )  
						
						
						
						
						Signed-off-by: Aaron Ang <aaron.angyd@gmail.com > 
						
						
					 
					
						2025-04-09 23:01:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						417bcefbae 
					 
					
						
						
							
							fix sonnet dataset sample when prefix len is very small ( #16379 )  
						
						
						
						
						Signed-off-by: Chenyaaang <chenyangli@google.com > 
						
						
					 
					
						2025-04-10 05:35:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						baada0e737 
					 
					
						
						
							
							[Bugfix][TPU] Fix TPU validate_request ( #16369 )  
						
						
						
						
						Signed-off-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-10 12:55:12 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						82eb61dd4c 
					 
					
						
						
							
							[misc] use tqdm.auto where appropriate ( #16290 )  
						
						
						
						
						Signed-off-by: Benjamin Kitor <bkitor@gigaio.com > 
						
						
					 
					
						2025-04-09 21:54:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0d4d06fe2f 
					 
					
						
						
							
							[CI][Bugfix] Pin triton version for CPU ( #16384 )  
						
						
						
						
						Signed-off-by: Roger Wang <ywang@roblox.com > 
						
						
					 
					
						2025-04-10 04:35:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4aed0ca6a2 
					 
					
						
						
							
							[bugfix] Avoid the time consumption caused by creating dummy videos. ( #16371 )  
						
						
						
						
					 
					
						2025-04-10 04:30:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1621b25288 
					 
					
						
						
							
							[TPU] Fix dummy loading OOM ( #16372 )  
						
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-04-10 04:06:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a564797151 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for granite, granitemoe, granitemoeshared, grok1, mixtral ( #16325 )  
						
						
						
						
						Signed-off-by: Aaron Ang <aaron.angyd@gmail.com > 
						
						
					 
					
						2025-04-09 20:07:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1da6a09274 
					 
					
						
						
							
							[Bugfix]: do not shutdown server if skip_special_use=False for MistralTokenizer ( #14094 )  
						
						
						
						
						Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com > 
						
						
					 
					
						2025-04-09 19:43:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1e44ffc3ff 
					 
					
						
						
							
							Add GLM-4-0414 support ( #16338 )  
						
						
						
						
						Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com >
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
Co-authored-by: Accelerator1996 <lvfei.lv@alibaba-inc.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Michael Goin <michael@neuralmagic.com >
Co-authored-by: yihong <zouzou0208@gmail.com >
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Co-authored-by: ajayvohra2005 <ajayvohr@amazon.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com > 
						
						
					 
					
						2025-04-10 09:19:42 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a454748544 
					 
					
						
						
							
							[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues ( #16275 )  
						
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-04-09 18:51:51 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1bff42c4b7 
					 
					
						
						
							
							[Misc] refactor Structured Outputs example ( #16322 )  
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-09 23:32:42 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cb391d85dc 
					 
					
						
						
							
							[Hardware] add platform-specific request validation api ( #16291 )  
						
						
						
						
						Signed-off-by: Joe Runde <Joseph.Runde@ibm.com > 
						
						
					 
					
						2025-04-09 12:50:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fee5b8d37f 
					 
					
						
						
							
							[Build/CI] Add tracing deps to vllm container image ( #15224 )  
						
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-09 19:14:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b2ce859bd2 
					 
					
						
						
							
							Fix benchmark_throughput.py --backend=hf ( #16352 )  
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-09 19:09:28 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						566f10a929 
					 
					
						
						
							
							[CI]Fix hpu docker and numpy version for CI ( #16355 )  
						
						
						
						
						Signed-off-by: Chendi Xue <chendi.xue@intel.com > 
						
						
					 
					
						2025-04-09 17:52:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c3b5189137 
					 
					
						
						
							
							[Bugfix] catch AssertionError in MistralTokenizer as ValueError ( #16344 )  
						
						
						
						
						Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com > 
						
						
					 
					
						2025-04-09 17:33:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a25866ac8d 
					 
					
						
						
							
							[Bugfix] Fix profiling.py ( #16202 )  
						
						
						
						
						Signed-off-by: zh Wang <rekind133@outlook.com > 
						
						
					 
					
						2025-04-09 17:03:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						098900d7c2 
					 
					
						
						
							
							Revert "Update label-tpu mergify and remove removal bot" ( #16350 )  
						
						
						
						
					 
					
						2025-04-09 07:59:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						98d01d3ce2 
					 
					
						
						
							
							[Bugfix][Frontend] respect provided default guided decoding backend ( #15476 )  
						
						
						
						
						Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com > 
						
						
					 
					
						2025-04-09 05:11:10 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d55244df31 
					 
					
						
						
							
							[Model] Add SupportsMultiModal.get_language_model interface ( #16007 )  
						
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-04-09 04:12:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						04149cce27 
					 
					
						
						
							
							[BugFix] fix some typos found by typos. ( #16314 )  
						
						
						
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com > 
						
						
					 
					
						2025-04-09 03:43:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						24834f4894 
					 
					
						
						
							
							update neuron config ( #16289 )  
						
						
						
						
						Signed-off-by: Ajay Vohra <ajayvohr@amazon.com > 
						
						
					 
					
						2025-04-09 03:43:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ec7da6fcf3 
					 
					
						
						
							
							[BugFix] llama4 qknorm should be not shared across head ( #16311 )  
						
						
						
						
						Signed-off-by: Lu Fang <fanglu@fb.com > 
						
						
					 
					
						2025-04-09 00:59:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						819d548e8a 
					 
					
						
						
							
							[BugFix] logger is not callable ( #16312 )  
						
						
						
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com > 
						
						
					 
					
						2025-04-09 00:59:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						477d2a8aa2 
					 
					
						
						
							
							Update label-tpu mergify and remove removal bot ( #16298 )  
						
						
						
						
					 
					
						2025-04-09 07:56:25 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e484e02857 
					 
					
						
						
							
							[Bugfix] Avoid transferring cached multi-modal items from P0 to P1 ( #16273 )  
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-09 00:51:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						24f6b9a713 
					 
					
						
						
							
							[Misc] Fix test_sharded_state_loader.py( #16004 ) ( #16005 )  
						
						
						
						
						Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com > 
						
						
					 
					
						2025-04-09 14:47:30 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9cdde47289 
					 
					
						
						
							
							[BugFix] Fix fusion test and add them to CI ( #16287 )  
						
						
						
						
						Signed-off-by: luka <luka@neuralmagic.com > 
						
						
					 
					
						2025-04-08 23:46:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b1eb4ca152 
					 
					
						
						
							
							[TPU] Update PyTorch/XLA ( #16288 )  
						
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-04-09 14:46:32 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						87b4ac56c2 
					 
					
						
						
							
							[CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding ( #16221 )  
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-09 04:14:46 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cb84e45ac7 
					 
					
						
						
							
							[Core] Upgrade to xgrammar 0.1.18, add cache size limit ( #16283 )  
						
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-08 19:13:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4716377fbc 
					 
					
						
						
							
							[Feature] Estimate max-model-len use available KV cache memory ( #16168 )  
						
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-04-08 19:12:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4e9cf8c1dd 
					 
					
						
						
							
							[Bugfix] fix gettid method is not define ( #16084 )  
						
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-04-08 19:12:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2976dc27e9 
					 
					
						
						
							
							[Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs ( #16198 )  
						
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com >
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com > 
						
						
					 
					
						2025-04-08 19:12:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						102bf967f0 
					 
					
						
						
							
							[Model] Add smolvlm support ( #16017 )  
						
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-04-08 19:12:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1f4b09b525 
					 
					
						
						
							
							Add support to modelopt quantization of Mixtral model ( #15961 )  
						
						
						
						
						Signed-off-by: Yue <yueshen@nvidia.com > 
						
						
					 
					
						2025-04-09 01:53:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						86c3369eb8 
					 
					
						
						
							
							[CI/Build] Fix CI LoRA failure ( #16270 )  
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-04-09 09:13:56 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2755c34a8f 
					 
					
						
						
							
							[V1] Update structured output offline inference example ( #15721 )  
						
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-04-08 22:34:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						db10422184 
					 
					
						
						
							
							[Bugfix] fix deepseek fp16 scale bug ( #14809 )  
						
						
						
						
						Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-08 16:56:09 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e1a2c699dd 
					 
					
						
						
							
							[BugFix] Fix Llama4 - Index Error When Single Request Near Max Context ( #16209 )  
						
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-04-08 18:56:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0115ccd5c0 
					 
					
						
						
							
							Add warning that content below line in template will be removed ( #16276 )  
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-08 18:18:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						40b4284fe3 
					 
					
						
						
							
							[Bugfix] Handle process_weights_after_loading for QKVCrossParallelLinear ( #15328 )  
						
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-04-08 10:02:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ebc0b9640 
					 
					
						
						
							
							[Bugfix] Proper input validation for multi-modal encoder-decoder models ( #16156 )  
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-08 09:45:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dc96fd54c6 
					 
					
						
						
							
							[Misc] Avoid stripping meaningful whitespace from nvidia-smi topo -m output in collect_env.py ( #16272 )  
						
						
						
						
						Signed-off-by: imkero <kerorek@outlook.com > 
						
						
					 
					
						2025-04-08 16:08:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1f5d13ab9f 
					 
					
						
						
							
							[New Model]: jinaai/jina-embeddings-v3 ( #16120 )  
						
						
						
						
					 
					
						2025-04-08 08:39:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						90cb44eb02 
					 
					
						
						
							
							Update to transformers==4.51.1 ( #16257 )  
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-08 06:53:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e11880deea 
					 
					
						
						
							
							[Bugfix] Remove triton do_bench fast_flush arg ( #16256 )  
						
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com > 
						
						
					 
					
						2025-04-08 13:51:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9351f91be9 
					 
					
						
						
							
							[BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm ( #16247 )  
						
						
						
						
						Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com > 
						
						
					 
					
						2025-04-08 05:10:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5a1e1c8353 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe ( #16203 )  
						
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-04-08 04:05:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						69ecaa7c79 
					 
					
						
						
							
							[Misc] Add warning for multimodal data in LLM.beam_search ( #16241 )  
						
						
						
						
						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com > 
						
						
					 
					
						2025-04-08 04:05:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7f00899ff7 
					 
					
						
						
							
							[Misc] format and refactor some examples ( #16252 )  
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-08 10:42:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						995e3d1f41 
					 
					
						
						
							
							[Docs] Add Slides from Singapore Meetup ( #16213 )  
						
						
						
						
						Signed-off-by: simon-mo <simon.mo@hey.com > 
						
						
					 
					
						2025-04-08 07:20:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b4ac449a83 
					 
					
						
						
							
							[Misc] Merge the logs of pp layers partitions ( #16225 )  
						
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com > 
						
						
					 
					
						2025-04-08 00:18:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8e5314a468 
					 
					
						
						
							
							[V1] Add disable_chunked_mm_input arg to disable partial mm input prefill ( #15837 )  
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-07 23:24:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						87918e40c4 
					 
					
						
						
							
							[torch.compile][TPU] Make @support_torch_compile work for XLA backend ( #15782 )  
						
						
						
						
						Signed-off-by: Siyuan Liu <lsiyuan@google.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-08 14:23:53 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f6b32efb7f 
					 
					
						
						
							
							[Bugfix] Fix and reorganize broken GGUF tests and bump gguf version ( #16194 )  
						
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-04-08 13:38:13 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b99733d092 
					 
					
						
						
							
							[Bugfix] Do not skip "empty" parts of chats that are parsable ( #16219 )  
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-08 05:14:15 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						05a015d6a5 
					 
					
						
						
							
							Add warning for Attention backends that do not support irope yet ( #16212 )  
						
						
						
						
					 
					
						2025-04-08 03:59:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ad971af8c7 
					 
					
						
						
							
							[Bugfix] fix use-ep bug to enable ep by dp/tp size > 1 ( #16161 )  
						
						
						
						
					 
					
						2025-04-07 20:48:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f2ebb6f541 
					 
					
						
						
							
							[V1] Scatter and gather placeholders in the model runner ( #16076 )  
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Jennifer Zhao <ai.jenniferzhao@gmail.com > 
						
						
					 
					
						2025-04-08 10:43:41 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1d01211264 
					 
					
						
						
							
							Update BASE_IMAGE to 2.22 release of Neuron ( #16218 )  
						
						
						
						
					 
					
						2025-04-07 19:11:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f94ab12f79 
					 
					
						
						
							
							[Misc] Update compressed-tensors to version 0.9.3 ( #16196 )  
						
						
						
						
						Signed-off-by: Miles Williams <42222518+mlsw@users.noreply.github.com > 
						
						
					 
					
						2025-04-07 19:09:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a865bc1ca6 
					 
					
						
						
							
							[core] do not send error across process ( #16174 )  
						
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-04-07 19:09:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						21802c4b6d 
					 
					
						
						
							
							[ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping ( #16031 )  
						
						
						
						
						Signed-off-by: mgoin <michael@neuralmagic.com > 
						
						
					 
					
						2025-04-07 21:28:14 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						652907b354 
					 
					
						
						
							
							Torchao ( #14231 )  
						
						
						
						
						Signed-off-by: drisspg <drisspguessous@gmail.com > 
						
						
					 
					
						2025-04-07 19:39:28 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						24f1c01e0f 
					 
					
						
						
							
							[Bugfix][V0] XGrammar structured output supports Enum ( #15878 )  
						
						
						
						
						Signed-off-by: Leon Seidel <leon.seidel@fau.de > 
						
						
					 
					
						2025-04-07 22:38:25 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fad6e2538e 
					 
					
						
						
							
							[Misc] add description attribute in CLI ( #15921 )  
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-07 22:30:35 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7f6d47c1a2 
					 
					
						
						
							
							[V1][BugFix] Exit properly if engine core fails during startup ( #16137 )  
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-07 15:30:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3147586ebd 
					 
					
						
						
							
							[Bugfix] Fix guidance backend for Qwen models ( #16210 )  
						
						
						
						
						Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai > 
						
						
					 
					
						2025-04-07 22:15:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ed636d99ca 
					 
					
						
						
							
							[Misc] Move Llama 4 projector call into encoder execution ( #16201 )  
						
						
						
						
					 
					
						2025-04-07 14:02:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						090c856d76 
					 
					
						
						
							
							[Misc] Human-readable max-model-len cli arg ( #16181 )  
						
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-07 14:40:58 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ad434d4cfe 
					 
					
						
						
							
							Print the warning only once ( #16193 )  
						
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-04-07 18:30:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						66d433b94f 
					 
					
						
						
							
							[V1] Revert the default max_num_seqs to V0 values for most hardware ( #16158 )  
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-07 13:54:36 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						027b204ff1 
					 
					
						
						
							
							[Bugfix] Re-enable support for ChatGLMForConditionalGeneration ( #16187 )  
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-07 23:15:58 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						55dcce91df 
					 
					
						
						
							
							Upstream Llama4 Support to Main ( #16113 )  
						
						
						
						
						Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com >
Signed-off-by: Chris Thi <chris.c.thi@gmail.com >
Signed-off-by: drisspg <drisspguessous@gmail.com >
Signed-off-by: Jon Swenson <jmswen@gmail.com >
Signed-off-by: Keyun Tong <tongkeyun@gmail.com >
Signed-off-by: Lu Fang <fanglu@meta.com >
Signed-off-by: Xiaodong Wang <xdwang@meta.com >
Signed-off-by: Yang Chen <yangche@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
Signed-off-by: Lu Fang <lufang@fb.com >
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Lucia Fang <fanglu@fb.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-07 08:06:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8017c8db7f 
					 
					
						
						
							
							[Doc]Update image to latest version ( #16186 )  
						
						... 
						
						
						
						Signed-off-by: WangErXiao <863579016@qq.com > 
						
						
					 
					
						2025-04-07 14:17:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dc3529dbf6 
					 
					
						
						
							
							[Misc] improve example mlpspeculator and llm_engine_example ( #16175 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-07 11:53:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7699258ef0 
					 
					
						
						
							
							[Model] Add Qwen3 and Qwen3MoE ( #15289 )  
						
						... 
						
						
						
						Signed-off-by: YamPengLi <yampayne.lyp@alibaba-inc.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-04-07 04:06:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e9ba99f296 
					 
					
						
						
							
							[V1][Structured Output] Add supports_structured_output() method to Platform ( #16148 )  
						
						... 
						
						
						
						Signed-off-by: shen-shanshan <467638484@qq.com > 
						
						
					 
					
						2025-04-07 11:06:24 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7c80368710 
					 
					
						
						
							
							[VLM] Florence-2 supports online serving ( #16164 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-04-07 04:04:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						95d63f38c0 
					 
					
						
						
							
							doc: fix some typos in doc ( #16154 )  
						
						... 
						
						
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com > 
						
						
					 
					
						2025-04-07 05:32:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bb8dab821e 
					 
					
						
						
							
							[CI] Set max transformers version for Ultravox model test ( #16149 )  
						
						... 
						
						
						
						Signed-off-by: Roger Wang <ywang@roblox.com > 
						
						
					 
					
						2025-04-07 04:37:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fc0f87768a 
					 
					
						
						
							
							[Bugfix] Make dummy encoder prompt padding alternative and add missing warnings ( #16129 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-04-07 04:07:15 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0a57386721 
					 
					
						
						
							
							[Misc] Update Mistral-3.1 example ( #16147 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-04-07 03:57:37 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3749e28774 
					 
					
						
						
							
							[V1][Minor] Minor simplification for get_computed_blocks ( #16139 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-06 20:38:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						86fc2321ff 
					 
					
						
						
							
							[Metrics] Add bucket for request_latency, time_to_first_token and time_per_output_token ( #15202 )  
						
						... 
						
						
						
						Signed-off-by: Kay Yan <kay.yan@daocloud.io > 
						
						
					 
					
						2025-04-06 20:34:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2549c0dfef 
					 
					
						
						
							
							Fix requires-python ( #16132 )  
						
						
						
						
					 
					
						2025-04-06 19:22:25 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b10e519895 
					 
					
						
						
							
							[V1][Minor] Optimize get_cached_block ( #16135 )  
						
						
						
						
					 
					
						2025-04-06 20:48:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9bde5ba127 
					 
					
						
						
							
							[TPU] Update PyTorch/XLA ( #16130 )  
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-04-06 18:25:55 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						72c8f1ad04 
					 
					
						
						
							
							[Misc] update requires-python in pyproject.toml ( #16116 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-06 14:56:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						da224daaa9 
					 
					
						
						
							
							[Bugfix] add hf_token to EngineArgs ( #16093 )  
						
						... 
						
						
						
						Signed-off-by: paolovic <paul-philipp.luley@uzh.ch >
Co-authored-by: paolovic <paul-philipp.luley@uzh.ch > 
						
						
					 
					
						2025-04-06 14:47:33 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3a100b9278 
					 
					
						
						
							
							[Bugfix] LoRA : Fix the order in which the kernels process LoRAs ( #16040 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com > 
						
						
					 
					
						2025-04-06 14:04:50 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						242a637aea 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 ( #16103 )  
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-04-06 05:52:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c2a9671510 
					 
					
						
						
							
							[Misc] Improve model redirect to accept json dictionary ( #16119 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-04-06 05:51:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d5ae4f7f42 
					 
					
						
						
							
							[Doc][Bugfix] Add missing EOF in k8s deploy doc ( #16025 )  
						
						
						
						
					 
					
						2025-04-06 12:10:57 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b6c502a150 
					 
					
						
						
							
							[Misc] refactor example eagle ( #16100 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-06 09:42:48 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9ca710e525 
					 
					
						
						
							
							[CI][V1] Fix passing tokenizer as kwarg to validate_guidance_grammar ( #16117 )  
						
						... 
						
						
						
						Signed-off-by: Roger Wang <ywang@roblox.com > 
						
						
					 
					
						2025-04-06 16:18:00 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						eb07c8cb5b 
					 
					
						
						
							
							[Frontend] Fix typo in tool chat templates for llama3.2 and toolace ( #14501 )  
						
						... 
						
						
						
						Signed-off-by: Ben Jackson <ben@ben.com > 
						
						
					 
					
						2025-04-06 07:44:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ba10801961 
					 
					
						
						
							
							[Benchmark] Add sampling parameters to benchmark_serving. ( #16022 )  
						
						... 
						
						
						
						Signed-off-by: Hyesoo Yang <hyeygit@gmail.com > 
						
						
					 
					
						2025-04-06 12:30:35 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						620fc2d09e 
					 
					
						
						
							
							[Model] fix model testing for TeleChat2ForCausalLM and V0 llama4 ( #16112 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <fanglu@fb.com > 
						
						
					 
					
						2025-04-05 21:23:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						29283eaa7e 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for phi, gemma, deepseek ( #16088 )  
						
						... 
						
						
						
						Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com > 
						
						
					 
					
						2025-04-05 20:34:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2fa66ef713 
					 
					
						
						
							
							[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine ( #15946 )  
						
						... 
						
						
						
						Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com > 
						
						
					 
					
						2025-04-05 20:04:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						13affc432d 
					 
					
						
						
							
							[Misc] Remove redundant code ( #16098 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-04-05 20:03:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d8f094a92a 
					 
					
						
						
							
							[Misc] format output for encoder_decoder.py ( #16095 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-05 19:57:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						97ae6d777f 
					 
					
						
						
							
							Fix some capitalisations in generated examples doc titles ( #16094 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-05 13:44:03 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6baeee70d1 
					 
					
						
						
							
							Revert "doc: add info for macos clang errors ( #16049 )" ( #16091 )  
						
						... 
						
						
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com > 
						
						
					 
					
						2025-04-05 11:51:51 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d2517a4939 
					 
					
						
						
							
							[doc] fix 404 ( #16082 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-05 11:39:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6342adc438 
					 
					
						
						
							
							fix: support clang17 for macos and fix the real libomp ( #16086 )  
						
						... 
						
						
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com > 
						
						
					 
					
						2025-04-05 11:00:12 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0adba91547 
					 
					
						
						
							
							[CI] Fix benchmark script level ( #16089 )  
						
						
						
						
					 
					
						2025-04-05 03:36:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4285e423a6 
					 
					
						
						
							
							[Misc] Auto detect bitsandbytes pre-quantized models ( #16027 )  
						
						... 
						
						
						
						Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com > 
						
						
					 
					
						2025-04-04 23:30:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						63375f0cdb 
					 
					
						
						
							
							[V1][Spec Decode] Update N-gram Proposer Interface ( #15750 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-04-04 16:32:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						70ad3f9e98 
					 
					
						
						
							
							[Bugfix][TPU] Fix V1 TPU worker for sliding window ( #16059 )  
						
						... 
						
						
						
						Signed-off-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-04 23:31:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d6fc629f4d 
					 
					
						
						
							
							[Kernel][Minor] Re-fuse triton moe weight application ( #16071 )  
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com > 
						
						
					 
					
						2025-04-04 23:27:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						af51d80fa1 
					 
					
						
						
							
							Revert "[V1] Scatter and gather placeholders in the model runner" ( #16075 )  
						
						
						
						
					 
					
						2025-04-04 14:50:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f5722a5052 
					 
					
						
						
							
							[V1] Scatter and gather placeholders in the model runner ( #15712 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Roger Wang <ywang@roblox.com > 
						
						
					 
					
						2025-04-04 21:26:44 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						651cf0fec1 
					 
					
						
						
							
							[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue ( #15906 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-04-04 12:56:43 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4dc52e1c53 
					 
					
						
						
							
							[CI] Reorganize .buildkite directory ( #16001 )  
						
						... 
						
						
						
						Signed-off-by: kevin <kevin@anyscale.com > 
						
						
					 
					
						2025-04-04 12:16:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4708f13a9c 
					 
					
						
						
							
							[Bugfix] Fix default behavior/fallback for pp in v1 ( #16057 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-04 17:58:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a6d042df0a 
					 
					
						
						
							
							[ROCm][Bugfix] Bring back fallback to eager mode removed in #14917, but for ROCm only ( #15413 )  
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-04-04 09:40:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						40a36ccfeb 
					 
					
						
						
							
							[ROCm][Bugfix] Use platform specific FP8 dtype ( #15717 )  
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-04-04 09:40:20 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ef608c37a7 
					 
					
						
						
							
							[Distributed] [ROCM] Fix custom allreduce enable checks ( #16010 )  
						
						... 
						
						
						
						Signed-off-by: ilmarkov <imarkov@redhat.com >
Co-authored-by: ilmarkov <imarkov@redhat.com > 
						
						
					 
					
						2025-04-04 09:39:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2386803f2a 
					 
					
						
						
							
							[CPU] Change default block_size for CPU backend ( #16002 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-04-04 09:39:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						95862f7b4d 
					 
					
						
						
							
							[Benchmark][Doc] Update throughput benchmark and README ( #15998 )  
						
						... 
						
						
						
						Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <ywang@roblox.com > 
						
						
					 
					
						2025-04-04 09:39:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						230b131b54 
					 
					
						
						
							
							[Bugfix][kernels] Fix half2float conversion in gguf kernels ( #15995 )  
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-04-04 09:38:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0812d8dd41 
					 
					
						
						
							
							[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe ( #15945 )  
						
						... 
						
						
						
						Signed-off-by: zhenwei <zhenweiliu@habana.ai > 
						
						
					 
					
						2025-04-04 09:38:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bf7e3c51ae 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt ( #15939 )  
						
						... 
						
						
						
						Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com > 
						
						
					 
					
						2025-04-04 09:38:52 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a35a8a8392 
					 
					
						
						
							
							[V1][Spec Decode] Avoid logging useless nan metrics ( #16023 )  
						
						... 
						
						
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com > 
						
						
					 
					
						2025-04-04 08:52:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ef0bb1fcf 
					 
					
						
						
							
							doc: add info for macos clang errors ( #16049 )  
						
						... 
						
						
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com > 
						
						
					 
					
						2025-04-04 14:58:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fadc59c0e6 
					 
					
						
						
							
							[TPU][V1] Remove ragged attention kernel parameter hard coding ( #16041 )  
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-04-04 07:48:50 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						86cbd2eee9 
					 
					
						
						
							
							[Misc] improve gguf check ( #15974 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-04 01:33:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						092475f738 
					 
					
						
						
							
							[ROCm] Tweak the benchmark script to run on ROCm ( #14252 )  
						
						
						
						
					 
					
						2025-04-03 17:12:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dcc56d62da 
					 
					
						
						
							
							[Bugfix] Fix function names in test_block_fp8.py ( #16033 )  
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com > 
						
						
					 
					
						2025-04-03 23:01:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f15e70d906 
					 
					
						
						
							
							[TPU] Switch Test to Non-Sliding Window ( #15981 )  
						
						... 
						
						
						
						Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com > 
						
						
					 
					
						2025-04-03 14:28:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b6be6f8d1e 
					 
					
						
						
							
							[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. ( #15732 )  
						
						... 
						
						
						
						Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com > 
						
						
					 
					
						2025-04-03 14:23:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						03a70eacaf 
					 
					
						
						
							
							Re-enable the AMD Testing for the passing tests. ( #15586 )  
						
						... 
						
						
						
						Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com > 
						
						
					 
					
						2025-04-03 11:05:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						45b1ff7a25 
					 
					
						
						
							
							[Misc][Performance] Advance tpu.txt to the most recent nightly torch … ( #16024 )  
						
						
						
						
					 
					
						2025-04-03 17:32:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						15ba07ef25 
					 
					
						
						
							
							[Minor] Fused experts refactor ( #15914 )  
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com > 
						
						
					 
					
						2025-04-03 10:19:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d2b58ca203 
					 
					
						
						
							
							[Neuron][kernel] Fuse kv cache into a single tensor ( #15911 )  
						
						... 
						
						
						
						Signed-off-by: Liangfu Chen <liangfc@amazon.com > 
						
						
					 
					
						2025-04-03 09:51:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						82e7e19a6e 
					 
					
						
						
							
							[SupportsQuant] Chameleon, Chatglm, Commandr ( #15952 )  
						
						... 
						
						
						
						Signed-off-by: Kyle Sayers <kylesayrs@gmail.com > 
						
						
					 
					
						2025-04-03 08:25:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						421c462948 
					 
					
						
						
							
							[SupportsQuant] Bert, Blip, Blip2, Bloom ( #15573 )  
						
						... 
						
						
						
						Signed-off-by: Kyle Sayers <kylesayrs@gmail.com > 
						
						
					 
					
						2025-04-03 08:23:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						84884cd9ac 
					 
					
						
						
							
							fix: tiny fix make format.sh excutable ( #16015 )  
						
						... 
						
						
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com > 
						
						
					 
					
						2025-04-03 15:18:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a43aa183dc 
					 
					
						
						
							
							[doc] update contribution link ( #15922 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-04-03 10:47:31 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						463bbb1835 
					 
					
						
						
							
							[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process ( #15367 )  
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com > 
						
						
					 
					
						2025-04-03 07:32:10 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5e125e74d1 
					 
					
						
						
							
							[misc] improve error message for "Failed to infer device type" ( #15994 )  
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-04-03 14:45:03 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						06f21ce7a5 
					 
					
						
						
							
							[Benchmark] Add AIMO Dataset to Benchmark ( #15955 )  
						
						... 
						
						
						
						Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com >
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com > 
						
						
					 
					
						2025-04-03 06:09:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						57a810db9c 
					 
					
						
						
							
							[ROCM][V0] PA kennel selection when no sliding window provided ( #15982 )  
						
						... 
						
						
						
						Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com > 
						
						
					 
					
						2025-04-03 05:28:44 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8b664706aa 
					 
					
						
						
							
							[bugfix] add seed in torchrun_example.py ( #15980 )  
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-04-03 12:25:01 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						37bfee92bf 
					 
					
						
						
							
							fix: better error message for get_config close #13889 ( #15943 )  
						
						... 
						
						
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com > 
						
						
					 
					
						2025-04-03 03:53:19 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e73ff24e31 
					 
					
						
						
							
							[ROCM][KERNEL] Paged attention for V1 ( #15720 )  
						
						... 
						
						
						
						Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com > 
						
						
					 
					
						2025-04-02 19:48:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bd7599d34a 
					 
					
						
						
							
							[V1][TPU] Do not compile sampling more than needed ( #15883 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-04-03 01:36:01 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						01b6113659 
					 
					
						
						
							
							[TPU] optimize the all-reduce performance ( #15903 )  
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-04-03 00:25:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1b84eff03a 
					 
					
						
						
							
							[V1][TPU] TPU-optimized top-p implementation (avoids scattering). ( #15736 )  
						
						... 
						
						
						
						Signed-off-by: Hyesoo Yang <hyeygit@gmail.com >
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal> 
						
						
					 
					
						2025-04-02 17:18:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						55acf86bf8 
					 
					
						
						
							
							Fix huggingface-cli[hf-xet] -> huggingface-cli[hf_xet] ( #15969 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-04-02 23:37:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f021b97993 
					 
					
						
						
							
							[V1] Support Mistral3 in V1 ( #15950 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-04-02 15:36:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1cab43c2d2 
					 
					
						
						
							
							[misc] instruct pytorch to use nvml-based cuda check ( #15951 )  
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-04-03 01:02:58 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8bd651b318 
					 
					
						
						
							
							Restricted cmake to be less than version 4 as 4.x breaks the build of… ( #15859 )  
						
						... 
						
						
						
						Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com > 
						
						
					 
					
						2025-04-02 16:19:39 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						58e234a754 
					 
					
						
						
							
							[Misc] V1 LoRA support CPU offload ( #15843 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-04-02 23:04:43 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e86c414d6a 
					 
					
						
						
							
							[Model] use AutoWeightsLoader in model load_weights ( #15770 )  
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-04-02 07:47:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						550b2801ad 
					 
					
						
						
							
							[CPU][Bugfix] Using custom allreduce for CPU backend ( #15934 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-04-02 07:46:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cefb9e5a28 
					 
					
						
						
							
							[Frontend] Implement Tool Calling with tool_choice='required' ( #13483 )  
						
						Signed-off-by: Liangfu Chen <liangfc@amazon.com>
						Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
						Co-authored-by: Liangfu Chen <liangfc@amazon.com>
						Co-authored-by: mgoin <michael@neuralmagic.com>
						
						
					 
					
						2025-04-02 07:45:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						98d7367b61 
					 
					
						
						
							
							[Metrics] Hide deprecated metrics ( #15458 )  
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com>
						
						
					 
					
						2025-04-02 07:37:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						594a8b9030 
					 
					
						
						
							
							[Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. ( #15938 )  
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-04-02 06:33:52 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						44f990515b 
					 
					
						
						
							
							[CI] Remove duplicate entrypoints-test ( #15940 )  
						
						Signed-off-by: Kay Yan <kay.yan@daocloud.io>
						
						
					 
					
						2025-04-02 02:44:01 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						252937806c 
					 
					
						
						
							
							[Bugfix][Benchmarks] Ensure async_request_deepspeed_mii uses the OpenAI choices key ( #15926 )  
						
						Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
						
						
					 
					
						2025-04-02 02:19:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						51826d51fa 
					 
					
						
						
							
							Add minimum version for huggingface_hub to enable Xet downloads ( #15873 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-04-02 02:03:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						14e53ed11f 
					 
					
						
						
							
							[V1] Fix json_object support with xgrammar ( #15488 )  
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-04-02 02:00:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ddb94c2605 
					 
					
						
						
							
							[core] Add tags parameter to wake_up() ( #15500 )  
						
						Signed-off-by: Eric <erictang000@gmail.com>
						
						
					 
					
						2025-04-02 01:59:27 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						90969fb39a 
					 
					
						
						
							
							[Kernel] Add more dtype support for GGUF dequantization ( #15879 )  
						
						Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>
						
						
					 
					
						2025-04-02 01:58:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						101f1481f9 
					 
					
						
						
							
							[Build/CI] Update lm-eval to 0.4.8 ( #15912 )  
						
						Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
						
						
					 
					
						2025-04-02 01:47:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2edc87b161 
					 
					
						
						
							
							[Bugfix] Fix cache block size calculation for CPU MLA ( #15848 )  
						
						Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
						
						
					 
					
						2025-04-02 01:45:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4203926f10 
					 
					
						
						
							
							[CI/Build] Further clean up LoRA tests ( #15920 )  
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-04-02 01:39:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cdb57015a7 
					 
					
						
						
							
							[Misc] Replace print with logger ( #15923 )  
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-04-02 01:37:38 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						aa557e6422 
					 
					
						
						
							
							[Benchmark]Fix error message ( #15866 )  
						
						Signed-off-by: wangli <wangli858794774@gmail.com>
						Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
						
						
					 
					
						2025-04-02 01:32:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0e00d40e4f 
					 
					
						
						
							
							[V1][Bugfix] Fix typo in MoE TPU checking ( #15927 )  
						
						Signed-off-by: Roger Wang <ywang@roblox.com>
						
						
					 
					
						2025-04-01 23:46:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c920e01242 
					 
					
						
						
							
							[Doc] Update rocm.inc.md ( #15917 )  
						
						Signed-off-by: chun37 <chun.jb.37@gmail.com>
						
						
					 
					
						2025-04-01 23:38:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						274d8e8818 
					 
					
						
						
							
							[V1][Minor] Enhance SpecDecoding Metrics Log in V1 ( #15902 )  
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-04-01 23:38:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2039c6305b 
					 
					
						
						
							
							[Bugfix] Fix imports for MoE on CPU ( #15841 )  
						
						Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
						
						
					 
					
						2025-04-02 03:33:55 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6efb195a6e 
					 
					
						
						
							
							[V1] Fix: make sure k_index is int64 for apply_top_k_only ( #15907 )  
						
						Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
						
						
					 
					
						2025-04-01 19:06:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						24b7fb455a 
					 
					
						
						
							
							[Spec Decode] Fix input triton kernel for eagle ( #15909 )  
						
						
						
						
					 
					
						2025-04-01 18:15:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						58f5a59769 
					 
					
						
						
							
							[Docs] Add Intel as Sponsor ( #15913 )  
						
						Signed-off-by: simon-mo <simon.mo@hey.com>
						
						
					 
					
						2025-04-01 17:16:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						db9dfcfa6a 
					 
					
						
						
							
							[Docs] Add Ollama meetup slides ( #15905 )  
						
						Signed-off-by: simon-mo <simon.mo@hey.com>
						
						
					 
					
						2025-04-01 13:58:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9ef98d527e 
					 
					
						
						
							
							[Model][MiniMaxText01] Support MiniMaxText01 model inference ( #13454 )  
						
						Signed-off-by: qscqesze <475517977@qq.com>
						Co-authored-by: qingjun <qingjun@minimaxi.com>
						Co-authored-by: qscqesze <475517977@qq.com>
						
						
					 
					
						2025-04-01 16:23:55 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						93491aefc7 
					 
					
						
						
							
							[BugFix] make sure socket close ( #15875 )  
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com>
						
						
					 
					
						2025-04-01 13:10:24 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7acd539cd7 
					 
					
						
						
							
							[Docs] update usage stats language ( #15898 )  
						
						Signed-off-by: simon-mo <simon.mo@hey.com>
						
						
					 
					
						2025-04-01 12:54:13 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e75a6301bd 
					 
					
						
						
							
							[V1][Spec Decode] Implement Eagle Proposer [1/N] ( #15729 )  
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-04-01 12:33:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a79cc68b3a 
					 
					
						
						
							
							[V1][Metrics] Initial speculative decoding metrics ( #15151 )  
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com>
						
						
					 
					
						2025-04-01 10:45:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7e3f7a4ee7 
					 
					
						
						
							
							[CI] Disable flaky structure decoding test temporarily. ( #15892 )  
						
						Signed-off-by: Roger Wang <ywang@roblox.com>
						
						
					 
					
						2025-04-01 17:42:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9ec8257914 
					 
					
						
						
							
							[Model] Add module name prefixes to gemma3 ( #15889 )  
						
						Signed-off-by: Bartholomew Sabat <bartek@recursal.ai>
						Co-authored-by: Bartholomew Sabat <bartek@recursal.ai>
						
						
					 
					
						2025-04-01 10:13:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						38327cf454 
					 
					
						
						
							
							[Model] Aya Vision ( #15441 )  
						
						Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
						Signed-off-by: Roger Wang <ywang@roblox.com>
						Co-authored-by: Roger Wang <ywang@roblox.com>
						
						
					 
					
						2025-04-01 16:30:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dfa82e2a3d 
					 
					
						
						
							
							[CI/Build] Clean up LoRA tests ( #15867 )  
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-04-01 16:28:50 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e59ca942f5 
					 
					
						
						
							
							Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. ( #13932 )  
						
						Signed-off-by: Bill Nell <bnell@redhat.com>
						
						
					 
					
						2025-04-01 12:07:43 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a57a3044aa 
					 
					
						
						
							
							[ROCm][Build][Bugfix] Bring the base dockerfile in sync with the ROCm fork ( #15820 )  
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-04-01 08:56:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4e5a0f6ae2 
					 
					
						
						
							
							[Misc] Allow using OpenCV as video IO fallback ( #15055 )  
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-04-01 15:55:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b63bd14999 
					 
					
						
						
							
							Reinstate format.sh and make pre-commit installation simpler ( #15890 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-04-01 15:41:30 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2041c0e360 
					 
					
						
						
							
							[Doc] Quark quantization documentation ( #15861 )  
						
						Signed-off-by: chaow <chaow@amd.com>
						
						
					 
					
						2025-04-01 08:32:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						085cbc4f9f 
					 
					
						
						
							
							[New Model]: jinaai/jina-reranker-v2-base-multilingual  ( #15876 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-04-01 08:32:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2b93162fb0 
					 
					
						
						
							
							Remove format.sh as it's been unsupported >70 days ( #15884 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-04-01 22:27:46 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2e45bd29fe 
					 
					
						
						
							
							[Misc] remove unused script ( #15746 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-04-01 13:58:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						51d7c6a2b2 
					 
					
						
						
							
							[Model] Support Mistral3 in the HF Transformers format ( #15505 )  
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-04-01 06:10:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f3aca1ee30 
					 
					
						
						
							
							setup correct nvcc version with CUDA_HOME ( #15725 )  
						
						Signed-off-by: Yang Chen <yangche@fb.com>
						
						
					 
					
						2025-04-01 06:09:40 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8dd41d6bcc 
					 
					
						
						
							
							[Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE ( #15831 )  
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
						Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
						Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-04-01 06:07:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0a298ea418 
					 
					
						
						
							
							[Bugfix] Fix no video/image profiling edge case for MultiModalDataParser ( #15828 )  
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-04-01 18:17:11 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d330558bab 
					 
					
						
						
							
							[Docs] Fix small error in link text ( #15868 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-04-01 10:05:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						656fd72976 
					 
					
						
						
							
							[Misc] Fix speculative config repr string ( #15860 )  
						
						Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
						
						
					 
					
						2025-04-01 02:26:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						79455cf421 
					 
					
						
						
							
							[Misc] Enable V1 LoRA by default ( #15320 )  
						
						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
						Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
						
						
					 
					
						2025-04-01 16:53:56 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						30d6a015e0 
					 
					
						
						
							
							[Feature] specify model in config.yaml ( #15798 )  
						
						Signed-off-by: weizeng <weizeng@roblox.com>
						
						
					 
					
						2025-04-01 01:20:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8af5a5c4e5 
					 
					
						
						
							
							fix: can not use uv run collect_env  close   #13888  ( #15792 )  
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com>
						
						
					 
					
						2025-04-01 07:45:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3a5f0afcd2 
					 
					
						
						
							
							[V1] Implement sliding window attention in kv_cache_manager ( #14097 )  
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-04-01 00:33:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c7e63aa4d8 
					 
					
						
						
							
							[ROCm] Use device name in the warning ( #15838 )  
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-04-01 00:10:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4a9ce1784c 
					 
					
						
						
							
							[sleep mode] clear pytorch cache after sleep ( #15248 )  
						
						Signed-off-by: <villard@us.ibm.com>
						
						
					 
					
						2025-03-31 22:58:58 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7e4e709b43 
					 
					
						
						
							
							[V1] TPU - Fix fused MOE ( #15834 )  
						
						Signed-off-by: Alexander Matveev <amatveev@redhat.com>
						
						
					 
					
						2025-03-31 22:58:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						63d8eabed0 
					 
					
						
						
							
							[Bugfix]: Fix is_embedding_layer condition in VocabParallelEmbedding  ( #15824 )  
						
						Signed-off-by: alexwl <alexey.a.kiryushin@gmail.com>
						
						
					 
					
						2025-03-31 22:57:59 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e830b01383 
					 
					
						
						
							
							[Bugfix] Fix extra comma ( #15851 )  
						
						Signed-off-by: haochengxia <xhc_1007@163.com>
						
						
					 
					
						2025-03-31 22:57:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ff6473980d 
					 
					
						
						
							
							[Bugfix][Model] fix mllama multi-image ( #14883 )  
						
						Signed-off-by: yan ma <yan.ma@intel.com>
						
						
					 
					
						2025-03-31 22:53:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a164aea35d 
					 
					
						
						
							
							[Frontend] Add Phi-4-mini function calling support ( #14886 )  
						
						Signed-off-by: Kinfey <kinfeylo@microsoft.com>
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-03-31 22:50:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a76f547e11 
					 
					
						
						
							
							Rename fallback model and refactor supported models section ( #15829 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-03-31 22:49:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b7b7676d67 
					 
					
						
						
							
							[Distributed] Add custom allreduce support for ROCM ( #14125 )  
						
						Signed-off-by: ilmarkov <imarkov@redhat.com>
						Co-authored-by: ilmarkov <imarkov@redhat.com>
						
						
					 
					
						2025-03-31 22:49:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e6e3c55ef2 
					 
					
						
						
							
							Move dockerfiles into their own directory ( #14549 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-03-31 13:47:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f98a4920f9 
					 
					
						
						
							
							[V1][Core] Remove unused speculative config from scheduler ( #15818 )  
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com>
						
						
					 
					
						2025-03-31 19:15:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d4bfc23ef0 
					 
					
						
						
							
							Fix Transformers backend compatibility check ( #15290 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-03-31 10:27:07 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9a2160fa55 
					 
					
						
						
							
							[V1] TPU CI - Add basic perf regression test ( #15414 )  
						
						Signed-off-by: Alexander Matveev <amatveev@redhat.com>
						
						
					 
					
						2025-03-31 13:25:20 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2de4118243 
					 
					
						
						
							
							fix: change GB to GiB in logging  close   #14979  ( #15807 )  
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com>
						
						
					 
					
						2025-03-31 10:00:50 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						239b7befdd 
					 
					
						
						
							
							[V1][Spec Decode] Remove deprecated spec decode config params ( #15466 )  
						
						Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
						
						
					 
					
						2025-03-31 09:19:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						09e974d483 
					 
					
						
						
							
							[Bugfix] Check dimensions of multimodal embeddings in V1 ( #15816 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-03-31 09:01:35 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e5ef4fa99a 
					 
					
						
						
							
							Upgrade transformers to v4.50.3 ( #13905 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-03-31 08:59:37 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						037bcd942c 
					 
					
						
						
							
							[Bugfix] Fix missing return value in load_weights method of adapters.py ( #15542 )  
						
						Signed-off-by: noc-turne <2270929247@qq.com>
						
						
					 
					
						2025-03-31 06:56:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c2e7507ad4 
					 
					
						
						
							
							[Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats ( #15813 )  
						
						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
						
						
					 
					
						2025-03-31 13:23:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3aa2b6a637 
					 
					
						
						
							
							[Model] Update support for NemotronNAS models ( #15008 )  
						
						Signed-off-by: Nave Assaf <nassaf@nvidia.com>
						
						
					 
					
						2025-03-31 20:35:14 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						555aa21905 
					 
					
						
						
							
							[V1] Fully Transparent Implementation of CPU Offloading ( #15354 )  
						
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-03-31 20:22:34 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e7ae3bf3d6 
					 
					
						
						
							
							fix: better install requirement for install in setup.py ( #15796 )  
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com>
						
						
					 
					
						2025-03-31 05:13:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b932c048ac 
					 
					
						
						
							
							Recommend developing with Python 3.12 in developer guide ( #15811 )  
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-03-31 11:54:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e85829450d 
					 
					
						
						
							
							[Feature][ROCm]Enable fusion pass for torch.compile on ROCm ( #15050 )  
						
						Signed-off-by: charlifu <charlifu@amd.com>
						
						
					 
					
						2025-03-31 04:42:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						effc5d24fa 
					 
					
						
						
							
							[Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup ( #15748 )  
						
						Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
						
						
					 
					
						2025-03-31 15:38:58 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						18ed3132d2 
					 
					
						
						
							
							[Misc] update the comments ( #15780 )  
						
						Signed-off-by: chengyang liu <lcy4869@gmail.com>
						Co-authored-by: chengyang liu <lcy4869@gmail.com>
						
						
					 
					
						2025-03-30 19:39:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9b459eca88 
					 
					
						
						
							
							[V1][Scheduler] Avoid calling _try_schedule_encoder_inputs for every request ( #15778 )  
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-03-30 14:10:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						70fedd0f79 
					 
					
						
						
							
							fix: Comments to English for better dev experience ( #15768 )  
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com>
						
						
					 
					
						2025-03-30 10:47:57 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bb103b29bf 
					 
					
						
						
							
							[Bugfix] Added embed_is_patch mask for fuyu model ( #15731 )  
						
						Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
						
						
					 
					
						2025-03-30 03:45:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						248e76c4df 
					 
					
						
						
							
							fix: lint fix a ruff checkout syntax error ( #15767 )  
						
						Signed-off-by: yihong0618 <zouzou0208@gmail.com>
						
						
					 
					
						2025-03-30 03:36:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						803d5c35f3 
					 
					
						
						
							
							[V1] Override mm_counts for dummy data creation ( #15703 )  
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-03-30 03:20:42 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7fd8c0f85c 
					 
					
						
						
							
							fix test_phi3v ( #15321 )  
						
						Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
						
						
					 
					
						2025-03-30 02:01:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						44c3a5abc3 
					 
					
						
						
							
							[doc] update conda to usage link in installation ( #15761 )  
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						Co-authored-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-03-30 08:12:13 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6909a76201 
					 
					
						
						
							
							[Bugfix] Fix Mistral guided generation using xgrammar ( #15704 )  
						
						Signed-off-by: Julien Denize <julien.denize@mistral.ai>
						
						
					 
					
						2025-03-29 20:20:19 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						045533716b 
					 
					
						
						
							
							[CI] xgrammar structured output supports Enum. ( #15757 )  
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-03-29 20:20:02 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3c0ff914ac 
					 
					
						
						
							
							[Bugfix] Fix Mllama interleaved images input support ( #15564 )  
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Co-authored-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-03-29 18:11:15 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2bc4be4e32 
					 
					
						
						
							
							[V1][Minor] Simplify rejection sampler's parse_output ( #15741 )  
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-03-29 09:25:17 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c67abd614f 
					 
					
						
						
							
							[V1] Support interleaved modality items ( #15605 )  
						
						Signed-off-by: Roger Wang <ywang@roblox.com>
						
						
					 
					
						2025-03-29 06:30:09 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6fa7cd3dbc 
					 
					
						
						
							
							[Feature][Disaggregated] Support XpYd disaggregated prefill with MooncakeStore ( #12957 )  
						
						... 
						
						
						
						Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com > 
						
						
					 
					
						2025-03-29 04:01:46 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						94744ba41a 
					 
					
						
						
							
							[V1] [Feature] Collective RPC ( #15444 )  
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com > 
						
						
					 
					
						2025-03-29 03:39:14 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4965ec42d2 
					 
					
						
						
							
							[FEAT] [ROCm] Add AITER int8 scaled gemm kernel ( #15433 )  
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-03-29 03:33:56 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						73aa7041bf 
					 
					
						
						
							
							[doc] update doc ( #15740 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-03-29 04:27:22 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7c1f760024 
					 
					
						
						
							
							[Kernel][TPU][ragged-paged-attn] vLLM code change for PR#8896 ( #15659 )  
						
						... 
						
						
						
						Signed-off-by: Yarong Mu <ymu@google.com > 
						
						
					 
					
						2025-03-28 21:13:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						da461f3cbf 
					 
					
						
						
							
							[TPU][V1][Bugfix] Fix w8a8 recompilation with GSM8K ( #15714 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-03-28 21:13:06 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5b800f0932 
					 
					
						
						
							
							[Bugfix] set VLLM_WORKER_MULTIPROC_METHOD=spawn for vllm.entrypoints.openai.api_server ( #15700 )  
						
						... 
						
						
						
						Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com > 
						
						
					 
					
						2025-03-28 21:12:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8427f70493 
					 
					
						
						
							
							Use numba 0.61 for python 3.10+ to support numpy>=2 ( #15692 )  
						
						... 
						
						
						
						Signed-off-by: cyy <cyyever@outlook.com > 
						
						
					 
					
						2025-03-29 12:11:51 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7a7992085b 
					 
					
						
						
							
							[CI] Speed up V1 structured output tests ( #15718 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-03-28 21:10:45 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1286211f57 
					 
					
						
						
							
							[Bugfix] LoRA V1: add and fix entrypoints tests ( #15715 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com > 
						
						
					 
					
						2025-03-28 21:10:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6d531ad7b8 
					 
					
						
						
							
							[Misc][V1] Misc code streamlining ( #15723 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-03-28 20:59:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						762b424a52 
					 
					
						
						
							
							[Docs] Document v0 engine support in reasoning outputs ( #15739 )  
						
						... 
						
						
						
						Signed-off-by: Ce Gao <cegao@tensorchord.ai > 
						
						
					 
					
						2025-03-29 03:46:57 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						de1cb38769 
					 
					
						
						
							
							[Model] Support Skywork-R1V ( #15397 )  
						
						... 
						
						
						
						Signed-off-by: jiacai.liu <932997367@qq.com >
Co-authored-by: jiacai.liu <932997367@qq.com > 
						
						
					 
					
						2025-03-28 20:39:21 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c802f5430d 
					 
					
						
						
							
							[ROCm][AMD][Build] Update AMD supported arch list ( #15632 )  
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-03-28 20:39:18 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cff8991a50 
					 
					
						
						
							
							[Docs][V1] Optimize diagrams in prefix caching design ( #15716 )  
						
						
						
						
					 
					
						2025-03-29 03:33:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f3f8d8fff4 
					 
					
						
						
							
							implement prometheus fast-api-instrumentor for http service metrics ( #15657 )  
						
						
						
						
					 
					
						2025-03-29 00:12:02 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						26df46ee59 
					 
					
						
						
							
							[Misc] cli auto show default value ( #15582 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-03-28 22:23:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c3f687ac22 
					 
					
						
						
							
							[V1] TPU - Fix the chunked prompt bug ( #15713 )  
						
						... 
						
						
						
						Signed-off-by: Alexander Matveev <amatveev@redhat.com > 
						
						
					 
					
						2025-03-28 20:19:04 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						04437e313d 
					 
					
						
						
							
							[Bugfix] [torch.compile] Add Dynamo metrics context during compilation ( #15639 )  
						
						... 
						
						
						
						Signed-off-by: luka <luka@neuralmagic.com > 
						
						
					 
					
						2025-03-28 14:01:09 -06:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						038bededba 
					 
					
						
						
							
							[TPU] [Perf] Improve Memory Usage Estimation ( #15671 )  
						
						... 
						
						
						
						Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com > 
						
						
					 
					
						2025-03-28 17:37:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d03308be0c 
					 
					
						
						
							
							[Misc] Remove stale func in KVTransferConfig ( #14746 )  
						
						... 
						
						
						
						Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com > 
						
						
					 
					
						2025-03-28 17:33:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c6bc0034d0 
					 
					
						
						
							
							[Misc] Remove unused utils and clean up imports ( #15708 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-28 09:41:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						70e132244a 
					 
					
						
						
							
							[Minor] Remove TGI launching script  ( #15646 )  
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-03-28 09:30:08 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						47e9038d23 
					 
					
						
						
							
							Fix cpu offload testing for gptq/awq/ct ( #15648 )  
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-03-29 00:29:32 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						432cf22a6a 
					 
					
						
						
							
							[Bugfix] Fix regex compile display format ( #15368 )  
						
						... 
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com > 
						
						
					 
					
						2025-03-28 08:58:44 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2914006fe0 
					 
					
						
						
							
							[doc] add missing imports ( #15699 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-03-28 15:56:48 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7329ff5468 
					 
					
						
						
							
							[V1] Support disable_any_whitespace for guidance backend ( #15584 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-03-28 23:46:45 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						541d1df486 
					 
					
						
						
							
							[Bugfix] embed_is_patch for Idefics3 ( #15696 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-28 08:27:52 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3b00ff9138 
					 
					
						
						
							
							[Bugfix][v1] xgrammar structured output supports Enum. ( #15594 )  
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-03-28 06:14:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						91276c5721 
					 
					
						
						
							
							[Model] Adding torch compile annotations to chatglm ( #15624 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-03-28 21:14:09 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0b4167526d 
					 
					
						
						
							
							[Docs] Add "Generation quality changed" section to troubleshooting ( #15701 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-03-28 13:03:21 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fd5fd26902 
					 
					
						
						
							
							[Frontend] update priority for --api-key and VLLM_API_KEY ( #15588 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-03-28 19:40:12 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3bbaacbe15 
					 
					
						
						
							
							[Bugfix][Frontend] Eliminate regex based check in reasoning full generator ( #14821 )  
						
						... 
						
						
						
						Signed-off-by: Ce Gao <cegao@tensorchord.ai > 
						
						
					 
					
						2025-03-28 11:20:35 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a10314c6b3 
					 
					
						
						
							
							[Misc] Fix test_sleep to use query parameters ( #14373 )  
						
						... 
						
						
						
						Signed-off-by: Lize Cai <lize.cai@sap.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-03-28 18:00:14 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						70f2c2a709 
					 
					
						
						
							
							[Bugfix] Fix 'InductorAdaptor' object has no attribute 'cache_dir' ( #15674 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-03-28 17:10:40 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						280d074103 
					 
					
						
						
							
							[CPU][CI] Improve CPU Dockerfile ( #15690 )  
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-03-28 01:36:31 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						32b14baf8a 
					 
					
						
						
							
							[Refactor][Frontend] Keep all logic about reasoning into one class ( #14428 )  
						
						... 
						
						
						
						Signed-off-by: Ce Gao <cegao@tensorchord.ai > 
						
						
					 
					
						2025-03-28 00:23:30 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						2d9045fce8 
					 
					
						
						
							
							[TPU][CI] Fix TPUModelRunner Test ( #15667 )  
						
						... 
						
						
						
						Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com > 
						
						
					 
					
						2025-03-28 00:01:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						355f66348c 
					 
					
						
						
							
							[V1] Remove legacy input registry ( #15673 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-27 23:34:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8693e47e6a 
					 
					
						
						
							
							[Bugfix] Fix mm_hashes forgetting to be passed ( #15668 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-28 05:51:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cec8c7d7f8 
					 
					
						
						
							
							Refactor error handling for multiple exceptions in preprocessing ( #15650 )  
						
						... 
						
						
						
						Signed-off-by: JasonZhu1313 <jasonchu13@outlook.com > 
						
						
					 
					
						2025-03-28 03:27:20 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4d0ec37267 
					 
					
						
						
							
							[Quantization][FP8] Adding support for fp8 gemm layer input in fp8 ( #14578 )  
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com > 
						
						
					 
					
						2025-03-28 02:58:16 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e7f720ea56 
					 
					
						
						
							
							[Misc] add coding benchmark for speculative decoding ( #15303 )  
						
						... 
						
						
						
						Signed-off-by: CXIAAAAA <cxia0209@gmail.com > 
						
						
					 
					
						2025-03-28 10:47:05 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ae17bf1e2 
					 
					
						
						
							
							Revert "Use Cache Hinting for fused_moe kernel ( #15511 )" ( #15645 )  
						
						... 
						
						
						
						Signed-off-by: Wes Medford <wryanmedford@gmail.com > 
						
						
					 
					
						2025-03-27 19:45:55 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8a49eea74b 
					 
					
						
						
							
							[CI][TPU] Temporarily Disable Quant Test on TPU ( #15649 )  
						
						... 
						
						
						
						Signed-off-by: rshaw@neuralmagic.com  <robertgshaw2@gmail.com > 
						
						
					 
					
						2025-03-27 19:45:05 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b4245a48df 
					 
					
						
						
							
							[Doc] Fix dead links in Job Board ( #15637 )  
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com > 
						
						
					 
					
						2025-03-28 02:43:40 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4e0f6076be 
					 
					
						
						
							
							[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. ( #14948 )  
						
						... 
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-03-28 10:13:41 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						726efc6a32 
					 
					
						
						
							
							[Quantization][V1]  BitsAndBytes support V1 ( #15611 )  
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-03-28 10:12:47 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						bd45912b99 
					 
					
						
						
							
							[TPU] Lazy Import ( #15656 )  
						
						... 
						
						
						
						Signed-off-by: rshaw@neuralmagic.com  <robertgshaw2@gmail.com > 
						
						
					 
					
						2025-03-28 09:57:01 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						15dac210f0 
					 
					
						
						
							
							[V1] AsyncLLM data parallel ( #13923 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-03-27 16:14:41 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						112b3e5b3b 
					 
					
						
						
							
							[CI] Update rules for applying tpu label. ( #15634 )  
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-03-27 22:15:26 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						32d669275b 
					 
					
						
						
							
							Correct PowerPC to modern IBM Power ( #15635 )  
						
						... 
						
						
						
						Signed-off-by: Christy Norman <christy@linux.vnet.ibm.com > 
						
						
					 
					
						2025-03-27 15:04:32 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4098b72210 
					 
					
						
						
							
							[Bugfix][TPU][V1] Fix recompilation ( #15553 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-03-27 19:15:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						46450b8d33 
					 
					
						
						
							
							Use absolute placement for Ask AI button ( #15628 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-03-27 18:52:18 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						13ac9cab21 
					 
					
						
						
							
							[Misc] Avoid direct access of global mm_registry in compute_encoder_budget ( #15621 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-27 17:52:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						66aa4c0bf4 
					 
					
						
						
							
							[Feature] Add middleware to log API Server responses ( #15593 )  
						
						... 
						
						
						
						Signed-off-by: Yuan Tang <terrytangyuan@gmail.com > 
						
						
					 
					
						2025-03-27 17:49:38 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						247181536f 
					 
					
						
						
							
							[Misc] Replace is_encoder_decoder_inputs with split_enc_dec_inputs ( #15620 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-27 17:36:32 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						07bf813fb5 
					 
					
						
						
							
							[Doc] Link to onboarding tasks ( #15629 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-27 16:30:53 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8958217ad5 
					 
					
						
						
							
							[Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 ( #15211 )  
						
						... 
						
						
						
						Signed-off-by: h-sugi <h.sugi@ieee.org >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-03-27 22:29:29 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ac5bc615b0 
					 
					
						
						
							
							[Model] MiniCPM-V/O supports V1 ( #15487 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-27 06:07:29 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8063dfc61a 
					 
					
						
						
							
							[Doc] update --system for transformers installation in docker doc ( #15616 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-03-27 20:38:46 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6278bc829e 
					 
					
						
						
							
							Fix incorrect filenames in vllm_compile_cache.py ( #15494 )  
						
						... 
						
						
						
						Signed-off-by: <zou3519@gmail.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-03-27 18:33:41 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3f532cb6a6 
					 
					
						
						
							
							[Misc] Use model_redirect to redirect the model name to a local folder. ( #14116 )  
						
						
						
						
					 
					
						2025-03-27 02:21:23 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e6c9053f9e 
					 
					
						
						
							
							[Misc] Clean up scatter_patch_features ( #15559 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-27 07:45:00 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						43ed4143c4 
					 
					
						
						
							
							[Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM ( #15587 )  
						
						... 
						
						
						
						Signed-off-by: ElizaWszola <eliza@neuralmagic.com >
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: rshaw@neuralmagic.com  <robertgshaw2@gmail.com >
Co-authored-by: ElizaWszola <eliza@neuralmagic.com >
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com >
Co-authored-by: ElizaWszola <ewszola@redhat.com > 
						
						
					 
					
						2025-03-27 06:47:25 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						f4c98b4d4c 
					 
					
						
						
							
							[Misc] Consolidate LRUCache implementations ( #15481 )  
						
						... 
						
						
						
						Signed-off-by: Bella kira <2374035698@qq.com > 
						
						
					 
					
						2025-03-27 06:43:43 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e1e0fd7543 
					 
					
						
						
							
							[TPU] Avoid Triton Import ( #15589 )  
						
						... 
						
						
						
						Signed-off-by: rshaw@neuralmagic.com  <robertgshaw2@gmail.com > 
						
						
					 
					
						2025-03-27 06:43:02 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						df8d3d1287 
					 
					
						
						
							
							[Misc] Restrict ray version dependency and update PP feature warning in V1 ( #15556 )  
						
						
						
						
					 
					
						2025-03-27 06:21:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						619d3de8bd 
					 
					
						
						
							
							[TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SEQS ( #15583 )  
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-03-26 22:46:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ecff8309a3 
					 
					
						
						
							
							[ROCm] Env variable to trigger custom PA ( #15557 )  
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-03-26 22:46:12 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dcf2a590f5 
					 
					
						
						
							
							Allow torchao quantization in SiglipMLP ( #15575 )  
						
						
						
						
					 
					
						2025-03-26 22:45:51 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						54aa619459 
					 
					
						
						
							
							[V1] Refactor num_computed_tokens logic ( #15307 )  
						
						... 
						
						
						
						Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-03-27 04:54:36 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						fb22be5817 
					 
					
						
						
							
							[moe][quant] add weight name case for offset ( #15515 )  
						
						... 
						
						
						
						Signed-off-by: Mengqing Cao <cmq0113@163.com > 
						
						
					 
					
						2025-03-27 04:50:29 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7f301dd8ef 
					 
					
						
						
							
							[Doc] Update V1 user guide for fp8 kv cache support ( #15585 )  
						
						... 
						
						
						
						Signed-off-by: weizeng <weizeng@roblox.com > 
						
						
					 
					
						2025-03-26 19:39:03 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						8095341a01 
					 
					
						
						
							
							[misc] LoRA: Remove unused long context test data ( #15558 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com > 
						
						
					 
					
						2025-03-27 10:04:51 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						69db16a46a 
					 
					
						
						
							
							add platform check back ( #15578 )  
						
						... 
						
						
						
						Signed-off-by: Chenyaaang <llccyy1212@gmail.com > 
						
						
					 
					
						2025-03-27 01:50:27 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ce78f9af4e 
					 
					
						
						
							
							Add automatic tpu label to mergify.yml ( #15560 )  
						
						
						
						
					 
					
						2025-03-26 21:39:58 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9239bf718e 
					 
					
						
						
							
							[Kernel] CUTLASS grouped gemm fp8 MoE kernel ( #13972 )  
						
						... 
						
						
						
						Signed-off-by: ElizaWszola <eliza@neuralmagic.com >
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com > 
						
						
					 
					
						2025-03-27 00:54:44 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7a6d45bc8a 
					 
					
						
						
							
							Support FIPS enabled machines with MD5 hashing ( #15299 )  
						
						... 
						
						
						
						Signed-off-by: Matthew Vine <32849887+MattTheCuber@users.noreply.github.com > 
						
						
					 
					
						2025-03-26 20:19:46 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e74ff409e0 
					 
					
						
						
							
							[TPU] support disabling xla compilation cache ( #15567 )  
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-03-27 00:09:28 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						7a888271f5 
					 
					
						
						
							
							Use Cache Hinting for fused_moe kernel ( #15511 )  
						
						
						
						
					 
					
						2025-03-26 23:21:34 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						9d119a86ae 
					 
					
						
						
							
							[V1] TPU CI - Fix test_compilation.py ( #15570 )  
						
						... 
						
						
						
						Signed-off-by: Alexander Matveev <amatveev@redhat.com > 
						
						
					 
					
						2025-03-26 21:51:54 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						b2e85e26f4 
					 
					
						
						
							
							[V1] TPU - Revert to exponential padding by default ( #15565 )  
						
						... 
						
						
						
						Signed-off-by: Alexander Matveev <amatveev@redhat.com > 
						
						
					 
					
						2025-03-26 21:35:05 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						dd8a29da99 
					 
					
						
						
							
							Applying some fixes for K8s agents in CI ( #15493 )  
						
						... 
						
						
						
						Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com > 
						
						
					 
					
						2025-03-26 20:35:11 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						27df5199d9 
					 
					
						
						
							
							Support SHA256 as hash function in prefix caching ( #15297 )  
						
						... 
						
						
						
						Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com > 
						
						
					 
					
						2025-03-26 11:11:28 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						35fad35a48 
					 
					
						
						
							
							[V1][Sampler] Faster top-k only implementation ( #15478 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-03-26 10:56:47 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						733e7c9e95 
					 
					
						
						
							
							[Refactor] Remove unnecessary backend parameter in structured output interface ( #15317 )  
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-03-26 17:51:56 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0af4d764d6 
					 
					
						
						
							
							Fix weight loading for some models in Transformers backend ( #15544 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-03-26 10:17:53 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e64afa455c 
					 
					
						
						
							
							multi-node offline DP+EP example ( #15484 )  
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-03-26 23:54:24 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1711b929b6 
					 
					
						
						
							
							[Model] Add Reasoning Parser for Granite Models ( #14202 )  
						
						... 
						
						
						
						Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com >
						Co-authored-by: Joe Runde <joe@joerun.de > 
						
						
					 
					
						2025-03-26 14:28:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						c091c0a588 
					 
					
						
						
							
							Improve validation of TP in Transformers backend ( #15540 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-03-26 07:26:48 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						1aa162e030 
					 
					
						
						
							
							Apply torchfix ( #15532 )  
						
						... 
						
						
						
						Signed-off-by: cyy <cyyever@outlook.com > 
						
						
					 
					
						2025-03-26 12:09:06 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						cf5c8f1686 
					 
					
						
						
							
							Separate base model from TransformersModel ( #15467 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
						Signed-off-by: Isotr0py <2037008807@qq.com >
						Co-authored-by: Isotr0py <2037008807@qq.com >
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-03-26 18:13:38 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4ec2cee000 
					 
					
						
						
							
							[Misc] improve example script output ( #15528 )  
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com >
						Co-authored-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-03-26 10:12:47 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						99f536f830 
					 
					
						
						
							
							[Misc] Enhance warning information to user-defined chat template ( #15408 )  
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com > 
						
						
					 
					
						2025-03-26 02:21:15 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5ebf66748b 
					 
					
						
						
							
							[FEAT][ROCm] Integrate Fused MoE Kernels from AITER ( #14967 )  
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
						Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-03-26 16:30:30 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						781d056280 
					 
					
						
						
							
							[Feature] Enhance EAGLE Architecture with Proper RMS Norms ( #14990 )  
						
						... 
						
						
						
						Signed-off-by: Bryan Lu <yuzhelu@amazon.com >
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-03-26 08:24:07 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5aefd6ac31 
					 
					
						
						
							
							Fix raw_request extraction in load_aware_call decorator ( #15382 )  
						
						... 
						
						
						
						Signed-off-by: Daniel Salib <danielsalib@meta.com > 
						
						
					 
					
						2025-03-25 22:29:54 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6c663dfd5e 
					 
					
						
						
							
							[misc] LoRA - Skip LoRA kernels when not required ( #15152 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
						Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com > 
						
						
					 
					
						2025-03-26 11:33:45 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						33437bc6e7 
					 
					
						
						
							
							[BugFix] Fix nightly MLA failure (FA2 + MLA chunked prefill, i.e. V1, producing bad results) ( #15492 )  
						
						... 
						
						
						
						Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com > 
						
						
					 
					
						2025-03-25 20:33:22 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						23114d3364 
					 
					
						
						
							
							[Misc] Warn about v0 in benchmark_paged_attn.py ( #15495 )  
						
						... 
						
						
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-03-25 20:31:04 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						997c8811d6 
					 
					
						
						
							
							[Model] Support multi-image for Molmo ( #15438 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-26 11:26:33 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e42389f9d7 
					 
					
						
						
							
							Transformers backend already supports V1 ( #15463 )  
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-03-25 20:26:16 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ff38f0a32c 
					 
					
						
						
							
							[CI/Build] LoRA: Delete long context tests ( #15503 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
						Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com > 
						
						
					 
					
						2025-03-25 17:18:34 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a5cfbab3c8 
					 
					
						
						
							
							[Core] LoRA: V1 Scheduler optimization ( #15422 )  
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
						Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com > 
						
						
					 
					
						2025-03-25 22:50:09 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						ac3cd6e83c 
					 
					
						
						
							
							[core] add bucket padding to tpu_model_runner ( #14995 )  
						
						... 
						
						
						
						Signed-off-by: Chenyaaang <llccyy1212@gmail.com >
						Signed-off-by: rshaw@neuralmagic.com  <robertgshaw2@gmail.com >
						Co-authored-by: rshaw@neuralmagic.com  <robertgshaw2@gmail.com > 
						
						
					 
					
						2025-03-25 17:27:22 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						082ab86f5f 
					 
					
						
						
							
							[V1] Support long_prefill_token_threshold in v1 scheduler ( #15419 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com > 
						
						
					 
					
						2025-03-25 14:22:26 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						6aa196c8dc 
					 
					
						
						
							
							[V1][Minor] Use SchedulerInterface type for engine scheduler field ( #15499 )  
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-03-25 14:21:36 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a0dd7dcd49 
					 
					
						
						
							
							[TPU][V1] Fix Sampler recompilation ( #15309 )  
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-03-25 16:43:54 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						e977c11111 
					 
					
						
						
							
							Add workaround for shared field_names in pydantic model class ( #13925 )  
						
						... 
						
						
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com > 
						
						
					 
					
						2025-03-25 20:31:08 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5f063a80bd 
					 
					
						
						
							
							[bugfix] add supports_v1 platform interface ( #15417 )  
						
						... 
						
						
						
						Signed-off-by: Joe Runde <Joseph.Runde@ibm.com > 
						
						
					 
					
						2025-03-25 15:00:32 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5d8e1c9279 
					 
					
						
						
							
							[Bugfix] Support triton==3.3.0+git95326d9f for RTX 5090 (Unsloth + vLLM compatibility) ( #15471 )  
						
						... 
						
						
						
						Co-authored-by: ServerAI <ai@exc-mad-ai.com > 
						
						
					 
					
						2025-03-25 17:59:25 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						0a049c7d86 
					 
					
						
						
							
							[CI/Build] Add tests for the V1 tpu_model_runner. ( #14843 )  
						
						... 
						
						
						
						Signed-off-by: Yarong Mu <ymu@google.com > 
						
						
					 
					
						2025-03-25 12:27:16 -04:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						d0cfec7ab9 
					 
					
						
						
							
							[bugfix] fix inductor cache on max_position_embeddings ( #15436 )  
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-03-25 07:05:39 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a608160027 
					 
					
						
						
							
							[Kernel] Fix conflicting macro names for gguf kernels ( #15456 )  
						
						... 
						
						
						
						Signed-off-by: SzymonOzog <szymon.ozog@gmail.com > 
						
						
					 
					
						2025-03-25 13:50:49 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3f04a7fbf2 
					 
					
						
						
							
							[Doc] Update V1 user guide for multi-modality ( #15460 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-25 11:01:58 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						5994430b84 
					 
					
						
						
							
							[Misc] Remove redundant num_embeds ( #15443 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-25 18:27:57 +08:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						a9e879b316 
					 
					
						
						
							
							[Misc] Clean up MiniCPM-V/O code ( #15337 )  
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-03-25 10:22:52 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						3e2f37a69a 
					 
					
						
						
							
							Dockerfile.ppc64le changes to move to UBI ( #15402 )  
						
						... 
						
						
						
						Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com > 
						
						
					 
					
						2025-03-25 10:15:14 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4f044b1d67 
					 
					
						
						
							
							[Kernel][CPU] CPU MLA ( #14744 )  
						
						... 
						
						
						
						Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg > 
						
						
					 
					
						2025-03-25 09:34:59 +00:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						4157f563b4 
					 
					
						
						
							
							[Hardware][TPU][Bugfix] Fix v1 mp profiler ( #15409 )  
						
						... 
						
						
						
						Signed-off-by: Siyuan Liu <lsiyuan@google.com > 
						
						
					 
					
						2025-03-25 01:43:00 -07:00 
						 
				 
			
				
					
						
					 
					
						
						
							
						
						051da7efe3 
					 
					
						
						
							
							Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 ( #15160 )  
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com >
						Co-authored-by: Richard Barnes <rbarnes@meta.com > 
						
						
					 
					
						2025-03-25 15:36:45 +08:00