728c365e4d 
					 
					
						
						
							
							Use uv to install python in Dockerfile  
						
						 
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-10-02 11:05:47 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						be8921fbba 
					 
					
						
						
							
							Change size of single CUDA graph for CI to 4 ( #26089 )  
						
						 
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						
						
					 
					
						2025-10-02 14:14:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d4e7a1152d 
					 
					
						
						
							
							Update base image to 22.04 (jammy) ( #26065 )  
						
						 
						
						Signed-off-by: Huy Do <huydhn@gmail.com>
						
						
					 
					
						2025-10-02 05:48:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						be22bb6f3d 
					 
					
						
						
							
							Run:ai model streamer add GCS package support ( #24909 )  
						
						 
						
						Signed-off-by: Peter Schuurman <psch@google.com>
						
						
					 
					
						2025-10-01 20:59:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						169313b9f8 
					 
					
						
						
							
							[Misc] Make handling of SamplingParams clearer in n>1 case ( #26032 )  
						
						 
						
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-10-01 19:31:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0b018d8baf 
					 
					
						
						
							
							[ROCm][Bugfix] Add missing parameter to ROCm backend ( #26029 )  
						
						 
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-10-01 19:23:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c31246800c 
					 
					
						
						
							
							Support RL online quantization with torchao ( #23014 )  
						
						 
						
						Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
						
						
					 
					
						2025-10-01 16:39:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4134312b35 
					 
					
						
						
							
							[BugFix] ChunkedLocalAttention is currently not CG compatible ( #26034 )  
						
						 
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						
						
					 
					
						2025-10-01 16:28:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						da554f932e 
					 
					
						
						
							
							[Bug] Fix Negative Cuda Memory Usage ( #25683 )  
						
						 
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-10-01 18:16:26 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						aac622e0cd 
					 
					
						
						
							
							[ROCm][Build] Add support for AMD Ryzen AI MAX / AI 300 Series ( #25908 )  
						
						 
						
						Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
						
						
					 
					
						2025-10-01 21:39:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1726e93ef1 
					 
					
						
						
							
							[BugFix][DP/EP] Fix CUTLASS MLA hang under load ( #26026 )  
						
						 
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
						
						
					 
					
						2025-10-01 12:30:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ee04c0cd04 
					 
					
						
						
							
							[CI] Tweaks to GPT-OSS Eval (Blackwell) for stability ( #26030 )  
						
						 
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-10-01 12:02:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c36f0aa300 
					 
					
						
						
							
							Fix test_mamba_ssm_ssd.py due to missing _query_start_loc_to_chunk_indices_offsets ( #25995 )  
						
						 
						
						Signed-off-by: Huamin Li <3ericli@gmail.com>
						
						
					 
					
						2025-10-01 18:18:36 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5234dc7451 
					 
					
						
						
							
							[NVIDIA] Blackwell Family ( #24673 )  
						
						 
						
						Signed-off-by: Johnny <johnnynuca14@gmail.com>
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnync13@gmail.com>
Signed-off-by: Salvatore Cena <cena@cenas.it>
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
Co-authored-by: Salvatore Cena <cena@cenas.it>
						
						
					 
					
						2025-10-01 10:50:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3b7c20a6b5 
					 
					
						
						
							
							[Bugfix] Apply same sampling parameters for both n=1 and n>1 ( #26005 )  
						
						 
						
						Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp>
						
						
					 
					
						2025-10-01 14:37:35 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f9e714813a 
					 
					
						
						
							
							[Benchmark] Finish documented v0.11.0 deprecation of --endpoint-type ( #26007 )  
						
						 
						
						Signed-off-by: Nathan Scott <nathans@redhat.com>
						
						
					 
					
						2025-10-01 12:41:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2518230d3e 
					 
					
						
						
							
							[MISC] Fix misleading batch_size_capture_list when cuda_graph_sizes < 4 ( #25829 )  
						
						 
						
						Signed-off-by: billishyahao <bill.he@amd.com>
Co-authored-by: Luka Govedic <ProExpertProg@users.noreply.github.com>
						
						
					 
					
						2025-10-01 08:39:45 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a332b84578 
					 
					
						
						
							
							[CI] Only capture a single CUDA graph size in CI by default ( #25951 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-10-01 10:03:44 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1405f0c7ba 
					 
					
						
						
							
							[Misc] Factor out common _apply_feature_select_strategy ( #26003 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-10-01 01:31:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						84d57342b6 
					 
					
						
						
							
							[BugFix][MM] Fix Nonetype error when video is cache in qwen2.5-omni-thinker ( #26004 )  
						
						 
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
						
						
					 
					
						2025-10-01 08:03:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						57b46d769e 
					 
					
						
						
							
							[Doc] updating torch.compile doc link ( #25989 )  
						
						 
						
						Signed-off-by: nadathurv <work.vnadathur@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
						
						
					 
					
						2025-10-01 07:04:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f48b6a03ba 
					 
					
						
						
							
							[Misc] allow disable pynccl ( #25421 )  
						
						 
						
						Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
						
						
					 
					
						2025-10-01 06:04:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2a69ab4899 
					 
					
						
						
							
							Update to Transformers v4.56.2 ( #24638 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-30 22:07:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8d7da92fd7 
					 
					
						
						
							
							[BugFix] Fix default kv-cache-dtype default for DeepseekV3.2 ( #25988 )  
						
						 
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						
						
					 
					
						2025-09-30 21:58:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e952eee698 
					 
					
						
						
							
							[Bugfix] Fix __syncwarp on ROCM ( #25996 )  
						
						 
						
						
						
						
					 
					
						2025-09-30 21:15:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						66bca9b8bd 
					 
					
						
						
							
							[MM] Add text-only mode for Qwen3-VL ( #26000 )  
						
						 
						
						
						
						
					 
					
						2025-09-30 21:13:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						99028fda44 
					 
					
						
						
							
							Fix INT8 quantization error on Blackwell GPUs (SM100+) ( #25935 )  
						
						 
						
						Signed-off-by: padg9912 <phone.and.desktop@gmail.com>
						
						
					 
					
						2025-09-30 19:19:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1244948885 
					 
					
						
						
							
							[Log] Optimize Log for FP8MOE ( #25709 )  
						
						 
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-09-30 19:18:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a73f6491c8 
					 
					
						
						
							
							Update launch_bounds_utils.h for correct compile on Multiple Cuda Arch - PTXAS out of range Warning ( #25843 )  
						
						 
						
						Signed-off-by: Salvatore Cena <cena@cenas.it>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-30 19:18:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						001e50c92c 
					 
					
						
						
							
							[Model] MTP fallback to eager for DeepSeek v32 ( #25982 )  
						
						 
						
						Signed-off-by: Lu Fang <fanglu@fb.com>
						
						
					 
					
						2025-10-01 01:53:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						96ebcaa3ad 
					 
					
						
						
							
							[Misc] Make EP kernels install script support uv ( #25785 )  
						
						 
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						
						
					 
					
						2025-09-30 23:38:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5db1870bb9 
					 
					
						
						
							
							[gpt-oss] use vLLM instead of openai types for streaming ( #25186 )  
						
						 
						
						Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
						
						
					 
					
						2025-09-30 22:47:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2ce26b9b5d 
					 
					
						
						
							
							[Docs] Remove API Reference from search index ( #25949 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-30 22:10:02 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a388252ac4 
					 
					
						
						
							
							Add explicit pooling classes for the Transformers backend ( #25322 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-30 23:07:06 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9a9f48dff7 
					 
					
						
						
							
							[V1] [P/D] Add Support for KV Load Failure Recovery ( #19330 )  
						
						 
						
						Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
						
						
					 
					
						2025-09-30 14:57:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						67f3fb0844 
					 
					
						
						
							
							[Bench] Add DeepSeekV32 to MoE benchmark ( #25962 )  
						
						 
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-30 14:13:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						43b752c325 
					 
					
						
						
							
							[Llama4] [multimodal] Fix misplaced dtype cast of cos_sin_cache in Llama4VisionRotaryEmbedding ( #25889 )  
						
						 
						
						Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
						
						
					 
					
						2025-09-30 20:35:15 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cfd302db9b 
					 
					
						
						
							
							OffloadingConnector: Fix GPU block tracking bug ( #25856 )  
						
						 
						
						Signed-off-by: Or Ozeri <oro@il.ibm.com>
						
						
					 
					
						2025-09-30 19:53:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fb610ae684 
					 
					
						
						
							
							[Docs] Add moe kernel features doc  ( #25297 )  
						
						 
						
						Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-30 19:03:15 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f652e6cdf 
					 
					
						
						
							
							[Doc] Improve MM Pooling model documentation ( #25966 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-09-30 18:58:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e6a226efba 
					 
					
						
						
							
							[Bug] Fix AttributeError: 'QKVParallelLinear' object has no attribute 'orig_dtype' ( #25958 )  
						
						 
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-09-30 11:13:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a2e6fa7e03 
					 
					
						
						
							
							[bugfix][deepseek] fix flashmla kernel selection ( #25956 )  
						
						 
						
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-10-01 00:30:36 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9f1c4ecaf2 
					 
					
						
						
							
							[Bugfix] Token type and position embeddings fail to be applied to inputs_embeds ( #25922 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-10-01 00:23:12 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ef283548f7 
					 
					
						
						
							
							[Bugfix] Fix accuracy issue of TRTLLM FP8 MOE and improve logging ( #25895 )  
						
						 
						
						Signed-off-by: Pavani Majety <pmajety@nvidia.com>
						
						
					 
					
						2025-09-30 10:51:31 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f4db5e6de1 
					 
					
						
						
							
							[Bugfix][Model] Fix inference for Hunyuan dense models ( #25354 )  
						
						 
						
						Signed-off-by: anion <1005128408@qq.com>
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>
						
						
					 
					
						2025-09-30 14:38:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						099aaee536 
					 
					
						
						
							
							Add Hugging Face Inference Endpoints guide to Deployment docs ( #25886 )  
						
						 
						
						Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-30 14:35:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						35fe398c7c 
					 
					
						
						
							
							[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 ( #25858 )  
						
						 
						
						Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
						
						
					 
					
						2025-09-30 07:30:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bb6d43047e 
					 
					
						
						
							
							[Fix] Improve CPU backend compatibility for RISC-V ( #25816 )  
						
						 
						
						Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
						
						
					 
					
						2025-09-30 13:48:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bc546f76a1 
					 
					
						
						
							
							[CI] Move applicable tests to CPU ( #24080 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-30 14:45:20 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						80608ba5af 
					 
					
						
						
							
							[NIXL] Add support for MLA caches with different latent dim ( #25902 )  
						
						 
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-09-30 12:18:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e184c9c510 
					 
					
						
						
							
							[perf] Use CPU tensor to reduce GPU->CPU sync ( #25884 )  
						
						 
						
						Signed-off-by: Lehua Ding <lehuading@tencent.com>
						
						
					 
					
						2025-09-30 19:51:16 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d7e34b4210 
					 
					
						
						
							
							[Model] Move vision_feature_select_strategy into resolve_visual_encoder_outputs ( #25938 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-09-30 11:24:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ef6e0e7132 
					 
					
						
						
							
							[Bugfix][Model] fix ernie45 moe gate&bias dtype to float32 ( #25936 )  
						
						 
						
						Signed-off-by: wangyafeng <wangyafeng@baidu.com>
						
						
					 
					
						2025-09-30 19:11:21 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1ad3aca682 
					 
					
						
						
							
							Updated TRL integration docs ( #25684 )  
						
						 
						
						Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-30 03:10:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8d0afa9b42 
					 
					
						
						
							
							[Doc] Add Cambricon MLU support ( #25942 )  
						
						 
						
						Signed-off-by: a120092009 <zhaoty0121@gmail.com>
						
						
					 
					
						2025-09-30 17:59:47 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fa7e254a7f 
					 
					
						
						
							
							[New Model] DeepSeek-V3.2 (Rebased to Main) ( #25896 )  
						
						 
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
						
						
					 
					
						2025-09-30 17:14:41 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e23cacda35 
					 
					
						
						
							
							[Bugfix]: Clean up chunked prefill logging when using whisper ( #25075 )  
						
						 
						
						Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
						
						
					 
					
						2025-09-30 08:17:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e1b8bc2b6 
					 
					
						
						
							
							[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect logical_not ( #25925 )  
						
						 
						
						Signed-off-by: zhoukz <me@zhoukz.com>
						
						
					 
					
						2025-09-30 08:15:23 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e47433b3c1 
					 
					
						
						
							
							[BugFix] Pass config_format via try_get_generation_config ( #25912 )  
						
						 
						
						
						
						
					 
					
						2025-09-30 05:09:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						23194d83e8 
					 
					
						
						
							
							[BugFix] Fix DP/EP hang  ( #25906 )  
						
						 
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						
						
					 
					
						2025-09-30 04:18:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						61aedb5ffe 
					 
					
						
						
							
							Move VllmConfig from config/__init__.py to config/vllm.py ( #25271 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-29 19:49:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d3bd171123 
					 
					
						
						
							
							[Benchmark] Support benchmark throughput for external launcher DP ( #25913 )  
						
						 
						
						Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
						
						
					 
					
						2025-09-30 01:43:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						89e4050af4 
					 
					
						
						
							
							[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 ( #25909 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-09-30 09:15:19 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						78a47f87ce 
					 
					
						
						
							
							Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT Models ( #25717 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andrew Sansom <andrew@protopia.ai > 
						
						
					 
					
						2025-09-30 08:10:58 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6a113d9aed 
					 
					
						
						
							
							[V0 Deprecation] Remove vllm.worker and update according imports ( #25901 )  
						
						 
						
						
						
						
					 
					
						2025-09-29 23:26:11 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e4fe48c37 
					 
					
						
						
							
							[NIXL] Increase default KV block eviction timeout on P ( #25897 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-09-29 21:35:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8eb0a1d906 
					 
					
						
						
							
							[Doc] Polish example for torchrun dp ( #25899 )  
						
						 
						
						
						
						
					 
					
						2025-09-29 21:31:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fea3e476aa 
					 
					
						
						
							
							[Kernel] Chunk-aligned mamba2 ( #24683 )  
						
						 
						
						
						
						
					 
					
						2025-09-29 23:18:25 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						61a3431613 
					 
					
						
						
							
							[Bugfix][ROCm] Fixing trying to import non-existent symbols from libnccl.so ( #25605 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-09-29 17:01:50 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9bedac9623 
					 
					
						
						
							
							[Doc] Add documentation for vLLM continuous benchmarking and profiling ( #25819 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Naman Lalit <nl2688@nyu.edu > 
						
						
					 
					
						2025-09-29 20:49:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c42ff4f4fd 
					 
					
						
						
							
							[BugFix][torch.compile] KV scale calculation issues with FP8 quantization ( #25513 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: adabeyta <aabeyta@redhat.com > 
						
						
					 
					
						2025-09-29 15:52:04 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d5ab28511c 
					 
					
						
						
							
							[Bugfix] Use correct key "ignore" for config.json non-quantized layers ( #25706 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lee Nau <lnau@nvidia.com > 
						
						
					 
					
						2025-09-29 15:07:29 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e61eb5e09d 
					 
					
						
						
							
							[Model] Remove MotifForCausalLM ( #25866 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-30 00:36:30 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0899ba5b42 
					 
					
						
						
							
							[CI/Build] Include Transformers backend test in nightly transformers test ( #25885 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-29 09:33:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						145ac73317 
					 
					
						
						
							
							[Bugfix][Speculative Decoding] Fix Eagle3 quantization config issue ( #25883 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Rahul Tuli <rtuli@redhat.com > 
						
						
					 
					
						2025-09-29 11:37:20 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d0d138bc55 
					 
					
						
						
							
							[Nixl][P/D] Add cuda2cpu support (HD->DH transfer) ( #24690 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chenxi Yang <cxyang@fb.com >
Co-authored-by: Chenxi Yang <cxyang@fb.com > 
						
						
					 
					
						2025-09-29 14:31:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						43227236ec 
					 
					
						
						
							
							[torch.compile] serialize cudagraph_mode as its enum name instead of value ( #25868 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com > 
						
						
					 
					
						2025-09-29 13:54:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8616300ae2 
					 
					
						
						
							
							[Model][Bugfix] Fix issues in MiDashengLM implementation for quantized models ( #25854 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zhoukz <me@zhoukz.com > 
						
						
					 
					
						2025-09-29 10:59:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						edbaadd91f 
					 
					
						
						
							
							[Bugfix] Fix requirements paths in install instructions ( #25827 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yingjun-mou <renzomou@gmail.com > 
						
						
					 
					
						2025-09-29 03:49:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9360d34fa1 
					 
					
						
						
							
							update to latest deepgemm for dsv3.2 ( #25871 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-09-29 17:51:43 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1b67b04656 
					 
					
						
						
							
							[Misc] Remove more get_input_embeddings_v0 ( #25857 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-29 08:03:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bd51f78e39 
					 
					
						
						
							
							[V0 Deprecation][Models] Remove all V0 condition for mm embeddings merge ( #25331 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-09-29 14:09:18 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						65ecb4f134 
					 
					
						
						
							
							[Bugfix] Fallback ViT attn backend to SDPA for blackwell ( #25851 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-29 06:03:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						143844fa43 
					 
					
						
						
							
							[XPU]Fix xpu spec decoding UTs, avoid using cuda graph ( #25847 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-09-29 05:15:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						219cfbe7f6 
					 
					
						
						
							
							Add Phi4FlashForCausalLM to _PREVIOUSLY_SUPPORTED_MODELS ( #25832 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-09-29 05:08:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9b44a7d926 
					 
					
						
						
							
							[P/D] NIXL Updates ( #25844 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Chenheli Hua <huachenheli@outlook.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Robert Shaw <robshaw@redhat.com > 
						
						
					 
					
						2025-09-29 04:46:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a3ae45a38c 
					 
					
						
						
							
							[Misc] fix tests failure by using current_platform ( #25825 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Juechen Liu <jueliu@meta.com > 
						
						
					 
					
						2025-09-29 04:18:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0307428d65 
					 
					
						
						
							
							Remove redundant cudagraph dispatcher warning ( #25841 )  
						
						 
						
						
						
						
					 
					
						2025-09-28 17:12:42 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						471997adf6 
					 
					
						
						
							
							[Bugfix] fix Qwen3VLMoe load when pp > 1 ( #25838 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com >
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com > 
						
						
					 
					
						2025-09-28 17:56:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b1ded114b9 
					 
					
						
						
							
							Update GLM-4.5 Doc transformers version ( #25830 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com > 
						
						
					 
					
						2025-09-28 12:05:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f4e4088c99 
					 
					
						
						
							
							Fix random dataset mismatched token length with config. ( #24937 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Weiliang Liu <weiliangl@nvidia.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-28 08:23:44 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0efd540dbc 
					 
					
						
						
							
							[VLM] Update Qwen3-VL max_num_video_tokens calculation for configurable video profiling ( #25557 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-28 04:21:01 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6144754014 
					 
					
						
						
							
							[Bugfix] Fix Qwen3-VL regression from #24982 ( #25814 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-28 03:21:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						69311446ba 
					 
					
						
						
							
							[MM] Optimize memory profiling for scattered multimodal embeddings ( #25810 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-28 02:17:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						da63274d9f 
					 
					
						
						
							
							[Bugfix][NIXL] Fix Async Scheduler timeout issue ( #25808 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-09-27 15:17:35 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c216119d64 
					 
					
						
						
							
							[Core] GC Debug callback ( #24829 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <jialino@meta.com >
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Jialin Ouyang <jialino@meta.com > 
						
						
					 
					
						2025-09-27 17:53:31 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5546acb463 
					 
					
						
						
							
							[Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location ( #25766 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Clayton Coleman <smarterclayton@gmail.com > 
						
						
					 
					
						2025-09-27 13:36:28 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c0ec81836f 
					 
					
						
						
							
							[torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable ( #25651 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com > 
						
						
					 
					
						2025-09-27 16:09:00 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b65e56babe 
					 
					
						
						
							
							[Core] Refactor self.model() to call a helper for subclassing. ( #25084 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Patrick Toulme <ptoulme@meta.com >
Signed-off-by: Patrick Toulme <pctoulme+1@gmail.com > 
						
						
					 
					
						2025-09-27 08:40:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						49996cd597 
					 
					
						
						
							
							[env] default nixl side port conflicts with kv-event zmq port ( #25056 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Peter Pan <Peter.Pan@daocloud.io > 
						
						
					 
					
						2025-09-27 15:02:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ecb37e276a 
					 
					
						
						
							
							[docs] transcriptions API audio upload ( #25446 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zxw <1020938856@qq.com > 
						
						
					 
					
						2025-09-27 15:00:35 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a5354b3ed2 
					 
					
						
						
							
							[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models ( #24982 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com > 
						
						
					 
					
						2025-09-27 14:22:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f9df8b4ad7 
					 
					
						
						
							
							[Bugfix] Fix triton import precommit failure ( #25803 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com > 
						
						
					 
					
						2025-09-27 07:13:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ec152c8748 
					 
					
						
						
							
							Fix GPTQ model loading in Transformers backend ( #25770 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-27 12:18:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7977e5027c 
					 
					
						
						
							
							Add filtering for chat template kwargs ( #25794 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-27 10:46:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3f5d902d2a 
					 
					
						
						
							
							Validate API tokens in constant time ( #25781 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com > 
						
						
					 
					
						2025-09-27 18:09:26 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						27d7638b94 
					 
					
						
						
							
							[Bugfix] Merge MM embeddings by index instead of token IDs ( #16229 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-27 08:15:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						176173989a 
					 
					
						
						
							
							[Bugfix] Add missing image_size for phi4_multimodal ( #25796 )  
						
						 
						
						
						
						
					 
					
						2025-09-27 07:59:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						23b8ee672d 
					 
					
						
						
							
							[Misc] Update openai client example file for multimodal ( #25795 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-09-27 07:57:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3939152069 
					 
					
						
						
							
							[Misc] Fix codeowners override for v1 sample and attention ( #25037 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-09-27 07:47:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cd87bfbf37 
					 
					
						
						
							
							[CI/Build] Reorganize root-level V1 tests ( #25767 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-27 13:51:15 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b3613e3ace 
					 
					
						
						
							
							[CI/Build] Add timing to Model Executor Test ( #25799 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-09-26 21:57:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d346ec695e 
					 
					
						
						
							
							[CI/Build] Consolidate model loader tests and requirements ( #25765 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-26 21:45:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c242c98031 
					 
					
						
						
							
							[Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL ( #25788 )  
						
						 
						
						
						
						
					 
					
						2025-09-26 20:44:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f1d53d150c 
					 
					
						
						
							
							[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl ( #22872 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Junhong <liujunhong11@huawei.com >
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com >
Co-authored-by: Junhong <liujunhong11@huawei.com >
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com > 
						
						
					 
					
						2025-09-27 03:35:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						92da847cf5 
					 
					
						
						
							
							Add flashinfer-build.sh and register precompiled cu128 wheel in Dockerfile ( #25782 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-26 18:54:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3958b96bf5 
					 
					
						
						
							
							Add option to restrict media domains ( #25783 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Chenheli Hua <huachenheli@outlook.com > 
						
						
					 
					
						2025-09-27 01:23:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8bf8f45822 
					 
					
						
						
							
							[Core] Don't count preempted tokens in prefix cache hit rate ( #25787 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zhuohan Li <zhuohan123@gmail.com > 
						
						
					 
					
						2025-09-27 00:16:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6f5c0931c1 
					 
					
						
						
							
							[Spec decode] automatically disable mm for text-only draft models ( #25667 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jonas Kuebler <kuebj@amazon.com > 
						
						
					 
					
						2025-09-27 08:10:21 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4e33a7ea85 
					 
					
						
						
							
							[Bugfix] Optimize CpuGpuBuffer initialization ( #25447 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Naman Lalit <nl2688@nyu.edu > 
						
						
					 
					
						2025-09-27 08:07:36 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dc48ba0c75 
					 
					
						
						
							
							Kernel-override Determinism [1/n] ( #25603 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Bram Wasti <bwasti@meta.com > 
						
						
					 
					
						2025-09-26 16:59:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4778b42660 
					 
					
						
						
							
							Reduce the Cuda Graph memory footprint when running with DBO ( #25779 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sage Moore <sage@neuralmagic.com > 
						
						
					 
					
						2025-09-26 22:29:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c70ac4b8ff 
					 
					
						
						
							
							[spec decode] Consolidate speculative decode method name for MTP ( #25232 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zixi-qi <qizixi@meta.com > 
						
						
					 
					
						2025-09-26 22:27:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cf89202855 
					 
					
						
						
							
							[CI] Fix FlashInfer AOT in release docker image ( #25730 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-26 14:11:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f075693da7 
					 
					
						
						
							
							[V1] address post issues related to #20059 (part 1) ( #23046 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com > 
						
						
					 
					
						2025-09-26 15:58:19 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f708bd4904 
					 
					
						
						
							
							[CI] Add E2E Blackwell Quantized MoE Test ( #25723 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-26 12:23:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0002b7f0d1 
					 
					
						
						
							
							[Docs] Add Toronto Meetup ( #25773 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-26 12:00:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						11aafd9886 
					 
					
						
						
							
							[Bugfix] Improve GLM4 MoE Reasoning Parser's is_reasoning_end Condition ( #25355 )  
						
						 
						
						
						
						
						Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
						Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-09-26 11:54:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b761df963c 
					 
					
						
						
							
							[Doc]: improve CPU(x86) build-wheel-from-source section ( #25617 )  
						
						 
						
						
						
						
						Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com > 
						
						
					 
					
						2025-09-26 10:26:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						33f6aaf972 
					 
					
						
						
							
							Eagle3 that supports the Minicpm3 model ( #24243 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						Co-authored-by: liudan <adan@minicpm.com>
						Co-authored-by: liudan <liudan@qq.com>
						Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
						
						
					 
					
						2025-09-26 10:04:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						56aafa8c0b 
					 
					
						
						
							
							[Misc] fix unique_filepath ( #25732 )  
						
						 
						
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
						Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
						
						
					 
					
						2025-09-26 16:56:15 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8d52f2b3a7 
					 
					
						
						
							
							[ray][metrics] Replace ':' with '_' for OpenTelemetry compatibility in Ray ( #25439 )  
						
						 
						
						
						
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com>
						Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
						Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
						
						
					 
					
						2025-09-26 09:43:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						984d18498a 
					 
					
						
						
							
							[BugFix] Fix using dbo_decode_token_threshold always (and ignoring dbo_prefill_token_threshold) ( #25622 )  
						
						 
						
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-09-26 16:22:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d4d9899860 
					 
					
						
						
							
							[Quantization] Add field to skip unquantized modules for GPTQ config ( #25455 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-26 15:47:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						db1e42f627 
					 
					
						
						
							
							[CI/Build] Fix some V1 tests not being run ( #25569 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-26 20:52:36 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bc9d7b5595 
					 
					
						
						
							
							[CI/Build] Split up Distributed Tests ( #25572 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-26 14:49:33 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fe6b19c314 
					 
					
						
						
							
							[Bugfix] Properly abort pooling request. ( #25734 )  
						
						 
						
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com>
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-09-26 05:47:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2827b3f4a3 
					 
					
						
						
							
							[CI] Fix test_shared_storage_connector_hashes ( #25748 )  
						
						 
						
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-09-26 20:46:17 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2b6b1d7809 
					 
					
						
						
							
							[Model] Mamba2 varlen refactor ( #21467 )
						
						 
						
						
						
						
						Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
						Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>
						
						
					 
					
						2025-09-26 11:31:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						633f943e30 
					 
					
						
						
							
							[Doc] Update Batch-level DP docs ( #25757 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-26 02:37:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b03b1b97f6 
					 
					
						
						
							
							Support LongCat-Flash-Chat tool call ( #24083 )  
						
						 
						
						
						
						
						Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com > 
						
						
					 
					
						2025-09-26 09:25:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dfb9af2014 
					 
					
						
						
							
							[Bugfix] Fix Shared Expert/Zero expert code in FusedMoE.process_chunk ( #25698 )  
						
						 
						
						
						
						
						Signed-off-by: Sage Moore <sage@neuralmagic.com>
						Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
						
						
					 
					
						2025-09-26 01:25:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						19f76ee68e 
					 
					
						
						
							
							[misc] refactor speculative config ( #25657 )  
						
						 
						
						
						
						
						Signed-off-by: zxw <1020938856@qq.com > 
						
						
					 
					
						2025-09-26 01:22:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dd70437a4f 
					 
					
						
						
							
							Remove cuda hard-code in compute_causal_conv1d_metadata ( #25555 )  
						
						 
						
						
						
						
						Signed-off-by: Icey <1790571317@qq.com > 
						
						
					 
					
						2025-09-26 01:19:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						99b3a504c5 
					 
					
						
						
							
							[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and a stride bug in causal_conv_1d. ( #25743 )  
						
						 
						
						
						
						
						Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com > 
						
						
					 
					
						2025-09-26 01:18:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6e30010d2f 
					 
					
						
						
							
							fix: print output offline_inference/base/chat.py example ( #25744 )
						
						 
						
						
						
						
						Signed-off-by: Iceber Gu <caiwei95@hotmail.com > 
						
						
					 
					
						2025-09-26 01:18:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						52621c8f5c 
					 
					
						
						
							
							[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI300X ( #25703 )
						
						 
						
						
						
						
						Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com > 
						
						
					 
					
						2025-09-26 01:18:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d48f4d6daf 
					 
					
						
						
							
							perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds is enabled ( #25739 )  
						
						 
						
						
						
						
						Signed-off-by: Andrew Sansom <andrew@protopia.ai > 
						
						
					 
					
						2025-09-26 01:18:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e84e0735c7 
					 
					
						
						
							
							fix: revert cast to cpu in MsgpackEncoder._encode_tensor to avoid hidden performance regressions ( #25738 )  
						
						 
						
						
						
						
						Signed-off-by: Andrew Sansom <andrew@protopia.ai > 
						
						
					 
					
						2025-09-26 01:18:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3edf87d25f 
					 
					
						
						
							
							[CI/Build] fix doc build warning: Failed to get 'name: description' pair ( #25733 )  
						
						 
						
						
						
						
						Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io > 
						
						
					 
					
						2025-09-26 01:18:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						392edee34a 
					 
					
						
						
							
							EVS Support (Video tokens pruning) ( #22980 )  
						
						 
						
						
						
						
						Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
						Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com>
						Co-authored-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-09-26 11:54:54 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						983056e456 
					 
					
						
						
							
							[Misc] Remove unnecessary memoryviews in shm_broadcast.py ( #25721 )  
						
						 
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-26 03:11:44 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						13dd93c667 
					 
					
						
						
							
							[Core] Force PIECEWISE CUDAGraph mode for encoder-decoder ( #25701 )  
						
						 
						
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-09-25 18:21:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						53a30845be 
					 
					
						
						
							
							Llama 3.1 405B fp4 changes upstreaming from 355_wip ( #25135 )
						
						 
						
						
						
						
						Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
						Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
						Co-authored-by: Doug Lehr <douglehr@amd.com>
						
						
					 
					
						2025-09-25 19:16:53 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8b77328ffe 
					 
					
						
						
							
							[Misc] Don't log shm dequeue delay warning on worker side ( #25720 )  
						
						 
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-26 01:08:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9fe4c2bdb9 
					 
					
						
						
							
							[Refactor] Remove DeepGEMM OP Register ( #25710 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-25 20:13:41 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						081b5594a2 
					 
					
						
						
							
							Fix routing_bias dtype ( #25711 )
						
						 
						
						
						
						
						Signed-off-by: Shu Wang. <shuw@nvidia.com > 
						
						
					 
					
						2025-09-25 23:35:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						57329a8c01 
					 
					
						
						
							
							[Model] rename NemotronH_Nano_VL -> NemotronH_Nano_VL_V2 ( #25708 )  
						
						 
						
						
						
						
						Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com > 
						
						
					 
					
						2025-09-25 16:10:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8c435c9bce 
					 
					
						
						
							
							[Core] Enable command line logging for LLMEngine ( #25610 )  
						
						 
						
						
						
						
						Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
						Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
						
						
					 
					
						2025-09-25 15:31:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e71b8e210d 
					 
					
						
						
							
							[Spec Decode] Add Batch Parallel Ngram. Up to 8x lower overhead. ( #24986 )
						
						 
						
						
						
						
						Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
						Co-authored-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-09-25 15:22:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						89fa54e6f7 
					 
					
						
						
							
							[Optimization] Use a cheaper cache key in get_model_architecture ( #25682 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-25 17:54:20 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3d54bdcb73 
					 
					
						
						
							
							[Optimization] Streamline InputPreprocessor ( #25702 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-25 21:06:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6b0fcbbf43 
					 
					
						
						
							
							[Misc] Simplify test_argsort_mm_positions ( #25690 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-25 18:23:01 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0fa673af4c 
					 
					
						
						
							
							[V0 deprecation] Clean up LoRA ( #25686 )
						
						 
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-25 18:12:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3468f17ebe 
					 
					
						
						
							
							[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names ( #25489 )  
						
						 
						
						
						
						
						Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
						Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
						
						
					 
					
						2025-09-25 17:37:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						71b25b0d48 
					 
					
						
						
							
							[V0 deprecation] Clean up V0 fallback in compilation config ( #25675 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-25 17:29:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0ea80c87d9 
					 
					
						
						
							
							[Model] Define merge_by_field_config MM interface ( #25676 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-25 17:13:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b8d9e4a326 
					 
					
						
						
							
							[Model] Add optional parameter to reasoning parser constructor ( #25554 )  
						
						 
						
						
						
						
						Signed-off-by: taohui <taohui3@gmail.com>
						Signed-off-by: Tao Hui <taohui3@gmail.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-26 01:12:50 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						13cc7f5370 
					 
					
						
						
							
							[BugFix] Fix DBO hang ( #25625 )  
						
						 
						
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-09-25 17:04:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						916bd9204d 
					 
					
						
						
							
							Revert "[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super()" ( #25681 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
						
						
					 
					
						2025-09-25 09:45:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e04a1b6b21 
					 
					
						
						
							
							[BUGFIX] Fix crash in Eagle Speculative Decoding models when exceedin… ( #24662 )  
						
						 
						
						
						
						
						Signed-off-by: AlonKejzman <alonkeizman@gmail.com > 
						
						
					 
					
						2025-09-25 15:40:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e5df88c92 
					 
					
						
						
							
							[Logging] Remove TORCH_NCCL_AVOID_RECORD_STREAMS to squash a warning ( #25532 )  
						
						 
						
						
						
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-09-25 15:16:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0754ac4c49 
					 
					
						
						
							
							[Misc] Remove cruft file in repo ( #25678 )  
						
						 
						
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-09-25 08:05:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						03858e6d1c 
					 
					
						
						
							
							[Bugfix] Fix InternS1 video processing after Transformers v4.56 ( #25644 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-25 14:46:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						532a6cfccb 
					 
					
						
						
							
							[ux] Switch a warning to debug about a pytorch fallback ( #23750 )  
						
						 
						
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-09-25 14:38:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eb32335e35 
					 
					
						
						
							
							[CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata ( #25652 )  
						
						 
						
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-09-25 13:29:11 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						69a8c8e99a 
					 
					
						
						
							
							[torch.compile] Make Query Quantization Fusable ( #24914 )  
						
						 
						
						
						
						
						Signed-off-by: Jonas Kuebler <kuebj@amazon.com > 
						
						
					 
					
						2025-09-25 09:25:12 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6c340da4df 
					 
					
						
						
							
							[misc] log info messages by default for hanging / busy / idle ( #25627 )  
						
						 
						
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-09-25 21:14:57 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f17117606 
					 
					
						
						
							
							[mypy] Fix wrong type annotations related to tuple ( #25660 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-25 13:00:45 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1e9a77e037 
					 
					
						
						
							
							[Hardware][RISC-V] Add riscv64 support for vLLM with scalar ( #22112 )  
						
						 
						
						
						
						
						Signed-off-by: chenlang <chen.lang5@zte.com.cn>
						Co-authored-by: chenlang <10346245@zte.com.cn>
						
						
					 
					
						2025-09-25 20:46:11 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d2af67441d 
					 
					
						
						
							
							[XPU][Triton]add xpu config in triton_reshape_and_cache_flash ( #25643 )  
						
						 
						
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-09-25 12:38:11 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0bcc3a160d 
					 
					
						
						
							
							[CI/Build] Fix flaky entrypoints test ( #25663 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-25 12:19:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						70fbdb26e9 
					 
					
						
						
							
							Add backward compatibility for guided_... API ( #25615 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-09-25 19:45:25 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7f570f1caa 
					 
					
						
						
							
							[V0 deprecation] Remove unreachable model_config.supported_tasks ( #25642 )  
						
						 
						
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-09-25 11:26:31 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eaeca3cd7f 
					 
					
						
						
							
							[Bugfix] Parse SpeculativeConfig Error ( #25142 )  
						
						 
						
						
						
						
						Signed-off-by: zxw <1020938856@qq.com>
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-25 11:09:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						12c1287d64 
					 
					
						
						
							
							[mypy] Further improve MM type annotations ( #25654 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-25 10:57:36 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						17b4c6685c 
					 
					
						
						
							
							[Bugfix] Fix Qwen3-VL max_num_video_tokens calculation for video profiling ( #25648 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-25 18:36:01 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3c2b2ccece 
					 
					
						
						
							
							[Bugfix] Add triton.language.tensor placeholder ( #25649 )  
						
						 
						
						
						
						
						Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai > 
						
						
					 
					
						2025-09-25 10:31:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7be9ffcd9f 
					 
					
						
						
							
							[Misc] Fix Qwen3-VL video_grid_thw typing ( #25646 )  
						
						 
						
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-25 10:16:45 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						393de22d2e 
					 
					
						
						
							
							[fix] Update torch version in cpu-build.txt for AArch64/ppc64le and Darwin ( #25579 )  
						
						 
						
						
						
						
						Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com > 
						
						
					 
					
						2025-09-25 09:39:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1260180c67 
					 
					
						
						
							
							Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… (#25607)

						Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
						
						
					 
					
						2025-09-25 08:05:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						af4ee63e0e 
					 
					
						
						
							
							typo: remove duplicate is (#25641)

						Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
						
						
					 
					
						2025-09-25 00:46:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bc092ea873 
					 
					
						
						
							
							Map CwmForCausalLM to llama and LlamaForCausalLM (#25611)

						Signed-off-by: Jacob Kahn <jacobkahn1@gmail.com>
						Co-authored-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-09-25 07:37:03 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						755ed7b05b 
					 
					
						
						
							
							[Misc] Simplify PoolerOutput and move to v1/outputs (#25629)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-09-25 06:47:03 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a676e668ee 
					 
					
						
						
							
							[Bugfix] fix apply_temperature to avoid nan in probs (#24734)

						Signed-off-by: courage17340 <courage17340@163.com>
						
						
					 
					
						2025-09-25 05:32:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c85be1f6dd 
					 
					
						
						
							
							optimize: eliminate duplicate split_enc_dec_inputs calls (#25573)

						Signed-off-by: nicole-lihui <nicole.li@daocloud.io>
						
						
					 
					
						2025-09-25 05:03:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						845adb3ec6 
					 
					
						
						
							
							[Model] Add LongCat-Flash (#23991)

						Signed-off-by: yangxurui <yangxurui@meituan.com>
						Co-authored-by: yangxurui <yangxurui@meituan.com>
						
						
					 
					
						2025-09-24 21:53:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						90b139cfff 
					 
					
						
						
							
							Enable Fbgemm NVFP4 on Dense models (#25609)

						Signed-off-by: Saman Keon <samanamp@outlook.com>
						
						
					 
					
						2025-09-24 21:12:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4492e3a554 
					 
					
						
						
							
							[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super() (#25613)

						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-24 18:52:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						05c19485a5 
					 
					
						
						
							
							[Kernel] Support DCP for Triton backend (#25132)

						Signed-off-by: Wei Wei <wwei6@meta.com>
						
						
					 
					
						2025-09-24 18:09:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						52d0cb8458 
					 
					
						
						
							
							[Model] Improve DotsOCRForCausalLM (#25466)

						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-25 07:58:08 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5c1e496a75 
					 
					
						
						
							
							[MISC] replace c10::optional with std::optional (#25602)

						Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
						
						
					 
					
						2025-09-24 16:56:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e7f27ea648 
					 
					
						
						
							
							Improve --help for enhanced user experience (#24903)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-24 23:08:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1f29141258 
					 
					
						
						
							
							[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor (#25517)

						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-09-24 18:52:36 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6160ba4151 
					 
					
						
						
							
							feat: BF16 FlashInfer Fused Cutlass MOE for Hopper and Blackwell Expert Parallel (#25503)

						Signed-off-by: Duncan Moss <djm.moss@gmail.com>
						
						
					 
					
						2025-09-24 18:50:04 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fea8006062 
					 
					
						
						
							
							[Logging] Improve log for when DeepEP HT disables CUDA Graphs (#25531)

						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
						
						
					 
					
						2025-09-24 22:43:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e6750d0b18 
					 
					
						
						
							
							[V0 Deprecation] Remove unused classes in attention (#25541)

						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
						
						
					 
					
						2025-09-24 13:24:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8c853050e7 
					 
					
						
						
							
							[Docs] Enable fail_on_warning for the docs build in CI (#25580)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-24 19:30:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f84a472a03 
					 
					
						
						
							
							Suppress benign cuBLAS warning when capturing cudagraphs with DBO (#25596)

						Signed-off-by: Sage Moore <sage@neuralmagic.com>
						
						
					 
					
						2025-09-24 19:02:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						54e42b72db 
					 
					
						
						
							
							Support mnnvl all2allv from Flashinfer (#21003)

						Signed-off-by: Shu Wang <shuw@nvidia.com>
						Signed-off-by: Shu Wang. <shuw@nvidia.com>
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
						Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
						Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
						Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
						
						
					 
					
						2025-09-24 14:38:16 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2dda3e35d0 
					 
					
						
						
							
							[Bugfix] add cache model when from object storage get model (#24764)

						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
						
						
					 
					
						2025-09-24 18:11:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d83f3f7cb3 
					 
					
						
						
							
							Fixes and updates to bench_per_token_quant_fp8 (#25591)

						Signed-off-by: Michael Goin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-24 08:30:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						302eb941f3 
					 
					
						
						
							
							[ROCm][Build][Bugfix] Fix ROCm base docker whls installation order (#25415)

						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-09-24 11:25:10 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						487745ff49 
					 
					
						
						
							
							[ROCm][Bugfix] Only enable +rms_norm based on aiter if not explicitly disabled (#25275)

						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-09-24 11:24:39 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9313be5017 
					 
					
						
						
							
							[Misc] Improve type annotations for jsontree (#25577)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-09-24 22:49:58 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8938774c79 
					 
					
						
						
							
							Move DeviceConfig, ObservabilityConfig, SpeechToTextConfig to their own files (#25564)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-24 13:59:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e18b714b2e 
					 
					
						
						
							
							[Bugfix] Fix DeepSeekV31ToolParser to correctly parse multiple tools in non-streaming output (#25405)

						Signed-off-by: taohui <taohui3@gmail.com>
						
						
					 
					
						2025-09-24 20:58:00 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b1068903fd 
					 
					
						
						
							
							[docs] fix nixl kv_connector_extra_config.backends key (#25565)

						Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
						Signed-off-by: Peter Pan <peter.pan@daocloud.io>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-24 11:00:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						164299500b 
					 
					
						
						
							
							[Benchmark] Fix regression in structured output benchmark (#25500)

						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-09-24 10:40:42 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						58c360d9be 
					 
					
						
						
							
							[Bug] fix import and unit test (#25558)

						Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
						
						
					 
					
						2025-09-24 10:17:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						42488dae69 
					 
					
						
						
							
							[Bugfix] Fix dummy video number of frames calculation (#25553)

						Signed-off-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-09-24 09:47:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b67dece2d8 
					 
					
						
						
							
							[misc] update the warning message (#25566)

						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-09-24 17:24:35 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2338daffd3 
					 
					
						
						
							
							[BugFix] Potential Fix for FA3 full-cudagraph IMA (#25490)

						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						
						
					 
					
						2025-09-24 02:04:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e19a848d4 
					 
					
						
						
							
							[V0 Deprecation] Remove max_seq_len_to_capture (#25543)

						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-24 01:51:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						77a7fce1bb 
					 
					
						
						
							
							[CI/Build] add nightly prime-rl integration tests (#25207)

						Signed-off-by: Jackmin801 <ongjackm@gmail.com>
						Signed-off-by: Michael Goin <mgoin64@gmail.com>
						Co-authored-by: Michael Goin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-24 08:44:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6488f3481b 
					 
					
						
						
							
							[Misc]] Move processing context to multimodal directory (#25548)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-09-24 08:15:00 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						27ec3c78f3 
					 
					
						
						
							
							[CI/Build] Fix v1 OOT registration test (#25547)

						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-24 08:03:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1cbcfb94de 
					 
					
						
						
							
							[Bugfix][CPU] Skip unsupported custom op register on CPU (#25534)

						Signed-off-by: jiang1.li <jiang1.li@intel.com>
						
						
					 
					
						2025-09-24 06:21:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fed8a9b107 
					 
					
						
						
							
							[Misc] Retry HF processing if "Already borrowed" error occurs (#25535)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-09-23 22:32:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						190c45a6af 
					 
					
						
						
							
							[TPU][Bugfix] fix the missing apply_model in tpu worker (#25526)

						Signed-off-by: Chengji Yao <chengjiyao@google.com>
						
						
					 
					
						2025-09-24 05:18:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5caaeb714c 
					 
					
						
						
							
							[Bugfix] [Frontend] Cleanup gpt-oss non-streaming chat tool calls (#25514)

						Signed-off-by: Ben Browning <bbrownin@redhat.com>
						
						
					 
					
						2025-09-24 03:20:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d747c2ef18 
					 
					
						
						
							
							[Perf] Fix jit compiles at runtime of fla gated delta rule (#25432)

						Co-authored-by: Michael Goin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-24 11:16:13 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c30b405b8f 
					 
					
						
						
							
							[Spec Decode] Enable FlashInfer Spec Decoding (#25196)

						Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
						Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
						Co-authored-by: lhsjohn <huashuoli@tencent.com>
						
						
					 
					
						2025-09-23 22:29:58 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						77d906995c 
					 
					
						
						
							
							[KV sharing] Re-land Gemma3n model changes from #22628 (#24357)

						Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
						
						
					 
					
						2025-09-23 19:25:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						359d293006 
					 
					
						
						
							
							[fix]: add Arm 4bit fused moe support (#23809)

						Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
						
						
					 
					
						2025-09-24 01:32:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9df8da548e 
					 
					
						
						
							
							[BugFix] Fix MLA assert with CUTLASS MLA (#25478)

						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						
						
					 
					
						2025-09-23 21:09:43 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bf68fd76a9 
					 
					
						
						
							
							[Compile] Fix AMD Compile Error (#25518)

						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-09-24 00:42:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						de94289a98 
					 
					
						
						
							
							[Core] Support weight_loader_v2 for UnquantizedLinearMethod (#23036)

						Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
						
						
					 
					
						2025-09-23 18:30:26 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1983609239 
					 
					
						
						
							
							[Bugfix] Use a separate FlashInfer workspace buffer for trtllm-gen (#25520)
						
						 
						
						
						
						
					 
					
						2025-09-24 00:19:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d06b5a95cb 
					 
					
						
						
							
							[V1][Metrics] Add per-request TPOT histogram (#24015)

						Signed-off-by: baxingpiaochong <771405853@qq.com>
						
						
					 
					
						2025-09-23 18:19:04 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						be0bb568c9 
					 
					
						
						
							
							[Model] Support SeedOss Reason Parser (#24263)

						Signed-off-by: Yan Lu <luyan@nvidia.com>
						Co-authored-by: Michael Goin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-23 18:15:51 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c8bde93367 
					 
					
						
						
							
							[BUG] Allows for RunAI Streamer and Torch.compile cache to be used together (#24922)

						Signed-off-by: ahao-anyscale <ahao@anyscale.com>
						
						
					 
					
						2025-09-23 18:13:32 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						88d7bdbd23 
					 
					
						
						
							
							[Bug] Fix AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_inv' (#25519)

						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-24 00:07:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0d235b874a 
					 
					
						
						
							
							Add CUTLASS FP8 MOE benchmark scripts and kernel config (#25302)

						Signed-off-by: Chenxi Yang <cxyang@fb.com>
						Co-authored-by: Chenxi Yang <cxyang@fb.com>
						
						
					 
					
						2025-09-23 18:07:42 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7ad5e50adf 
					 
					
						
						
							
							Improve output when failing json.loads() on structured output test (#25483)

						Signed-off-by: dougbtv <dosmith@redhat.com>
						
						
					 
					
						2025-09-23 18:03:31 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dc464a3d39 
					 
					
						
						
							
							[BugFix] AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch (#25505)

						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						
						
					 
					
						2025-09-23 18:00:29 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1210e4d95b 
					 
					
						
						
							
							[Bugfix] [B200] cutlass_mla - ensure kv_split == 1 for batch size > 1 (#25509)

						Signed-off-by: Alexander Matveev <amatveev@redhat.com>
						
						
					 
					
						2025-09-23 16:57:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e0b24ea030 
					 
					
						
						
							
							[Perf] Increase default max splits for FA3 full cudagraphs (#25495)

						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						
						
					 
					
						2025-09-23 16:53:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bde2a1a8a4 
					 
					
						
						
							
							[ROCm] Small functional changes for gptoss (#25201)

						Signed-off-by: jpvillam <jpvillam@amd.com>
						Co-authored-by: jpvillam <jpvillam@amd.com>
						
						
					 
					
						2025-09-23 23:39:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5e25b12236 
					 
					
						
						
							
							[Kernel] [Mamba] Remove BLOCK_H=1 from list of tuneable configurations for _chunk_cumsum_fwd_kernel (#25197)

						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
						
						
					 
					
						2025-09-23 23:23:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c85d75cf08 
					 
					
						
						
							
							Add VLLM_NVTX_SCOPES_FOR_PROFILING=1 to enable nvtx.annotate scopes (#25501)

						Signed-off-by: Corey Lowman <clowman1993@gmail.com>
						
						
					 
					
						2025-09-23 22:50:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						abad204be6 
					 
					
						
						
							
							[BugFix] Fix OOM in vLLM replicas by ensuring consistent NCCL memory accounting (#25359)

						Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
						
						
					 
					
						2025-09-23 15:49:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7361ab379f 
					 
					
						
						
							
							Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-23 22:48:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						95bc60e4cb 
					 
					
						
						
							
							[gpt-oss][bugfix] remove logic to require resp_ in ResponseAPI (#25428)

						Signed-off-by: Andrew Xia <axia@meta.com>
						
						
					 
					
						2025-09-23 15:46:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4f2954f724 
					 
					
						
						
							
							Fix triton_reshape_and_cache_flash.py triton import (#25522)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-23 15:26:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eca7be9077 
					 
					
						
						
							
							Add VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE & VLLM_ENABLE_INDUCTOR_COORDINA… ( #25493 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rouchenzi <ruochenwen@gmail.com >
Signed-off-by: rouchenzi <40842833+rouchenzi@users.noreply.github.com > 
						
						
					 
					
						2025-09-23 22:17:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						969b4da3a6 
					 
					
						
						
							
							[V0 Deprecation] Remove placeholder attn ( #25510 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-09-23 22:12:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4f8c4b890a 
					 
					
						
						
							
							[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] ( #24830 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com > 
						
						
					 
					
						2025-09-23 15:11:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ae002924e9 
					 
					
						
						
							
							[CI/Build] Fix and re-enable v1 PP test on CI ( #25496 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-23 21:58:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						690f948e4a 
					 
					
						
						
							
							[Bugfix] Fix for the import error from #24588 ( #25481 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-09-23 21:31:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						08275ec0a2 
					 
					
						
						
							
							[Build] Update Xgrammar to 0.1.25 ( #25467 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-09-23 21:25:46 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c828d1bf98 
					 
					
						
						
							
							[Bugfix] gpt-oss container tool output bug ( #25485 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Alec Solder <alecs@fb.com >
Co-authored-by: Alec Solder <alecs@fb.com > 
						
						
					 
					
						2025-09-23 20:43:45 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8b8a8afc89 
					 
					
						
						
							
							[CI] Fix Pre-commit Issue ( #25497 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-24 04:09:37 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8bdd8b5c51 
					 
					
						
						
							
							Enable symmetric memory all reduce by default only enabling for TP ( #25070 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-23 15:53:00 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a8ffc4f0f2 
					 
					
						
						
							
							[Bugfix] Lower gpt-oss max cudagraph size to 992 to be compatible with FA3 ( #25508 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-23 12:49:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d5944d5146 
					 
					
						
						
							
							[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue ( #25406 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com > 
						
						
					 
					
						2025-09-23 15:44:35 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						24fab45d96 
					 
					
						
						
							
							[Perf] Change default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEWISE ( #25444 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-23 15:29:26 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						63400259d0 
					 
					
						
						
							
							[Performance] Move apply_w8a8_block_fp8_linear to an op class ( #24666 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com > 
						
						
					 
					
						2025-09-23 12:03:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8c1c81a3de 
					 
					
						
						
							
							[core] add nccl symmetric memory for all reduce ( #24532 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Amir Samani <asamani@nvidia.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-23 14:33:06 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a3a7828010 
					 
					
						
						
							
							[ROCm] Add skinny gemm bias support for dtypes fp16,bf16,fp8 ( #24988 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
Signed-off-by: Hashem Hashemi <159079214+amd-hhashemi@users.noreply.github.com > 
						
						
					 
					
						2025-09-23 14:31:45 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5abb117901 
					 
					
						
						
							
							[Core] Ensure LoRA linear respect the base_layer's tp_size and tp_rank ( #25487 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-23 18:19:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						867ecdd1c8 
					 
					
						
						
							
							[Spec Decode][CI] Add e2e test for examples/spec_decode.py and prevent breaking Acceptance Length ( #24531 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-23 10:46:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						24e8222745 
					 
					
						
						
							
							[Misc] Reduce initialization time of auto_tune ( #23682 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Weida Hong <wdhongtw@google.com > 
						
						
					 
					
						2025-09-23 17:34:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						100b630a60 
					 
					
						
						
							
							[V1][Kernel] Add triton implementation for reshape_and_cache_flash ( #24503 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com >
Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com > 
						
						
					 
					
						2025-09-23 12:52:40 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						527821d191 
					 
					
						
						
							
							Use macro guard CUDA functions for back compatibility in grouped_topk_kernel.cu ( #25346 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ming Yang <minos.future@gmail.com >
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Claude <noreply@anthropic.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com > 
						
						
					 
					
						2025-09-23 09:45:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						846197f505 
					 
					
						
						
							
							[Log] Optimize kv cache memory log from Bytes to GiB ( #25204 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-23 12:44:37 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2357480b1a 
					 
					
						
						
							
							[BugFix] Fix UB in per_token_group_quant.cu ( #24913 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Shreeasish Kumar <shreeasish@rivosinc.com > 
						
						
					 
					
						2025-09-23 09:14:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f11e3c516b 
					 
					
						
						
							
							[Kernels] Support blocked fp8 quantization for compressed tensors MoE ( #25219 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-23 16:11:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						875d6def90 
					 
					
						
						
							
							Add backward compatibility for GuidedDecodingParams ( #25422 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-23 17:07:30 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cc1dc7ed6d 
					 
					
						
						
							
							[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support ( #24845 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-09-23 16:02:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a903669e10 
					 
					
						
						
							
							[V1] Remove V0 code paths for Hybrid models ( #25400 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-09-23 08:26:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2c58742dff 
					 
					
						
						
							
							[UX] Change kv-cache-memory log level to debug ( #25479 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-23 08:01:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4c966e440e 
					 
					
						
						
							
							[XPU] Fix MOE DP accuracy issue on XPU ( #25465 )  
						
						 
						
						
						
						
					 
					
						2025-09-23 14:32:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						da5e7e4329 
					 
					
						
						
							
							[Docs] NixlConnector quickstart guide ( #24249 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Peter Pan <Peter.Pan@daocloud.io >
Signed-off-by: Peter Pan <peter.pan@daocloud.io >
Signed-off-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com > 
						
						
					 
					
						2025-09-23 14:23:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f05a4f0e34 
					 
					
						
						
							
							[P/D] Support NIXL connector to disconnect during a clean shutdown ( #24423 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com > 
						
						
					 
					
						2025-09-23 16:08:02 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						61d1b35561 
					 
					
						
						
							
							[BugFix] Register expert_map as named buffer for wake_up and sleep ( #25458 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wuxibin <wuxibin@bytedance.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-09-23 21:49:13 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b6a136b58c 
					 
					
						
						
							
							[CI/Build] Fix disabled v1 attention backend selection test ( #25471 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-23 13:05:46 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0d9fe260dd 
					 
					
						
						
							
							[docs] Benchmark Serving Incorrect Arg ( #25474 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com > 
						
						
					 
					
						2025-09-23 06:05:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						273690a50a 
					 
					
						
						
							
							[Core] Optimize LoRA weight loading ( #25403 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-23 18:19:45 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						231c2c63e4 
					 
					
						
						
							
							[Bugfix] Fix idefics3 tie_word_embeddings ( #25454 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-23 10:06:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4322c553a6 
					 
					
						
						
							
							[Test]: Hermes tool parser stream output error in Qwen3 case ( #25203 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andreas Hartel <andreas.hartel@aleph-alpha.com > 
						
						
					 
					
						2025-09-23 17:56:31 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						babad6e5dd 
					 
					
						
						
							
							[Misc] Move DP for ViT code inside model executor dir ( #25459 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-23 09:20:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9383cd6f10 
					 
					
						
						
							
							[Frontend] Add a new xml-based tool parser for qwen3-coder ( #25028 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zhikaiiii <1658973216@qq.com > 
						
						
					 
					
						2025-09-23 16:07:27 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ba8d2165b6 
					 
					
						
						
							
							Handle triton kernel import exception ( #25319 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ming Yang <minos.future@gmail.com > 
						
						
					 
					
						2025-09-23 00:56:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c98be0a232 
					 
					
						
						
							
							[Model] Enable DP for ViT in Qwen2-VL ( #25445 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-23 05:17:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5774b0a1da 
					 
					
						
						
							
							[NIXL][OOT platform] support nixl_connector with oot platform and other nixl_backend ( #25121 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chendi Xue <Chendi.Xue@intel.com > 
						
						
					 
					
						2025-09-23 04:17:42 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e8db44f883 
					 
					
						
						
							
							[DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP ( #24588 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-09-22 21:01:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fafbe11af4 
					 
					
						
						
							
							[Docs] Fix griffe warnings in vllm/lora/ops ( #25369 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-09-23 03:42:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						78237e43bf 
					 
					
						
						
							
							[Bugfix] Remove contiguous output req for context parallel MLA ( #25414 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-22 20:26:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eea1783989 
					 
					
						
						
							
							[benchmarks]allow skip ready check for bench serve ( #25420 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com > 
						
						
					 
					
						2025-09-23 03:21:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f225ea7dd9 
					 
					
						
						
							
							[XPU] Fix compile_size is None case. ( #25433 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-09-23 03:09:00 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fc97733da8 
					 
					
						
						
							
							[feat] Support MRoPE + YaRN ( #25384 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com >
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com > 
						
						
					 
					
						2025-09-23 03:04:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4741239db7 
					 
					
						
						
							
							[Bug] Fix Long Context OOM Issue ( #25290 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-22 22:04:15 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c625f9043c 
					 
					
						
						
							
							[V0 deprecation] Remove _set_default_args_v0 function ( #25409 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-23 01:52:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6fa78d8f23 
					 
					
						
						
							
							[V0 deprecation] Remove platform v1 controling interface ( #25410 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-23 01:48:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9949aa2ef1 
					 
					
						
						
							
							[Perf] Apply torch.compile for per_block_cast_to_fp8 ( #24611 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-22 19:42:45 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0b7bed9c38 
					 
					
						
						
							
							[Performance] Remove input pads in cutlass_mla and optimize v_proj output handling ( #25184 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Alexander Matveev <amatveev@redhat.com > 
						
						
					 
					
						2025-09-22 19:20:53 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ac0048c0ae 
					 
					
						
						
							
							[BugFix] [DP/EP] Fix slow execution when BS <= DP ( #25407 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Chris Bamford <chrisbam4d@gmail.com > 
						
						
					 
					
						2025-09-22 17:26:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						090197034f 
					 
					
						
						
							
							[Bugfix] Fix missing clear_connector_metadata ( #25397 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-09-23 08:10:59 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f31ff87460 
					 
					
						
						
							
							[Core] Drop overly aggressive whisper assertion ( #25408 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-09-22 17:09:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d588cd2406 
					 
					
						
						
							
							[Bugfix] fix custom op test ( #25429 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Luka Govedič <lgovedic@redhat.com > 
						
						
					 
					
						2025-09-23 00:07:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						45d7d852d3 
					 
					
						
						
							
							[Frontend] Responses API MCP tools for built in tools and to pass through headers ( #24628 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com > 
						
						
					 
					
						2025-09-22 23:38:19 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8bed179109 
					 
					
						
						
							
							[TPU] update torch_xla dependency for PyPI compatibility ( #25278 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Johnny Yang <johnnyyang@google.com >
Co-authored-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-09-22 16:14:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f552d5e578 
					 
					
						
						
							
							[CI/Build] Skip Qwen3-VL initialization tests until models are actually released ( #25394 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-22 13:18:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8db2939289 
					 
					
						
						
							
							[KV offload][5/N] Add CPUOffloadingSpec ( #24251 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Or Ozeri <oro@il.ibm.com > 
						
						
					 
					
						2025-09-22 12:30:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d5e0fca264 
					 
					
						
						
							
							[torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug ( #23091 ), fix test ( #24376 ), and prep for custom op matching ( #24604 ) ( #24542 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: luka <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com > 
						
						
					 
					
						2025-09-22 12:30:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8d0ee5a564 
					 
					
						
						
							
							[misc] Remove RFC review hours reference ( #25416 )  
						
						 
						
						
						
						
					 
					
						2025-09-22 12:16:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						922979bfcc 
					 
					
						
						
							
							[DP] support torchrun external launcher with Data Parallelism ( #24899 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com > 
						
						
					 
					
						2025-09-22 12:06:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						239ef0c1ac 
					 
					
						
						
							
							[CI Failure] Fix fp8 kv cache on <SM90 ( #25396 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-22 18:27:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1d7f95b85c 
					 
					
						
						
							
							[Compiler] Disable Inductor standalone compile by default ( #25391 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ElizaWszola <ewszola@redhat.com > 
						
						
					 
					
						2025-09-22 17:37:46 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cfbee3d0e7 
					 
					
						
						
							
							[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables ( #25274 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: qqma <qqma@amazon.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: qqma <qqma@amazon.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-22 10:37:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						06a41334c7 
					 
					
						
						
							
							[EPLB] Reduce EPLB Inference Overhead ( #24573 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Bowen Wang <abmfy@icloud.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-09-22 16:31:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						175811e3b5 
					 
					
						
						
							
							[V1][Attention] Split triton_attn in triton-only and rocm specific backends  ( #24648 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com > 
						
						
					 
					
						2025-09-22 15:20:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c10101a3eb 
					 
					
						
						
							
							[Bugfix] Fix several issues with p2p xPyD in GET type ( #23993 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Csrayz <jover@cmbchina.com >
Signed-off-by: ivyilike <pww123@cmbchina.com >
Co-authored-by: ivyilike <pww123@cmbchina.com > 
						
						
					 
					
						2025-09-22 14:53:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ac243886b0 
					 
					
						
						
							
							[Kernel] MI-300X triton moe configs ( #23445 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sara Kokkila Schumacher <saraks@ibm.com > 
						
						
					 
					
						2025-09-22 14:29:54 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3d2c56b7a9 
					 
					
						
						
							
							Make mypy behave like a proper pre-commit hook ( #25313 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-09-22 12:23:45 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						64c824cd78 
					 
					
						
						
							
							Make pickle import check fast ( #25379 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-22 04:08:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						417a164af6 
					 
					
						
						
							
							[Misc] Remove unused encoder-decoder error strings ( #25374 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-22 11:04:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b6f01bd9a7 
					 
					
						
						
							
							refactor: abstract graph mode support into platform interface ( #25161 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com > 
						
						
					 
					
						2025-09-22 10:22:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4cf71cc88a 
					 
					
						
						
							
							[TPU] Deprecate `xm.mark_step` in favor of `torch_xla.sync` ( #25254 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com > 
						
						
					 
					
						2025-09-22 10:12:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a66d131381 
					 
					
						
						
							
							[TPU][Bugfix][CI] Fix broken tests/build dependency ( #25255 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-09-22 09:55:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						21467f9a1c 
					 
					
						
						
							
							Enable Eagle3 speculative decoding for GPT-OSS model ( #25246 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com > 
						
						
					 
					
						2025-09-22 08:50:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f92d952632 
					 
					
						
						
							
							[V0 Deprecation] Remove MultiModalPlaceholderMap ( #25366 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-22 08:49:19 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6d0b827cbd 
					 
					
						
						
							
							[V0 Deprecation] Remove V0-only methods in multi-modal registry ( #25362 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-22 13:58:26 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0eecb31663 
					 
					
						
						
							
							[Bugfix] Fix hermes tool parser handling of non-string argument types ( #22002 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wangzi <3220100013@zju.edu.cn >
Signed-off-by: David Chen <530634352@qq.com >
Co-authored-by: wangzi <3220100013@zju.edu.cn >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-09-22 11:35:39 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						793be8d057 
					 
					
						
						
							
							[Docs] GSM8K Accuracy Evaluation doc update ( #25360 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: David Chen <530634352@qq.com > 
						
						
					 
					
						2025-09-22 02:49:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7b57a433da 
					 
					
						
						
							
							[Model] Support Dots OCR ( #24645 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: yinz-aizip <yinz@aizip.ai > 
						
						
					 
					
						2025-09-22 02:24:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5aeb925452 
					 
					
						
						
							
							Multimodal - audio tests ( #25285 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Debolina Roy <debroy@redhat.com > 
						
						
					 
					
						2025-09-22 07:07:11 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						04d3752329 
					 
					
						
						
							
							[Bugfix][V0 Deprecation][CI] use async mock and await for async method ( #25325 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yang <lymailforjob@gmail.com > 
						
						
					 
					
						2025-09-22 07:06:16 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bc6e542d9f 
					 
					
						
						
							
							Remove V0 attention backends ( #25351 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-09-21 16:03:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						af7dfb0d1a 
					 
					
						
						
							
							[Perf] Further optimization for Qwen3-VL fast_pos_embed_interpolate ( #25347 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-21 20:12:45 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1c3ffdbecc 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 sampling metadata ( #25345 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai > 
						
						
					 
					
						2025-09-21 10:37:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c438b2951c 
					 
					
						
						
							
							feat: Enable engine-level arguments with speculators models ( #25250 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Claude <noreply@anthropic.com > 
						
						
					 
					
						2025-09-21 11:04:45 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0ff8ebb2d7 
					 
					
						
						
							
							[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor ( #25334 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-09-21 08:52:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						26e673fe93 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 Sequence class & Sampler ( #25332 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai > 
						
						
					 
					
						2025-09-21 08:52:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						65a5910ce3 
					 
					
						
						
							
							[Optimization] Cache chat template result when processor fails to be loaded ( #25341 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-21 19:41:02 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9aea7373ff 
					 
					
						
						
							
							[Bugfix] Typos in error message for missing model config file ( #25339 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com > 
						
						
					 
					
						2025-09-21 04:36:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						30d08911f7 
					 
					
						
						
							
							[MM][Perf] Minor Optimization on Qwen3-VL fast_pos_embed_interpolate ( #25337 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-21 11:05:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cf56cf78b4 
					 
					
						
						
							
							[V1] Add sliding window support to Flex Attention backend ( #24089 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-21 05:08:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7ed82d1974 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 MP executor ( #25329 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-09-20 21:26:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						12dbd834cf 
					 
					
						
						
							
							[V0 Deprecation] Remove from_seq_group methods ( #25330 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-09-20 21:10:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						035fd2bd2c 
					 
					
						
						
							
							[Multi Modal][Performance] Fused Q,K's apply_rope in more models ( #25005 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-21 03:55:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1cd885bd54 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 model runner base & simplify worker base ( #25328 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-09-20 20:49:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						62b38dc832 
					 
					
						
						
							
							[Doc] improve test-pipeline.yaml documentation ( #25305 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Huamin Li <3ericli@gmail.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com > 
						
						
					 
					
						2025-09-20 20:29:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c99db8c8dd 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 core ( #25321 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-09-20 19:58:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						72dd1595b4 
					 
					
						
						
							
							[CI] Skip tests failing on main ( #25326 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-09-20 19:57:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						572ddf83ce 
					 
					
						
						
							
							[Chore] Remove unused sampler in models ( #25324 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-09-20 19:53:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						86647d1cd0 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 Output Processor ( #25320 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-09-20 17:57:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						52c2a8d4ad 
					 
					
						
						
							
							[V0 Deprecation] Remove LLMEngine ( #25033 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-09-20 17:56:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						367a480bd3 
					 
					
						
						
							
							[Docs] Fix warnings in vllm/profiler and vllm/transformers_utils ( #25220 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-09-20 16:39:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bef180f009 
					 
					
						
						
							
							[V0 Deprecation] Enable the remaining multimodal tests in V1 ( #25307 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-20 17:50:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d88918e4c2 
					 
					
						
						
							
							[Core] Enable sharded state loader for V1 engine and enhance test coverage ( #25308 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: pengdrumli <pengdrumli@tencent.com > 
						
						
					 
					
						2025-09-20 21:15:22 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3c713a9711 
					 
					
						
						
							
							[Model] Cleanup InternViT's data parallel implementation  ( #25306 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-20 05:46:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bf8b26cad1 
					 
					
						
						
							
							Generate _ModelInfo properties file when loading to improve loading speed ( #23558 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Manoel Marques <manoel.marques@ibm.com >
Signed-off-by: Manoel Marques <manoelmrqs@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com > 
						
						
					 
					
						2025-09-20 11:51:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						032d661d27 
					 
					
						
						
							
							[Docs] Fix warnings in mkdocs build (continued)  ( #25042 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com > 
						
						
					 
					
						2025-09-20 11:45:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e08a3a3fdb 
					 
					
						
						
							
							[CI Failure] Disable FlashInfer RoPE to unblock CI ( #25299 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-20 08:16:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3d9a1d2de5 
					 
					
						
						
							
							[V1] Support LLM.apply_model ( #18465 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-20 07:14:35 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						be874c0201 
					 
					
						
						
							
							[Bugfix] Fix Qwen3-VL-MoE weight loading for EP ( #25300 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-20 00:04:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9607d5eb44 
					 
					
						
						
							
							[Hybrid Allocator] Support full attention with different hidden size  ( #25101 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-09-19 23:43:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c60e6137f0 
					 
					
						
						
							
							[Optimization] Avoid repeated model architecture conversion for pooling models ( #25261 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-20 13:30:22 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f91480b2d4 
					 
					
						
						
							
							[Bugfix] fix tool call arguments is empty ( #25223 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Co-authored-by: xin.li <xin.li@daocloud.io > 
						
						
					 
					
						2025-09-20 13:29:54 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6c5f82e5aa 
					 
					
						
						
							
							[BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attention ( #25298 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chendi Xue <Chendi.Xue@intel.com > 
						
						
					 
					
						2025-09-20 04:41:23 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b7f186bbb3 
					 
					
						
						
							
							[BugFix] Exclude self when checking for port collision ( #25286 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-20 12:28:31 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3642909617 
					 
					
						
						
							
							[BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (AutoGPTQ and AutoRound-GPTQ) ( #25268 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: JartX <sagformas@epdcenter.es > 
						
						
					 
					
						2025-09-20 11:18:13 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c308501cb6 
					 
					
						
						
							
							Improve weight loading for encoder models in Transformers backend ( #25289 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-20 03:11:03 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						535d80056b 
					 
					
						
						
							
							[Misc] Support more collective_rpc return types ( #25294 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-20 02:02:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a25ade5d47 
					 
					
						
						
							
							[BugFix] Ensure appropriate guards in destructors ( #25284 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-09-20 09:06:34 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8945b001db 
					 
					
						
						
							
							[torch.compile] CUDAGraph Inductor partition integration ( #24281 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Boyuan Feng <fby.1994@gmail.com >
Signed-off-by: boyuanfeng <boyuan@meta.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com > 
						
						
					 
					
						2025-09-20 01:02:15 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b8a287a0a8 
					 
					
						
						
							
							[docs] Prompt Embedding feature support ( #25288 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andrew Sansom <andrew@protopia.ai > 
						
						
					 
					
						2025-09-19 17:46:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c7e713616a 
					 
					
						
						
							
							test: Remove vestigial skip for prompt embeds tests after landing v1 Prompt Embeds support ( #25291 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andrew Sansom <andrew@protopia.ai > 
						
						
					 
					
						2025-09-19 17:33:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a36c675817 
					 
					
						
						
							
							Don't skip special tokens with hermes-style tool calling ( #25281 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com > 
						
						
					 
					
						2025-09-19 17:33:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3da17c2cc2 
					 
					
						
						
							
							[Bugfix] Remove VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE  #2969  ( #25090 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Kabela <lucaskabela@meta.com > 
						
						
					 
					
						2025-09-19 20:27:21 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						14c1432789 
					 
					
						
						
							
							[BugFix] Fix async scheduling CPU tensor race take 2 ( #25279 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-19 16:34:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ee7a66dd9a 
					 
					
						
						
							
							allow disable flashinfer prefill ( #25276 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lu Fang <fanglu@fb.com > 
						
						
					 
					
						2025-09-19 22:59:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						431535b522 
					 
					
						
						
							
							Enable modelopt gemma3 nvfp4/fp8, make workflow more robust ( #22771 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-19 22:40:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						711e912946 
					 
					
						
						
							
							[Compile] Fix Compile Warning for Ignoring MIN_BLOCK_PER_SM ( #25193 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-19 16:23:19 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e69e0b8b5f 
					 
					
						
						
							
							[Frontend] Responses API messages out, just harmony for now ( #24985 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Alec Solder <alecs@fb.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com > 
						
						
					 
					
						2025-09-19 21:40:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ddc9048394 
					 
					
						
						
							
							Fix: Correct FusedMoE layer reference in auto_round quantization ( #24818 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: David-Wen <18927700430@163.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-09-19 20:44:24 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b1a63d1b3b 
					 
					
						
						
							
							[BugFix] Make FlashInferMetadataBuilder non-blocking ( #25040 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Julien Lin <jullin@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-19 20:36:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						48ecb4438b 
					 
					
						
						
							
							[Perf] Use FlashInfer RoPE for RotaryEmbedding.forward_cuda when available ( #21126 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com > 
						
						
					 
					
						2025-09-19 14:06:49 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e57fc15971 
					 
					
						
						
							
							Specify platform in pip-compile pre-commit hook so it runs on MacOS ( #25273 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-19 12:43:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4bdf400218 
					 
					
						
						
							
							[Bugfix] Fix chunked a2_scales in modular kernels ( #25264 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com > 
						
						
					 
					
						2025-09-19 19:42:01 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7852b82b93 
					 
					
						
						
							
							[Bugfix] GPT OSS Attribute error on H100 ( #25228 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-09-19 13:14:09 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a2a5f79e09 
					 
					
						
						
							
							Optimize triton unified attention performance for sliding window attention ( #24390 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zixi-qi <qizixi@meta.com > 
						
						
					 
					
						2025-09-19 13:07:26 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c59a0eca42 
					 
					
						
						
							
							[KV offload][4/N] Offloading KV connector ( #22595 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Or Ozeri <oro@il.ibm.com > 
						
						
					 
					
						2025-09-19 19:07:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b716ab93a7 
					 
					
						
						
							
							[bugfix] fix structured outputs key missing issue from  #24929  ( #25195 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lu Fang <fanglu@fb.com > 
						
						
					 
					
						2025-09-19 18:37:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						138f0d1e75 
					 
					
						
						
							
							[Docs] add __init__.py to vllm/model_executor/layers/quantization/compressed_tensors/transform ( #24974 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: samzong <samzong.lu@gmail.com > 
						
						
					 
					
						2025-09-19 18:32:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2506ce5189 
					 
					
						
						
							
							[Core][Prefix Hash] Fix prefix hash metrics sliding window maintenance ( #24990 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com > 
						
						
					 
					
						2025-09-19 12:22:53 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						47fd08aaf9 
					 
					
						
						
							
							[CI/Build] fix test function_calling ( #25072 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-09-19 12:16:32 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						12aed7e453 
					 
					
						
						
							
							Encoder model support for the Transformers backend ( #25174 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-19 19:15:22 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d90e212a3a 
					 
					
						
						
							
							Remove Redundant Assignment in Qwen3_VisionPatchMerger ( #25224 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Junhong <liujunhong11@huawei.com >
Co-authored-by: Junhong <liujunhong11@huawei.com >
Co-authored-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-19 12:15:13 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2821986450 
					 
					
						
						
							
							[Core] Modify the initialization parameters of the lora manager ( #25249 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-19 18:01:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6c117cff7d 
					 
					
						
						
							
							[Frontend] Pass API server count to each process ( #23717 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-20 01:15:19 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7ac67ea525 
					 
					
						
						
							
							[KV offload][3/N] Add worker-side CPU support ( #21448 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Or Ozeri <oro@il.ibm.com > 
						
						
					 
					
						2025-09-19 09:53:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ce75e15373 
					 
					
						
						
							
							refactor(benchmarks): add type annotations to wait_for_endpoint parameters ( #25218 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: samzong <samzong.lu@gmail.com > 
						
						
					 
					
						2025-09-19 16:36:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						aed16879a9 
					 
					
						
						
							
							Move ModelConfig from config/__init__.py to config/model.py ( #25252 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-19 16:22:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cf278ff3b2 
					 
					
						
						
							
							Update CODEOWNERS ( #25269 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-19 09:12:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						838d7116ba 
					 
					
						
						
							
							[Qwen] Remove cuda hard-code in qwen3 next ( #25243 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Icey <1790571317@qq.com > 
						
						
					 
					
						2025-09-19 12:25:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5089fd749c 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 logic from get_input_embeddings interface ( #25242 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-19 11:10:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a3d087adec 
					 
					
						
						
							
							[P/D][Nixl] Introduce KVTransferMetrics and aggregation strategy ( #22188 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-09-19 11:09:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						058525b997 
					 
					
						
						
							
							Move PoolerConfig from config/__init__.py to config/pooler.py ( #25181 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-19 11:02:55 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1dfea5f4a9 
					 
					
						
						
							
							[Bugfix][Perf] Misc fixes for Qwen3 VL ( #25238 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-19 10:46:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cea91a32f2 
					 
					
						
						
							
							[Kernel][Performance] Add Triton kernel for Qwen3-VL interleaved MRoPE ( #25055 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-19 10:27:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a684c0124c 
					 
					
						
						
							
							[bugfix] fix MHA for models like OpenGVLab/InternVL3_5-38B ( #25146 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yan Ma <yan.ma@intel.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-19 08:45:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f2718d2948 
					 
					
						
						
							
							[Misc] Cleanup test conftest for deprecated encoder-decoder models ( #25231 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-19 07:44:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						825fdb11ad 
					 
					
						
						
							
							[Bugfix][CPU] Add placeholder to avoid import errors when using fused_moe ops on platforms without triton ( #25137 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-09-19 07:41:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8c1d4acbfe 
					 
					
						
						
							
							[CPU] Disable oneDNN linear on non-x86 platforms ( #25166 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-09-19 07:27:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						486c5599e3 
					 
					
						
						
							
							[Build] Update Xgrammar to 0.1.24 to get a CVE fix ( #25188 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-09-19 14:27:17 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a6149aa587 
					 
					
						
						
							
							[OOT] Support sync_model_loading for OOT ( #25126 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chendi Xue <Chendi.Xue@intel.com > 
						
						
					 
					
						2025-09-19 05:41:53 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6c8a3c099b 
					 
					
						
						
							
							[Docs] Fix griffe warnings in vllm/multimodal ( #25216 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-09-18 22:10:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						31a8a2a7bc 
					 
					
						
						
							
							[Misc] Clean up MM profiling warnings ( #25222 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-19 04:46:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1a0a04dae9 
					 
					
						
						
							
							[Perf] Optimize memory peak during EAGLE model loading. ( #24585 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Ding <candy.dc@alibaba-inc.com > 
						
						
					 
					
						2025-09-19 03:31:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6d8246aaff 
					 
					
						
						
							
							[gpt-oss] Add ResponseReasoningPartAddedEvent, ResponseReasoningPartDoneEvent for streaming ( #24938 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andrew Xia <axia@meta.com > 
						
						
					 
					
						2025-09-18 19:11:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9d1c50a5ac 
					 
					
						
						
							
							[KV offload][2/N] Introduce LRU-based CPU offloading management ( #20075 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Or Ozeri <oro@il.ibm.com > 
						
						
					 
					
						2025-09-19 00:20:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9a4600e4dc 
					 
					
						
						
							
							[CORE] Prompt Embeddings Support for v1 Engine ( #24278 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: Andrew Sansom <qthequartermasterman@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-09-19 08:03:09 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9fac6aa30b 
					 
					
						
						
							
							[BugFix] Fix DeepGEMM warmup, no m.weight_scale_inv ( #25206 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-09-18 14:26:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a53ad626d6 
					 
					
						
						
							
							[KV offload][1b/N] rename offloading to kv_offload ( #25191 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Or Ozeri <oro@il.ibm.com > 
						
						
					 
					
						2025-09-18 20:53:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1c3dad22ff 
					 
					
						
						
							
							[V0 Deprecation] Remove unused async_timeout.py ( #25190 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-09-18 20:35:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d2a30a2d93 
					 
					
						
						
							
							[Bug] Fix torch Compilation Cache Hit Error ( #25093 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-18 12:38:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						75fb112d80 
					 
					
						
						
							
							[Bug] Fix returned_lse not Defined issue ( #25106 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-09-18 19:32:24 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						38db529f66 
					 
					
						
						
							
							[feat]: Create interface for model-specific M-RoPE ( #24194 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: AzizCode92 <azizbenothman76@gmail.com >
Signed-off-by: Aziz <azizbenothman76@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-18 19:18:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						064cac7bb7 
					 
					
						
						
							
							[fix]: remove data type hardcoding from gptoss model implementation ( #23807 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com > 
						
						
					 
					
						2025-09-18 18:15:23 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e19bce40a1 
					 
					
						
						
							
							[V0 Deprecation] Remove AsyncLLMEngine ( #25025 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-09-18 11:07:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						505805b645 
					 
					
						
						
							
							[KV offload][1/N] Introduce an offloading component ( #19848 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Or Ozeri <oro@il.ibm.com > 
						
						
					 
					
						2025-09-18 10:57:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bbdc0f2366 
					 
					
						
						
							
							[ROCm][AITER][Bugfix] Switch AITER to use PIECEWISE_AND_FULL compilation ( #25104 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Rohan138 <rohanpotdar138@gmail.com > 
						
						
					 
					
						2025-09-18 17:46:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dc34059360 
					 
					
						
						
							
							[ROCm][CI/Build] Use ROCm7.0 as the base ( #25178 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-09-18 09:36:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c4cb0af98a 
					 
					
						
						
							
							[spec decode] Fix MTP inference path for MiMo-7B model ( #25136 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zixi-qi <qizixi@meta.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-18 09:12:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1c3b1634aa 
					 
					
						
						
							
							[Misc] Add codeowner for Transformers backend ( #25180 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-18 09:01:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2ea50e977a 
					 
					
						
						
							
							Enable Allgather/ReduceScatter backend for NaiveAllToAll ( #23964 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Shu Wang. <shuw@nvidia.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Shu Wang <shuw@nvidia.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-18 15:52:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b419937c78 
					 
					
						
						
							
							[Docs] Fix warnings in mkdocs build (continued) ( #25163 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zerohertz <ohg3417@gmail.com > 
						
						
					 
					
						2025-09-18 08:23:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5f696c33b1 
					 
					
						
						
							
							[New Model] Support BertForTokenClassification / Named Entity Recognition (NER) task ( #24872 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-18 23:22:01 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						67244c86f0 
					 
					
						
						
							
							feat(api): Return 503 on /health when engine is dead ( #24897 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: dongbo910220 <1275604947@qq.com >
Co-authored-by: Claude <noreply@anthropic.com > 
						
						
					 
					
						2025-09-18 14:29:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						072d7e53e5 
					 
					
						
						
							
							[PERF] Add conv1d metadata to GDN attn ( #25105 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com > 
						
						
					 
					
						2025-09-18 14:27:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						01a583fea4 
					 
					
						
						
							
							[Kernel] Decouple Tile Size from Block Size in Triton Unified Attention Kernel ( #21197 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com > 
						
						
					 
					
						2025-09-18 14:27:01 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bc19d75985 
					 
					
						
						
							
							[Misc] Add kv-connector label ( #25156 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-09-18 13:56:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fbd6523ac0 
					 
					
						
						
							
							Refactor dense FP8 tensor/channel/block utils and add CT FP8 block ( #21404 )  
						
						 
						
						
						
						
					 
					
						2025-09-18 08:53:45 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						470484a4f5 
					 
					
						
						
							
							[Structured Output][Refactor] Move apply_grammar_bitmask() method from ModelRunner to structured output utils ( #21999 )  
						
						 
						
						Signed-off-by: shen-shanshan <467638484@qq.com>
						
						
					 
					
						2025-09-18 20:44:31 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						21da73343a 
					 
					
						
						
							
							[Misc] Clean up flags in vllm bench serve ( #25138 )  
						
						 
						
						Signed-off-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-09-18 12:43:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						66072b36db 
					 
					
						
						
							
							[Bugfix][Mamba] - Fix Conv State Kernel FP32 Support ( #24883 )  
						
						 
						
						Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
						
						
					 
					
						2025-09-18 12:21:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3ed1ec4af2 
					 
					
						
						
							
							Fix validate-config pre-commit check ( #25157 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-18 12:06:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5a33ae9a3f 
					 
					
						
						
							
							Fix forward reference warning in documentation ( #25150 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-18 11:41:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c9ff9e6f0c 
					 
					
						
						
							
							[Docs] add the parallel sampling usage in LLMEngine and AsyncLLM ( #24222 )  
						
						 
						
						
						
						
					 
					
						2025-09-18 04:37:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eaffe4486c 
					 
					
						
						
							
							[Docs] Fix pooling-params doc references in openai_compatible_server.md ( #24939 )  
						
						 
						
						
						
						
					 
					
						2025-09-18 04:36:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8ed039d527 
					 
					
						
						
							
							Move StructuredOutputsConfig from config/__init__.py to config/structured_outputs.py ( #25153 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-18 11:24:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						37970105fe 
					 
					
						
						
							
							[Model] Improve Pooling Model ( #25149 )  
						
						 
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-18 11:04:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cc935fdd7e 
					 
					
						
						
							
							[Frontend] Support setting logprobs to -1 ( #25031 )  
						
						 
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-09-18 10:34:42 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						abdfcd4f3d 
					 
					
						
						
							
							silu-v1: Fix EPS not being used during max-reduction ( #25069 )  
						
						 
						
						Signed-off-by: elvircrn <elvircrn@gmail.com>
						
						
					 
					
						2025-09-18 10:25:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4f02b77de4 
					 
					
						
						
							
							Fix: Add explicit #include <omp.h> for OpenMP compatibility on certain toolchains ( #24951 )
						
						 
						
						Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
						
						
					 
					
						2025-09-18 17:43:23 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						29283e8976 
					 
					
						
						
							
							[Chore] Cleanup guided namespace, move to structured outputs config ( #22772 )  
						
						 
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-18 09:20:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						05b044e698 
					 
					
						
						
							
							[Doc] Fix cross-reference warnings ( #25058 )  
						
						 
						
						Signed-off-by: Punit Vara <punitvara@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-18 02:05:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						aa3f105c59 
					 
					
						
						
							
							Add 'path' option to ImagePrompt data_format ( #25081 )  
						
						 
						
						Signed-off-by: Gerard Finol <gerard.finol@urv.cat>
						
						
					 
					
						2025-09-18 02:02:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ef7eefe17a 
					 
					
						
						
							
							[Qwen] Add fp8 checkpoint support for qwen3-next. ( #25079 )  
						
						 
						
						Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
						
						
					 
					
						2025-09-18 08:16:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						350c94deb3 
					 
					
						
						
							
							[Bugfix] when use s3 model cannot use default load_format ( #24435 )  
						
						 
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-09-18 07:47:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f4cd80f944 
					 
					
						
						
							
							Retrieve sliding_window from text config in Gemma3 MM ( #25085 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-18 06:29:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						349e0e3462 
					 
					
						
						
							
							[Docs] Fix API Reference ( #25140 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-17 23:23:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						81b16a2bc9 
					 
					
						
						
							
							[Kernel] Better inf handling for grouped topk cu ( #24886 )  
						
						 
						
						Signed-off-by: lumina37 <starry.qvq@gmail.com>
						
						
					 
					
						2025-09-18 05:53:55 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e111d5b0ae 
					 
					
						
						
							
							[CLI] Use streaming in CLI chat and completion commands ( #23769 )  
						
						 
						
						Signed-off-by: simon-mo <simon.mo@hey.com>
						
						
					 
					
						2025-09-17 22:30:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a904ea78ea 
					 
					
						
						
							
							[benchmark] add peak throughput metrics and plot ( #23867 )  
						
						 
						
						Signed-off-by: simon-mo <simon.mo@hey.com>
						
						
					 
					
						2025-09-17 22:30:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b7433ca1a4 
					 
					
						
						
							
							[Spec Decode] Efficient padded speculation ( #24539 )  
						
						 
						
						Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
						
						
					 
					
						2025-09-18 01:07:24 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5c65a72bb1 
					 
					
						
						
							
							[V0 Deprecation] Remove more V0 tests ( #25117 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-17 22:05:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9d8a2d86d2 
					 
					
						
						
							
							[EPLB] Add EPLB support for hunyuan_v1 ( #23078 )  
						
						 
						
						
						
						
					 
					
						2025-09-18 04:51:35 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3bc18127ff 
					 
					
						
						
							
							[XPU] Whisper model support on XPU Platform ( #25123 )  
						
						 
						
						Signed-off-by: chzhang <chaojun.zhang@intel.com>
						
						
					 
					
						2025-09-18 04:30:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bec060fd99 
					 
					
						
						
							
							Mark prompt logprobs as incompatible with prompt embeds at API level ( #25077 )  
						
						 
						
						Signed-off-by: Andrew Sansom <andrew@protopia.ai>
						
						
					 
					
						2025-09-17 21:25:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						52bc9d5b3e 
					 
					
						
						
							
							[Model] enable data parallel for InternVL vision encoder ( #23909 )  
						
						 
						
						Signed-off-by: Yiwen Chen <yiwen66@berkeley.edu>
Signed-off-by: YiwenC <54658925+666even666@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-09-17 21:11:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dc2979c585 
					 
					
						
						
							
							[Kernels] Overlap shared experts with combine instead of dispatch ( #24254 )  
						
						 
						
						Signed-off-by: Bill Nell <bnell@redhat.com>
						
						
					 
					
						2025-09-18 12:10:21 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						027d37df38 
					 
					
						
						
							
							[Bugfix][Qwen3-Next] add prefixes to shared_expert in qwen3-next and mlp in qwen2moe to successfully load ignored params in quantized models ( #24960 )  
						
						 
						
						Signed-off-by: toncao <cpatonn@gmail.com>
Co-authored-by: toncao <cpatonn@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-18 12:08:50 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b98219670f 
					 
					
						
						
							
							[Core][MM] Cleanup MultiModalCache ( #25006 )  
						
						 
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
						
						
					 
					
						2025-09-17 21:08:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						32baf1d036 
					 
					
						
						
							
							[Docs] Clean up the contributing README ( #25099 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-17 21:05:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3127274d02 
					 
					
						
						
							
							[MM Encoder] Apply DP ViT for Qwen3-VL model series ( #24955 )  
						
						 
						
						Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Huang Jie <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: 松灵 <26085463+wulipc@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-17 21:04:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4ac510f484 
					 
					
						
						
							
							[Kernels] Enable DeepGEMM by default ( #24462 )  
						
						 
						
						Signed-off-by: Bill Nell <bnell@redhat.com>
						
						
					 
					
						2025-09-17 20:19:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7fb2a5be28 
					 
					
						
						
							
							[V0 Deprecation] Skip PP test ( #25128 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-17 20:18:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6c036615dc 
					 
					
						
						
							
							[V0 Deprecation] Remove misc V0 tests ( #25118 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-17 19:41:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2fc24e94f9 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 Tracing & Metrics tests ( #25115 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-17 19:40:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2c3c1bd07a 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 Engine tests ( #25114 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-17 19:38:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5963b98b46 
					 
					
						
						
							
							[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses ( #22537 )  
						
						 
						
						Signed-off-by: Bill Nell <bnell@redhat.com>
						
						
					 
					
						2025-09-17 17:43:31 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e6585ddb45 
					 
					
						
						
							
							[Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel ( #24833 )  
						
						 
						
						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
						
						
					 
					
						2025-09-17 16:37:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2a4d6412e6 
					 
					
						
						
							
							Add a batched auto tune script ( #25076 )  
						
						 
						
						Signed-off-by: Karan Goel <karangoel@google.com>
Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-17 22:41:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e67a79db03 
					 
					
						
						
							
							[Bugfix] Refactor Flashinfer TRTLLM attention kernel selection logic ( #24600 )  
						
						 
						
						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-17 15:36:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9f882d8791 
					 
					
						
						
							
							Disable failing GPT-OSS Eval (Blackwell) for now ( #25107 )  
						
						 
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-17 15:36:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1a456c7c90 
					 
					
						
						
							
							Aiter mha fp8 fix ( #24991 )  
						
						 
						
						Signed-off-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
						
						
					 
					
						2025-09-17 22:29:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fedb75fa27 
					 
					
						
						
							
							[Bugfix][B200] Fix cutlass_mla hang ( #24966 )  
						
						 
						
						Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-17 18:06:38 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bff2e5f1d6 
					 
					
						
						
							
							[gpt-oss][2] fix types for streaming ( #24556 )  
						
						 
						
						Signed-off-by: Andrew Xia <axia@meta.com>
						
						
					 
					
						2025-09-17 22:04:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3c068c637b 
					 
					
						
						
							
							[Kernel] Faster pre-processing time for W4A8 ( #23972 )  
						
						 
						
						Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
						
						
					 
					
						2025-09-17 14:35:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f20c3b0951 
					 
					
						
						
							
							[BUG] Exclude .pth files when pulling remote files ( #25092 )
						
						 
						
						Signed-off-by: ahao-anyscale <ahao@anyscale.com>
						
						
					 
					
						2025-09-17 20:42:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						883131544f 
					 
					
						
						
							
							[Bugfix] Update import path for bc_linter_include ( #24766 )  
						
						 
						
						Signed-off-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu>
						
						
					 
					
						2025-09-17 20:33:11 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ee5fd49150 
					 
					
						
						
							
							[Misc] Update owners for KV connector and V1 offloading ( #25041 )  
						
						 
						
						Signed-off-by: ApostaC <yihua98@uchicago.edu>
						
						
					 
					
						2025-09-17 12:37:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7ae9887542 
					 
					
						
						
							
							[V1] Logits processor docs ( #22919 )  
						
						 
						
						Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Joseph Marinier <Joseph.Marinier@gmail.com>
						
						
					 
					
						2025-09-17 11:53:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e3db5ebb66 
					 
					
						
						
							
							[CI Bugfix] Fix failing test_model_load_with_params tests due to tokenizer refactor ( #25086 )  
						
						 
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-17 11:15:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9d442b7c48 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 tests in test_sequence.py ( #25088 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-17 11:08:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eb68c2dcd9 
					 
					
						
						
							
							[CI] Revert back prepare_prompts and check_answers ( #25087 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-17 11:03:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8b32464ac1 
					 
					
						
						
							
							Change log level from info to debug for IOProcessor ( #24999 )  
						
						 
						
						Signed-off-by: Michael Goin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-17 10:21:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						99cc41ad50 
					 
					
						
						
							
							[V0 Deprecation] Remove unused output processor util ( #25023 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
						
						
					 
					
						2025-09-17 09:50:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d6a518fdde 
					 
					
						
						
							
							Remove unused find_cuda_init helper script ( #25044 )  
						
						 
						
						
						
						
					 
					
						2025-09-17 09:47:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4aa8c7b047 
					 
					
						
						
							
							cleanup: remove adapter commons ( #25045 )
						
						 
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-17 16:46:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4b946d693e 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 Core tests ( #25082 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-17 09:32:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						087c6ffc92 
					 
					
						
						
							
							[CI Bugfix] Fix failing test_invalid_env ( #25078 )  
						
						 
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-17 08:28:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4a2d33e371 
					 
					
						
						
							
							[Docs] vllm/benchmarks/datasets.py fix docstring param format. ( #24970 )  
						
						 
						
						Signed-off-by: samzong <samzong.lu@gmail.com>
						
						
					 
					
						2025-09-17 08:11:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8f3616f422 
					 
					
						
						
							
							Remove old cutlass mla ( #23961 )  
						
						 
						
						Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
						
						
					 
					
						2025-09-17 14:31:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						47f670b03b 
					 
					
						
						
							
							[Docs] improve code formatting and comments for eliminate griffe build warning. ( #25010 )  
						
						 
						
						Signed-off-by: samzong <samzong.lu@gmail.com>
						
						
					 
					
						2025-09-17 07:31:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dd6a910aac 
					 
					
						
						
							
							[Bugfix][Qwen3-Next] fixes the varlen issue in qwen3-next's MTP implementation. ( #24957 )  
						
						 
						
						Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
						
						
					 
					
						2025-09-17 21:59:09 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1b962e2457 
					 
					
						
						
							
							[fix] lora benchmarks pass no_lora_flag_cpu ( #23774 )  
						
						 
						
						Signed-off-by: Dylan Maloy <34420038+dolpm@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-17 21:22:25 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bfe9380161 
					 
					
						
						
							
							Apply fixes for CUDA 13 ( #24599 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com > 
						
						
					 
					
						2025-09-17 09:15:42 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9fccd04e30 
					 
					
						
						
							
							[Bugfix] Fix Stream usage in CPU model runner and OneDNN kernel check ( #25046 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-09-17 05:54:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						252ada5559 
					 
					
						
						
							
							Add RADIO Vision Encoder Support to vLLM ( #24595 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com >
Co-authored-by: root <root@cw-dfw-h100-001-305-026.cm.cluster > 
						
						
					 
					
						2025-09-17 05:53:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e120533d7a 
					 
					
						
						
							
							[Misc] Avoid use of deprecated AutoModelForVision2Seq ( #25065 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-17 12:19:15 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2b85697031 
					 
					
						
						
							
							[BugFix] enable DOTALL to match multi-line tool_call parameters in extract_tool_call_required_streaming ( #24668 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Shijun Yin <shijun.yin@outlook.com > 
						
						
					 
					
						2025-09-17 09:21:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						544fe76b95 
					 
					
						
						
							
							[Frontend] Support returning all prompt logprobs ( #24956 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-09-17 09:03:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bb58dc8c20 
					 
					
						
						
							
							[DP] Create placement groups by ray_device_key ( #25026 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-09-17 08:57:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0fb2551c23 
					 
					
						
						
							
							[Docs] Fix griffe warning in base_static_graph.py ( #25018 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-09-17 08:49:19 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6c47f6bfa4 
					 
					
						
						
							
							[Core] Remove tokenizer group in vLLM ( #24078 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zhuohan Li <zhuohan123@gmail.com > 
						
						
					 
					
						2025-09-17 08:42:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c15309a730 
					 
					
						
						
							
							[Model] Apply SharedFusedMoE to glm4_moe. ( #24849 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: whx-sjtu <2952154980@qq.com > 
						
						
					 
					
						2025-09-17 16:02:31 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4a9375fe9d 
					 
					
						
						
							
							[Model] Pass param prefix to LLMHead ( #24862 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: whx-sjtu <2952154980@qq.com > 
						
						
					 
					
						2025-09-17 16:01:27 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						03191cd8f0 
					 
					
						
						
							
							[Core][MultiModalHasher] Hash images without converting image mode ( #24969 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com > 
						
						
					 
					
						2025-09-17 00:57:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b77bf34e53 
					 
					
						
						
							
							[EPLB] Support EPLB for Mixtral Model ( #22842 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rouchenzi <ruochenwen@gmail.com >
Signed-off-by: rouchenzi <40842833+rouchenzi@users.noreply.github.com >
Co-authored-by: Bowen Wang <abmfy@icloud.com > 
						
						
					 
					
						2025-09-17 07:27:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dd39baf717 
					 
					
						
						
							
							[XPU] Fix xpu model runner call torch.cuda APIs ( #25011 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-09-17 06:45:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						43a62c51be 
					 
					
						
						
							
							Add more documentation and improve usability of lognormal dist (benchmark_serving_multi_turn) ( #23255 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: daniels <daniels@pliops.com > 
						
						
					 
					
						2025-09-17 05:53:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ca2d1925ef 
					 
					
						
						
							
							[Rocm] [quantization] Fix quark ptpc moe and add test case ( #24649 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com >
Co-authored-by: Haoyang Li <haoyang.li@amd.com > 
						
						
					 
					
						2025-09-16 22:15:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0f7acdd73c 
					 
					
						
						
							
							[Model] Support Qwen3-VL Model Series ( #24727 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Huang Jie <92386084+JJJYmmm@users.noreply.github.com >
Co-authored-by: 松灵 <26085463+wulipc@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-17 05:01:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5801e49776 
					 
					
						
						
							
							[V0 Deprecation] Remove MQLLMEngine ( #25019 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai > 
						
						
					 
					
						2025-09-16 21:29:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						58d4c705a8 
					 
					
						
						
							
							[Core] Get num_encoder_tokens from scheduler config ( #24989 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-09-16 20:59:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ea3de5ef0d 
					 
					
						
						
							
							[misc] fix typo in value error ( #24995 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com > 
						
						
					 
					
						2025-09-16 20:58:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						67532a1a68 
					 
					
						
						
							
							[UX] Remove "quantization is not fully optimized yet" log ( #25012 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-16 20:57:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5672ba90bd 
					 
					
						
						
							
							[Docs] fix invalid doc link ( #25017 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zxw <1020938856@qq.com > 
						
						
					 
					
						2025-09-16 20:53:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dd83a157f1 
					 
					
						
						
							
							[UX] Enforce valid choices for envs like VLLM_ATTENTION_BACKEND, etc ( #24761 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-16 20:42:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5a411ef6c4 
					 
					
						
						
							
							[Benchmarks] Add MMVU video dataset support and clean up deprecated datasets ( #24719 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-17 03:29:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eeb135eb87 
					 
					
						
						
							
							[Core] Use CpuGpuBuffer for block table tensors ( #24795 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-16 19:18:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3059b9cc6b 
					 
					
						
						
							
							[Doc] Add --force-overwrite option to generate_cmake_presets.py ( #24375 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com > 
						
						
					 
					
						2025-09-16 18:45:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						64ad551878 
					 
					
						
						
							
							Removes source compilation of nixl dependency ( #24874 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: bbartels <benjamin@bartels.dev >
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Daniele <36171005+dtrifiro@users.noreply.github.com > 
						
						
					 
					
						2025-09-17 01:33:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cef32104b4 
					 
					
						
						
							
							[FP8] Extend per-token-group quantization support to QuantFP8 ( #24342 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com > 
						
						
					 
					
						2025-09-16 18:31:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						493b10f8bf 
					 
					
						
						
							
							[CI] GPT-OSS GPQA eval test for Blackwell ( #24920 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-09-16 18:13:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d119fc8614 
					 
					
						
						
							
							[CI][Bugfix] Fix failing Blackwell test ( #24993 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com > 
						
						
					 
					
						2025-09-16 15:55:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dbebb7f812 
					 
					
						
						
							
							[Perf] Reuse workspace for FP8+FP4 Marlin MoE ( #20500 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com > 
						
						
					 
					
						2025-09-16 15:45:10 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3053a22b33 
					 
					
						
						
							
							fp8 kv cache support fix for torch.compile ( #22758 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com > 
						
						
					 
					
						2025-09-16 21:27:11 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						02d4b85454 
					 
					
						
						
							
							Use kwargs for long lists of EngineCoreRequest arguments in tests and fix extra kwargs ( #24987 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andrew Sansom <andrew@protopia.ai > 
						
						
					 
					
						2025-09-16 14:06:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						86daa875fe 
					 
					
						
						
							
							[gpt-oss][1][bugfix] fix streaming final output ( #24466 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andrew Xia <axia@meta.com > 
						
						
					 
					
						2025-09-16 13:56:16 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dcf2f3ec06 
					 
					
						
						
							
							[ROCm] Add dependencies for ROCm ( #24900 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yida Wu <yida.wu@amd.com > 
						
						
					 
					
						2025-09-16 19:49:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						218454b9b2 
					 
					
						
						
							
							[MISC] Add code owners of vllm/v1 to vllm/v1/core ( #24928 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-09-16 19:07:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f4d6eb95cf 
					 
					
						
						
							
							[gpt-oss][1b] streaming add item id, content id ( #24788 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andrew Xia <axia@meta.com > 
						
						
					 
					
						2025-09-16 18:41:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cd1f885bcf 
					 
					
						
						
							
							Directly get max encoder len from VLLM config in V1 ( #24866 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sugar-zsg <952242923@qq.com > 
						
						
					 
					
						2025-09-16 17:52:31 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d593cf28fa 
					 
					
						
						
							
							[Misc] Add removed encoder-decoder models to previously supported models list ( #24961 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-16 10:46:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						faa7a5daac 
					 
					
						
						
							
							[Bugfix] Fix unable to run encoder model when disable_hybrid_kv_cache_manager is true ( #24571 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: lianyibo <lianyibo1@kunlunit.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-09-16 17:36:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						567939953b 
					 
					
						
						
							
							[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM ( #23693 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Co-authored-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com > 
						
						
					 
					
						2025-09-16 12:21:48 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						08369289af 
					 
					
						
						
							
							[Core][MultiModalHasher] Don't convert memoryviews to bytes during hashing ( #24925 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com > 
						
						
					 
					
						2025-09-16 15:32:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						73cfb3c5ee 
					 
					
						
						
							
							[Model] Clean up and simplify Mamba2 Metadata Usage in both V0 and V1 ( #24331 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com > 
						
						
					 
					
						2025-09-16 14:53:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4e5affeaa1 
					 
					
						
						
							
							[CI] Add Decode Context Parallelism (DCP) test to CI ( #24487 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ming Yang <minos.future@gmail.com > 
						
						
					 
					
						2025-09-16 21:21:28 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e4f0b4cd96 
					 
					
						
						
							
							(doc): set cmake c++ compatible standard when building on MacOS CPU. ( #23483 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: teekenl <teekenlau@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-16 06:08:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						de3e53a75b 
					 
					
						
						
							
							feat: Add Grafana and Perses monitoring dashboards for vLLM ( #23498 )  
						
						 
						
						
						
						
					 
					
						2025-09-16 05:53:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						85e0df1392 
					 
					
						
						
							
							[Docs] move benchmarks README to contributing guides ( #24820 )  
						
						 
						
						
						
						
					 
					
						2025-09-16 05:52:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0faf3cc3e8 
					 
					
						
						
							
							Move SpeculativeConfig from config/__init__.py to config/speculative.py ( #24904 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-16 12:51:35 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7ea5c73ad7 
					 
					
						
						
							
							[Feat][EPLB] A novel static EPLB placement strategy for MoE models. ( #23745 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: bruceszchen <bruceszchen@tencent.com >
Signed-off-by: Chen Bruce <bruceszchen@tencent.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Chen Bruce <cszwwdz@vip.qq.com >
Co-authored-by: lemon412 <lemon412@foxmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-16 10:55:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						27fcfe7bcf 
					 
					
						
						
							
							[Mamba] Support TP>1 with quantization for mamba2 mixer in case n_groups % tp_size == 0 ( #24593 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-09-16 10:51:01 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						68dbde5dbb 
					 
					
						
						
							
							[Bugfix] remove duplicate tokens streamed in required tool choice streaming ( #23312 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jason Cheng <jasoncky96@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-09-16 15:16:32 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						04ad0dc275 
					 
					
						
						
							
							[benchmark] Add triton version in the moe tuned config ( #24769 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-16 14:10:54 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						238c4c1705 
					 
					
						
						
							
							[QWEN NEXT] Fused MoE kernels Optimization configs ( #24924 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Saman Keon <samanamp@outlook.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-16 13:06:03 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8c54610265 
					 
					
						
						
							
							[Bug] [Spec Dec]: Fix kv_cache dtype mismatch for Eagle3 drafter on FP8 target ( #24505 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com > 
						
						
					 
					
						2025-09-16 04:45:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						17871983a2 
					 
					
						
						
							
							[Bugfix] Fix sequence parallelism bug when enable pipeline parallelism ( #24021 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: cascade812 <cascade812@outlook.com > 
						
						
					 
					
						2025-09-16 04:32:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						759ef49b15 
					 
					
						
						
							
							Remove V0 Encoder-Decoder Support ( #24907 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai > 
						
						
					 
					
						2025-09-15 21:17:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5206ab20ba 
					 
					
						
						
							
							[XPU] Fix circular import error. ( #24927 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-09-16 03:35:36 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0af3ce1355 
					 
					
						
						
							
							Upgrade flashinfer to 0.3.1 ( #24470 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-16 02:36:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e1279ef00f 
					 
					
						
						
							
							[Docs] Update instructions for how to use an existing torch binary ( #24892 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Richard Zou <zou3519@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-16 02:25:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2942970d44 
					 
					
						
						
							
							[Metrics] Hide deprecated metrics with gpu_ prefix ( #24245 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com > 
						
						
					 
					
						2025-09-15 20:15:57 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3c96e7b8a1 
					 
					
						
						
							
							[CI] Small Accuracy Eval Test for Deepseek Model ( #24259 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-15 20:14:50 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b42566f440 
					 
					
						
						
							
							[Bug] Fix is_flashmla_supported Check Error ( #24774 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-15 20:10:55 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d96e11167d 
					 
					
						
						
							
							Add pytest-cov and .coveragerc (#24778)

						Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
						
						
					 
					
						2025-09-15 20:08:46 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2891603efd 
					 
					
						
						
							
							[ROCm][Bugfix] Fix the case where there's bias (#24895)

						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-09-15 20:05:12 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						de2cc3d867 
					 
					
						
						
							
							[Deprecation] Remove DeepGEMM Old Symbol Wrapper (#24902)

						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-09-15 20:03:29 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e95084308b 
					 
					
						
						
							
							Updated CODEOWNERS for flashinfer, mla, fused_moe (#24906)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-16 02:01:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7f6f2c1182 
					 
					
						
						
							
							HuggingFace -> Hugging Face in Integration with Hugging Face docs (#24889)
						
						 
						
						
						
						
					 
					
						2025-09-15 17:28:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5bcc153d7b 
					 
					
						
						
							
							[Compile] Fix noop_elimination pass and add tests for noop_elimination (#24880)

						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
						
						
					 
					
						2025-09-15 23:33:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						45bfa49cb8 
					 
					
						
						
							
							[Tests] fix initialization of kv hash in tests (#24273)

						Signed-off-by: Mickael Seznec <mickael@mistral.ai>
						
						
					 
					
						2025-09-15 21:48:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fd2f10546c 
					 
					
						
						
							
							[ci] fix wheel names for arm wheels (#24898)

						Signed-off-by: simon-mo <simon.mo@hey.com>
						
						
					 
					
						2025-09-15 14:39:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e757a629e7 
					 
					
						
						
							
							[Bug] Fix Cutlass Scaled MM Compilation Error (#24887)

						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-09-15 17:21:17 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						aae725af7c 
					 
					
						
						
							
							[Performance] Remove redundant clone() calls in cutlass_mla (#24891)
						
						 
						
						
						
						
					 
					
						2025-09-15 20:21:53 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						73df49ef3a 
					 
					
						
						
							
							[gpt-oss][1a] create_responses stream outputs BaseModel type, api server is SSE still (#24759)

						Signed-off-by: Andrew Xia <axia@meta.com>
						
						
					 
					
						2025-09-15 13:08:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						25aba2b6a3 
					 
					
						
						
							
							[gpt-oss] Add IncompleteDetails to ResponsesRepsonse (#24561)

						Signed-off-by: Andrew Xia <axia@meta.com>
						
						
					 
					
						2025-09-15 13:07:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						94b03f88dd 
					 
					
						
						
							
							Bump Flashinfer to 0.3.1 (#24868)

						Signed-off-by: bbartels <benjamin@bartels.dev>
						
						
					 
					
						2025-09-15 12:45:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						49bfc538e4 
					 
					
						
						
							
							Update num_tokens_across_dp to use nccl instead of gloo (#24105)

						Signed-off-by: Sage Moore <sage@neuralmagic.com>
						
						
					 
					
						2025-09-15 19:05:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a0b26701c9 
					 
					
						
						
							
							[Transform] Deterministic Hadacore Transforms (#24106)

						Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
						
						
					 
					
						2025-09-15 12:59:31 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c4afdb69cc 
					 
					
						
						
							
							Move MultiModalConfig from config/__init__.py to config/multimodal.py (#24659)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-15 17:43:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b834b4cbf1 
					 
					
						
						
							
							[USAGE] Improve error handling for weight initialization in Unquantized… (#20321)

						Signed-off-by: Rafael Marcelino Koike <rafael.koike@oracle.com>
						Signed-off-by: Rafael Koike <koike.rafael@gmail.com>
						
						
					 
					
						2025-09-15 16:45:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						740f0647b1 
					 
					
						
						
							
							Reinstate existing torch script (#24729)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-15 09:43:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						01413e0cf5 
					 
					
						
						
							
							Fp8 paged attention update (#22222)

						Signed-off-by: Xiao Yu <xiao.yu@amd.com>
						Signed-off-by: xiao-llm <xiao.yu.dc@outlook.com>
						Co-authored-by: Xiao Yu <xiao.yu@metamaterial.com>
						Co-authored-by: Xiao Yu <xiao.yu@amd.com>
						Co-authored-by: Bowen Bao <bowenbao@amd.com>
						
						
					 
					
						2025-09-15 10:43:26 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0e219cd50b 
					 
					
						
						
							
							[Bugfix] Fix GLM4.1V multimodal processor with compatability for Transformers v4.56 (#24822)

						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-15 20:45:06 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						72c99f2a75 
					 
					
						
						
							
							[Model]: support Ling2.0 (#24627)

						Signed-off-by: vito.yy <vito.yy@antgroup.com>
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-15 05:09:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bf214ca226 
					 
					
						
						
							
							[Misc] Fix examples openai_pooling_client.py (#24853)

						Signed-off-by: wang.yuqi <noooop@126.com>
						Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-15 11:57:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e41f5abca 
					 
					
						
						
							
							[XPU] Set consistent default KV cache layout (#24745)

						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-09-15 18:09:34 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bc0f6059a2 
					 
					
						
						
							
							[UT] enhance free kv cache block queue popleft_n (#24220)

						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-09-15 10:04:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8de261b04a 
					 
					
						
						
							
							[P/D]kv_output_aggregator support P TP > D TP (#23917)

						Signed-off-by: LCAIZJ <leichao139636@163.com>
						Co-authored-by: leichao.lc <leichao.lc@antgroup.com>
						
						
					 
					
						2025-09-15 11:36:06 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a0d8b9738d 
					 
					
						
						
							
							[Misc] Own KVConnectors installation (#24867)

						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-09-15 02:21:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						59e17dd4a0 
					 
					
						
						
							
							[Misc] rename interval to max_recent_requests (#24229)

						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-09-15 09:18:42 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4979eb79da 
					 
					
						
						
							
							[Doc]: fix typos in various files (#24821)

						Signed-off-by: Didier Durand <durand.didier@gmail.com>
						
						
					 
					
						2025-09-15 01:08:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a8c0f59973 
					 
					
						
						
							
							[Bugfix] MiDashengLM model contact error under concurrent testing (#24738)

						Signed-off-by: chenbing8 <chenbing8@xiaomi.com>
						Signed-off-by: bingchen-mi <chenbing8@xiaomi.com>
						
						
					 
					
						2025-09-15 06:38:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f4a948f33f 
					 
					
						
						
							
							[Frontend] Skip stop in reasoning content (#14550)

						Signed-off-by: Ce Gao <cegao@tensorchord.ai>
						Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-09-15 06:04:55 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3f3313981c 
					 
					
						
						
							
							[kv cache] update num_free_blocks in the end (#24228)

						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-09-15 05:15:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						78818dd1b0 
					 
					
						
						
							
							[Docs] Have a try to improve frameworks/streamlit.md (#24841)

						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-09-14 21:50:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8e5cdcda4e 
					 
					
						
						
							
							[Hybrid Allocator] Support Pipeline Parallel (#23974)

						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-09-14 15:55:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						90f3f7d73e 
					 
					
						
						
							
							[Spec Decoding]Support Spec Decoding Metrics in DP Mode (#24049)

						Signed-off-by: wuhang <wuhang6@huawei.com>
						Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
						
						
					 
					
						2025-09-14 21:11:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6dc8da5dc1 
					 
					
						
						
							
							[Chore] Remove ipex_ops warning (#24835)

						Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
						
						
					 
					
						2025-09-14 19:41:53 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						79cbcab871 
					 
					
						
						
							
							Force use C++17 globally to avoid compilation error (#24823)

						Signed-off-by: chenfengjin <1871653365@qq.com>
						
						
					 
					
						2025-09-14 19:30:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ff68035932 
					 
					
						
						
							
							[Benchmarks] Throw usage error when using dataset-name random and dataset-path together (#24819)

						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
						
						
					 
					
						2025-09-14 17:50:01 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1177dd53e9 
					 
					
						
						
							
							fix type of sampling rate for encode_base64 (#24826)

						Signed-off-by: co63oc <co63oc@users.noreply.github.com>
						
						
					 
					
						2025-09-14 16:17:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fc2dbcda8b 
					 
					
						
						
							
							[Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement (#24783)

						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
						
						
					 
					
						2025-09-14 11:20:17 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fec347dee1 
					 
					
						
						
							
							[Misc] Improve s3_utils type hints with BaseClient (#24825)

						Signed-off-by: Zerohertz <ohg3417@gmail.com>
						
						
					 
					
						2025-09-14 12:11:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cc3173ae98 
					 
					
						
						
							
							[Multi Modal][Performance] Fused Q,K's apply_rope into one (#24511)

						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-14 08:10:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3e903b6cb4 
					 
					
						
						
							
							[Chore] Minor simplification for non-PP path (#24810)

						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
						
						
					 
					
						2025-09-13 17:41:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						973c9d01da 
					 
					
						
						
							
							[Minor] Simplify duplicative device check for cuda (#24793)

						Signed-off-by: Ziliang Peng <ziliangdotme@gmail.com>
						
						
					 
					
						2025-09-13 18:28:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						15b8fef453 
					 
					
						
						
							
							Remove redundant assignment in xfer_buffers, This is a little fix (#24732)

						Signed-off-by: ChenTaoyu-SJTU <ctynb@qq.com>
						
						
					 
					
						2025-09-13 08:11:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cfa3234a5b 
					 
					
						
						
							
							[CI][Spec Decode] Adjust threshold for flaky ngram spec decoding test again (#24771)

						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
						
						
					 
					
						2025-09-13 15:45:11 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						41ae4a1eab 
					 
					
						
						
							
							[Doc]: fix typos in various files (#24798)

						Signed-off-by: Didier Durand <durand.didier@gmail.com>
						
						
					 
					
						2025-09-13 00:43:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4dad72f0d9 
					 
					
						
						
							
							[Misc] Correct an outdated comment. (#24765)

						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-09-13 00:34:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						59d7ffc17f 
					 
					
						
						
							
							[CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (#24750)

						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-09-13 07:29:19 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1da0f1441d 
					 
					
						
						
							
							[Core][Multimodal] Cache supports_kw (#24773)

						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
						
						
					 
					
						2025-09-13 07:27:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						98229db244 
					 
					
						
						
							
							[Kernels][DP/EP] Optimize Silu Kernel for R1 (#24054)

						Signed-off-by: elvircrn <elvircrn@gmail.com>
						
						
					 
					
						2025-09-13 00:17:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dbeee3844c 
					 
					
						
						
							
							[Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization (#24757)

						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
						
						
					 
					
						2025-09-13 00:16:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						30498f2a65 
					 
					
						
						
							
							[Doc]: Remove 404 hyperlinks (#24785)

						Signed-off-by: Rakesh Asapanna <45640029+rozeappletree@users.noreply.github.com>
						
						
					 
					
						2025-09-13 00:15:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						abc7989adc 
					 
					
						
						
							
							[Docs] Remove Neuron install doc as backend no longer exists (#24396)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-13 00:15:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9a8966bcc2 
					 
					
						
						
							
							[Docs] Fix warnings in mkdocs build (continued) (#24791)

						Signed-off-by: Zerohertz <ohg3417@gmail.com>
						
						
					 
					
						2025-09-13 00:13:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5febdc8750 
					 
					
						
						
							
							[Chore] Remove unused batched RoPE op & kernel (#24789)

						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-13 00:08:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						99bfef841f 
					 
					
						
						
							
							[Bugfix] Fix GPUModelRunner has no attribute lora_manager (#24762)

						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-12 23:55:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						89e08d6d18 
					 
					
						
						
							
							[Model] Add Olmo3 model implementation (#24534)

						Signed-off-by: Shane A <shanea@allenai.org>
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-13 03:26:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7f2ea7074e 
					 
					
						
						
							
							[Frontend][Multimodal] Allow skipping media data when UUIDs are provided. (#23950)

						Signed-off-by: Roger Wang <hey@rogerw.io>
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
						Signed-off-by: Roger Wang <hey@rogerw.me>
						Co-authored-by: Roger Wang <hey@rogerw.io>
						Co-authored-by: Roger Wang <hey@rogerw.me>
						
						
					 
					
						2025-09-13 02:16:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4fdd6f5cbf 
					 
					
						
						
							
							[Core] Support async scheduling with uniproc executor (#24219)

						Signed-off-by: Nick Hill <nhill@redhat.com>
						Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
						Co-authored-by: Ronald1995 <ronaldautomobile@163.com>
						Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
						
						
					 
					
						2025-09-12 16:34:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8226dd56bf 
					 
					
						
						
							
							[Qwen3Next] Fixes the cuda graph capture conditions under large batch sizes (#24660) (#24667)

						Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
						
						
					 
					
						2025-09-12 22:31:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5fe643fc26 
					 
					
						
						
							
							Add FLASHINFER_MLA to backend selector test (#24753)

						Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
						
						
					 
					
						2025-09-12 22:30:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7ba32aa60b 
					 
					
						
						
							
							[Attention][FlashInfer] Enable FP8 FlashInfer (TRTLLM) MLA decode (#24705)

						Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
						
						
					 
					
						2025-09-12 15:45:53 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c89ed8de43 
					 
					
						
						
							
							Invert pattern order to make sure that out_proj layers are identified (#24781)

						Signed-off-by: Alexandre Marques <almarque@redhat.com>
						
						
					 
					
						2025-09-12 14:45:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3beadc2f25 
					 
					
						
						
							
							[Compilation Bug] Fix Inductor Graph Output with Shape Issue (#24772)

						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-09-12 21:23:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bc636f21a6 
					 
					
						
						
							
							[Benchmark] Allow arbitrary headers to be passed to benchmarked endpoints (#23937)

						Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
						
						
					 
					
						2025-09-12 13:57:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						017354c0ef 
					 
					
						
						
							
							[CI] Trigger BC Linter when labels are added/removed (#24767)
						
						 
						
						
						
						
					 
					
						2025-09-12 11:44:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						010acc6e1e 
					 
					
						
						
							
							[Bugfix] Fix incompatibility between #20452 and #24548 (#24754)

						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-09-12 11:17:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c8c42597ab 
					 
					
						
						
							
							[CI] Speed up model unit tests in CI ( #24253 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andrew Feldman <afeldman@redhat.com > 
						
						
					 
					
						2025-09-12 10:36:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9d2a44606d 
					 
					
						
						
							
							[UX] Remove AsyncLLM torch profiler disabled log ( #24609 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-12 10:08:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f17c075884 
					 
					
						
						
							
							[Model] Switch to Fused RMSNorm in GLM-4.1V model ( #24733 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: SamitHuang <285365963@qq.com > 
						
						
					 
					
						2025-09-12 09:12:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b0d1213ac3 
					 
					
						
						
							
							[Models] Prevent CUDA sync in Qwen2.5-VL ( #24741 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com > 
						
						
					 
					
						2025-09-12 16:03:55 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						57f94e88ea 
					 
					
						
						
							
							[Models] Optimise and simplify _validate_and_reshape_mm_tensor ( #24742 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com > 
						
						
					 
					
						2025-09-12 15:37:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						684b6870e1 
					 
					
						
						
							
							[Bugfix][Frontend] Fix --enable-log-outputs does not match the documentation ( #24626 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com > 
						
						
					 
					
						2025-09-12 08:01:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a5b84f1cbf 
					 
					
						
						
							
							[Core] Shared memory based object store for Multimodal data caching and IPC ( #20452 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: donglu <donglu@cohere.com > 
						
						
					 
					
						2025-09-12 07:54:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9f04d9d55f 
					 
					
						
						
							
							[Qwen3-Next] MoE configs for H100 TP=1,2 and TP2/EP ( #24739 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: elvircrn <elvircrn@gmail.com > 
						
						
					 
					
						2025-09-12 07:54:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4d7c1d531b 
					 
					
						
						
							
							[Bugfix] Fix MRoPE dispatch on XPU ( #24724 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yan Ma <yan.ma@intel.com > 
						
						
					 
					
						2025-09-12 21:43:56 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						41f17bf290 
					 
					
						
						
							
							[Docs] Fix warnings in mkdocs build (continued) ( #24740 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zerohertz <ohg3417@gmail.com > 
						
						
					 
					
						2025-09-12 06:43:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bcb06d7baf 
					 
					
						
						
							
							[Doc]: fix typos in various files ( #24726 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com > 
						
						
					 
					
						2025-09-12 06:43:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0377802c20 
					 
					
						
						
							
							[Multimodal] Remove legacy multimodal fields in favor of MultiModalFeatureSpec ( #24548 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: sfeng33 <4florafeng@gmail.com > 
						
						
					 
					
						2025-09-12 21:42:23 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						72fc8aa412 
					 
					
						
						
							
							[Multi Modal] Add FA3 in VIT ( #24347 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com > 
						
						
					 
					
						2025-09-12 21:27:24 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fdb09c77d6 
					 
					
						
						
							
							[sleep mode] save memory for on-the-fly quantization ( #24731 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-09-12 11:25:19 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7a1c4025f1 
					 
					
						
						
							
							[Kernel] [CPU] refactor cpu_attn.py:_run_sdpa_forward for better memory access ( #24701 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ignaciosica <mignacio.sica@gmail.com > 
						
						
					 
					
						2025-09-12 19:23:07 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						60a0951924 
					 
					
						
						
							
							[Bugfix] Fix BNB name match ( #24735 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-12 11:12:01 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						64d90c3e4f 
					 
					
						
						
							
							[Misc][gpt-oss] Add gpt-oss label to PRs that mention harmony or related to builtin tool call ( #24717 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-09-12 18:57:07 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						59d5d2c736 
					 
					
						
						
							
							[CI/Build] Skip prompt embeddings tests on V1-only CPU backend ( #24721 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-09-12 18:51:01 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d21a36f5f9 
					 
					
						
						
							
							[CI] Add ci_envs for convenient local testing ( #24630 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-09-12 08:52:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						561a0baee0 
					 
					
						
						
							
							[CI] Fix flaky test v1/worker/test_gpu_model_runner.py::test_kv_cache_stride_order ( #24640 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-09-12 07:49:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f592b3174b 
					 
					
						
						
							
							[BugFix] Fix Qwen3-Next PP ( #24709 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-11 23:35:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7920de0a2a 
					 
					
						
						
							
							[Bugfix] Fix MRoPE dispatch on CPU ( #24712 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-09-12 04:56:31 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ddcec289c7 
					 
					
						
						
							
							Fix implementation divergence for BLOOM models between vLLM and HuggingFace when using prompt embeds ( #24686 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andrew Sansom <andrew@protopia.ai > 
						
						
					 
					
						2025-09-12 04:35:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e090b7b45b 
					 
					
						
						
							
							Enable conversion of multimodal models to pooling tasks ( #24451 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com > 
						
						
					 
					
						2025-09-12 03:30:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6a50eaa0d3 
					 
					
						
						
							
							[DOCs] Update ROCm installation docs section ( #24691 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-09-11 20:02:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						12a8414d81 
					 
					
						
						
							
							[Qwen3-Next] MoE configs for H20 TP=1,2,4,8 ( #24707 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-12 10:06:26 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						880c741bb6 
					 
					
						
						
							
							[Bugfix] fixes the causal_conv1d_update kernel update non-speculative decoding cases ( #24680 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-11 18:16:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						40b6c9122b 
					 
					
						
						
							
							[V1] feat:add engine v1 tracing ( #20372 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com >
Signed-off-by: Ye Zhang <zhysishu@gmail.com >
Signed-off-by: RichardoMu <44485717+RichardoMrMu@users.noreply.github.com >
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com >
Co-authored-by: Ye Zhang <zhysishu@gmail.com >
Co-authored-by: Benjamin Bartels <benjamin@bartels.dev >
Co-authored-by: simon-mo <simon.mo@hey.com >
Co-authored-by: 瑜琮 <ly186375@antfin.com >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-09-11 17:10:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e6bc46821 
					 
					
						
						
							
							[Startup] Make DeepGEMM warmup scale with max-num-batched-tokens ( #24693 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-09-11 20:10:19 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fcba05c435 
					 
					
						
						
							
							[Bug] Fix Layer weight_block_size Assertion Issue ( #24674 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-11 19:47:59 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7a30fa8708 
					 
					
						
						
							
							[Doc] Clarify cudagraph capture size logic and default behavior in scheduler ( #18698 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zazzle516 <2405677060@qq.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-11 23:18:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f82f7a8990 
					 
					
						
						
							
							[Qwen3-Next] MOE configs for H100 TP4 ( #24699 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-09-11 15:45:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c3aea10dc8 
					 
					
						
						
							
							[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel ( #23280 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com > 
						
						
					 
					
						2025-09-11 15:43:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d4fd2768ef 
					 
					
						
						
							
							[Bugfix][Attention] Fix FlashInfer MLA block size logic ( #24692 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com > 
						
						
					 
					
						2025-09-11 22:39:42 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7a70a71892 
					 
					
						
						
							
							[Qwen3-Next] Add B200 MoE configs for Qwen3-next ( #24698 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com > 
						
						
					 
					
						2025-09-11 15:34:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7d4651997a 
					 
					
						
						
							
							[CI/Build] Add bc-linter to vLLM CI ( #21234 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zhewenli <zhewenli@meta.com > 
						
						
					 
					
						2025-09-11 15:34:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						569bf1c9c0 
					 
					
						
						
							
							[Qwen3-Next] MoE configs for H200 TP=1,2,4 ( #24695 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai > 
						
						
					 
					
						2025-09-11 14:38:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1ec20355f5 
					 
					
						
						
							
							[Bugfix] Set VLLM_ALLREDUCE_USE_SYMM_MEM default to False ( #24696 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-11 14:32:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e42af78b18 
					 
					
						
						
							
							[flashinfer] [kernel] support for fp8 kv cache for trtllm prefill attention ( #24197 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Xiaozhu <mxz297@gmail.com > 
						
						
					 
					
						2025-09-11 14:20:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						074854b24f 
					 
					
						
						
							
							[Kernel][B200] mxfp4 fused cutlass moe ( #23696 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Duncan Moss <djm.moss@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-11 17:04:56 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						79ac59f32e 
					 
					
						
						
							
							Update Spec Decode metrics to include drafted and accepted token throughput ( #24127 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andrew Xia <axia@meta.com > 
						
						
					 
					
						2025-09-11 19:58:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b971f91504 
					 
					
						
						
							
							[BugFix] Fix tokenize asyncio task leak ( #24677 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-11 19:44:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c733bd5e87 
					 
					
						
						
							
							[Qwen3-Next] Add MoE Config for H200 ( #24688 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai > 
						
						
					 
					
						2025-09-11 12:40:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a892b259b4 
					 
					
						
						
							
							[Doc] Remove Useless Comments ( #24687 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-11 12:25:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						127ded0a9e 
					 
					
						
						
							
							[Ultravox] Use wrapped_model_config to instantiate inner model ( #24679 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Peter Salas <peter@fixie.ai > 
						
						
					 
					
						2025-09-11 18:52:24 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bb2b5126da 
					 
					
						
						
							
							[VLM] Migrate remain DP-supported ViT models to use disable_tp ( #24363 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-11 18:30:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						361ae27f8a 
					 
					
						
						
							
							[Docs] Fix formatting of transcription doc ( #24676 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-11 11:18:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e26fef8397 
					 
					
						
						
							
							fix some typos ( #24616 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: co63oc <co63oc@users.noreply.github.com > 
						
						
					 
					
						2025-09-11 10:48:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c1eda615ba 
					 
					
						
						
							
							Fix model name included in responses ( #24663 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-11 10:47:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4aa23892d6 
					 
					
						
						
							
							[Bugfix] Fix platform-specific routing in CustomOp implementations ( #24444 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Konrad Zawora <kzawora@habana.ai > 
						
						
					 
					
						2025-09-11 17:15:01 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1fdd5c42d7 
					 
					
						
						
							
							[Kernels] Enable Torch Symmetric Memory All-Reduce By Default ( #24111 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-11 09:45:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bcbe2a4d9e 
					 
					
						
						
							
							[VLM] Optimize GLM4.5-V-style video processing to only decode necessary frames ( #24161 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-11 09:44:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						51d41265ad 
					 
					
						
						
							
							[Docs] Fix typos in EP deployment doc ( #24669 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-11 09:07:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4984a291d5 
					 
					
						
						
							
							[Doc] Fix Markdown Pre-commit Error ( #24670 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-09-11 09:05:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						404c85ca72 
					 
					
						
						
							
							[Docs] Add transcription support to model ( #24664 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-09-11 07:39:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						817beef7f3 
					 
					
						
						
							
							[Bugifx] Fix qwen-next packed_modules_mapping ( #24656 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-11 22:26:17 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4f6593b058 
					 
					
						
						
							
							[HybridKVCache][Platform] Add support_hybrid_kv_cache for platform ( #24646 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: MengqingCao <cmq0113@163.com > 
						
						
					 
					
						2025-09-11 21:47:58 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						94e6b2d55f 
					 
					
						
						
							
							Allow users to specify kv cache memory size ( #21489 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-11 13:41:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fd1ce98cdd 
					 
					
						
						
							
							[CI] Split mteb test from Language Models Test ( #24634 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-09-11 06:37:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d11ec124a0 
					 
					
						
						
							
							[Bench] Add qwen-next in benchmark_moe.py ( #24661 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-11 21:29:43 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f510715882 
					 
					
						
						
							
							[build] add torch to tool.uv no-build-isolation-package ( #24303 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-11 13:19:44 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f946197473 
					 
					
						
						
							
							[Docs] Fixes a typo in the qwen3next model name. ( #24654 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com > 
						
						
					 
					
						2025-09-11 19:35:14 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0cd72a7b72 
					 
					
						
						
							
							[XPU] add missing dependency tblib for XPU CI ( #24639 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Fanli Lin <fanli.lin@intel.com > 
						
						
					 
					
						2025-09-11 11:22:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5f5271f1ee 
					 
					
						
						
							
							Move LoRAConfig from config/__init__.py to config/lora.py ( #24644 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-11 11:01:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d6249d0699 
					 
					
						
						
							
							Fix typing for safetensors_load_strategy ( #24641 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-11 10:41:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						25bb9e8c65 
					 
					
						
						
							
							[CI Failure] fix models/language/pooling/test_auto_prefix_cache_support.py ( #24636 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-09-11 03:31:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a1213fae5f 
					 
					
						
						
							
							[Misc] Add @NickLucche to codeowners ( #24647 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-09-11 17:18:09 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a8b0361c92 
					 
					
						
						
							
							[CI] Split pooling from entrypoints Test ( #24632 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-09-11 01:53:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ed5ae4aace 
					 
					
						
						
							
							[Bugfix] Fix _synced_weight_loader ( #24565 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com > 
						
						
					 
					
						2025-09-11 16:52:33 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0fc36463e0 
					 
					
						
						
							
							[CI]Add transformers_utils to Async Engine, Inputs, Utils, Worker Test ( #24615 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com > 
						
						
					 
					
						2025-09-11 01:52:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d14c4ebf08 
					 
					
						
						
							
							[Docs] Use 1-2-3 list for deploy steps in deployment/frameworks/ ( #24633 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-09-11 01:50:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ba6011027d 
					 
					
						
						
							
							[Docs] Update V1 doc to reflect whisper support ( #24606 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-09-11 01:50:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						85df8afdae 
					 
					
						
						
							
							[Docs] Revise frameworks/anything-llm.md ( #24489 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-09-11 01:50:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6aeb1dab4a 
					 
					
						
						
							
							[Bugfix] Fix incorrect import of CacheConfig ( #24631 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-11 01:48:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e93f4cc9e3 
					 
					
						
						
							
							Add the support for the qwen3 next model (a hybrid attention model). ( #24526 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-11 15:32:09 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2048c4e379 
					 
					
						
						
							
							[torchao] Support quantization configs using module swap ( #21982 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jerry Zhang <jerryzh168@gmail.com > 
						
						
					 
					
						2025-09-10 23:53:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d13360183a 
					 
					
						
						
							
							Remove redundant all gather + split ( #23441 )  
						
						 
						
						... 
						
						
						
						Co-authored-by: Chenxi Yang <cxyang@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com > 
						
						
					 
					
						2025-09-10 23:45:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9bd831f501 
					 
					
						
						
							
							[Model] New model support for Motif-1-Tiny ( #23414 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ca1207 <ca1207zzz@gmail.com >
Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com >
Co-authored-by: WyldeCat <skan1543@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-10 23:29:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e2b1f863aa 
					 
					
						
						
							
							[Doc]: fixing doc typos ( #24635 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com > 
						
						
					 
					
						2025-09-10 23:19:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						41329a0ff9 
					 
					
						
						
							
							[Core] feat: Add --safetensors-load-strategy flag for faster safetensors loading from Lustre ( #24469 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Shiqi Sheng <shengshiqi@google.com >
Signed-off-by: shengshiqi-google <160179165+shengshiqi-google@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-09-10 23:10:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ee0bc5e1b4 
					 
					
						
						
							
							Enable --profile in 'vllm bench throughput' ( #24575 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com > 
						
						
					 
					
						2025-09-10 23:06:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3d1393f6fc 
					 
					
						
						
							
							Kimi K2 Fused MoE kernels Optimization configs ( #24597 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Saman Keon <samanamp@outlook.com > 
						
						
					 
					
						2025-09-10 23:06:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8a894084d2 
					 
					
						
						
							
							[Engine][Chore] use local variable and remove output var assignment ( #24554 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Guy Stone <guys@spotify.com > 
						
						
					 
					
						2025-09-10 23:05:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e2d8c27f68 
					 
					
						
						
							
							[BugFix] Fix pipeline parallel ( #24621 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-10 23:05:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						29799ddacc 
					 
					
						
						
							
							[Bugfix] Add missing VIT backend dispatch on CPU ( #24623 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-09-10 22:28:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f17a6aa4ec 
					 
					
						
						
							
							[Ultravox] Fix Gemma instantiation, support quantization via --hf-overrides ( #24131 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Peter Salas <peter@fixie.ai > 
						
						
					 
					
						2025-09-10 22:25:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6c8deacd72 
					 
					
						
						
							
							[Bug] [Spec Decode] Fix model_initialization test and mismatch in aux_hidden_layers ( #24613 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-09-10 21:23:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						55b823ba0f 
					 
					
						
						
							
							Add @chaunceyjiang to codeowner for reasoning Reasoning and Tool parser ( #24406 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-09-11 04:23:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8c5a747246 
					 
					
						
						
							
							[distributed] update known issues ( #24624 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-09-11 11:09:38 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5931b7e5d9 
					 
					
						
						
							
							[Models][Quantization] Add quantization configuration update in Voxtral model ( #24122 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Alexandre Marques <almarque@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-10 19:13:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cc99baf14d 
					 
					
						
						
							
							[Misc] Make timeout passable in init_distributed_environment ( #24522 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jberkhahn <jaberkha@us.ibm.com > 
						
						
					 
					
						2025-09-10 15:41:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dcb28a332b 
					 
					
						
						
							
							[Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration ( #21078 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: hjjq <hanjieq@nvidia.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-09-10 15:31:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fba7856581 
					 
					
						
						
							
							[Perf] Warmup FlashInfer attention during startup ( #23439 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com > 
						
						
					 
					
						2025-09-10 15:03:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b5e383cd8b 
					 
					
						
						
							
							[gpt-oss] raise error for flashinfer backend without trtllm ( #24482 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-09-10 14:33:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9a161307f5 
					 
					
						
						
							
							[torch.compile][ROCm][V1] Enable attention output FP8 fusion for V1 attention backends ( #19767 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com > 
						
						
					 
					
						2025-09-10 13:59:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						37e8182bfe 
					 
					
						
						
							
							[v1] Add Whisper model support (encoder-decoder) ( #21088 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-09-10 13:53:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4db4426404 
					 
					
						
						
							
							[CI] Fail subprocess tests with root-cause error ( #23795 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-10 13:53:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a0933c3bd6 
					 
					
						
						
							
							[Bugfix] Enable FP8 KV cache for FlashInfer and Triton backend on non-sm100 GPUs ( #24577 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg > 
						
						
					 
					
						2025-09-10 12:33:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						09e68bce34 
					 
					
						
						
							
							[Misc] update log level debug to warning when process port is used by ( #24226 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-10 11:32:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9fb74c27a7 
					 
					
						
						
							
							[Core] Support configuration parsing plugin ( #24277 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com >
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-09-10 11:32:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4032949630 
					 
					
						
						
							
							[Bugfix] Fix DeepEP config for DP4TP4 ( #23619 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ming Yang <minos.future@gmail.com > 
						
						
					 
					
						2025-09-10 10:37:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						08abfa78ec 
					 
					
						
						
							
							[Bugfix] fix modelopt exclude_modules name mapping ( #24178 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-10 10:20:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2bef2d1405 
					 
					
						
						
							
							[Logging] allow config logging stream ( #24336 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Shiyan Deng <dsy842974287@meta.com > 
						
						
					 
					
						2025-09-10 15:02:01 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						36cacd0958 
					 
					
						
						
							
							[Doc] Add documentation for GLM-4.5 series models: tool-calling and reasoning parser ( #24589 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: WangErXiao <863579016@qq.com > 
						
						
					 
					
						2025-09-10 07:50:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bb3eb80d92 
					 
					
						
						
							
							[Core] Split LoRA layers ( #24574 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-10 07:47:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fcc0a3130a 
					 
					
						
						
							
							[CI] Fix tensorizer test assertion ( #24545 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Peter Schuurman <psch@google.com > 
						
						
					 
					
						2025-09-10 06:57:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						736569da8d 
					 
					
						
						
							
							[Platform] Custom ops support for LMhead and LogitsProcessor ( #23564 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zzhx1 <zzh_201018@outlook.com > 
						
						
					 
					
						2025-09-10 06:26:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2eb9986a2d 
					 
					
						
						
							
							[BugFix] python collect_env.py and vllm collect-env compatibility with uv venv ( #24066 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kay Yan <kay.yan@daocloud.io > 
						
						
					 
					
						2025-09-10 21:25:33 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ccee371e86 
					 
					
						
						
							
							[Docs] Fix warnings in mkdocs build (continued) ( #24092 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zerohertz <ohg3417@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com > 
						
						
					 
					
						2025-09-10 06:23:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c0bd6a684a 
					 
					
						
						
							
							Fix Auto_Round Quatization Loading on SM75 and Lower GPUs ( #24217 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: RoadToNowhereX <37441177+RoadToNowhereX@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com > 
						
						
					 
					
						2025-09-10 06:22:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3144d90217 
					 
					
						
						
							
							fix some typos ( #24167 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: co63oc <co63oc@users.noreply.github.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-09-10 06:21:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f5e5c18de 
					 
					
						
						
							
							[CI/Build] bump timm dependency ( #24189 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-09-10 06:20:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bd98842c8a 
					 
					
						
						
							
							[CI] Add PPL test for generation models ( #24485 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-09-10 06:16:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d6069887c6 
					 
					
						
						
							
							[rocm] enable torchao quantization for rocm ( #24400 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lifan Shen <lifans@meta.com > 
						
						
					 
					
						2025-09-10 06:16:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						492196ed0e 
					 
					
						
						
							
							[CI/Build] split true unit tests to Entrypoints Unit Tests ( #24418 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com > 
						
						
					 
					
						2025-09-10 06:16:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f4f1a8df22 
					 
					
						
						
							
							[BugFix] Ensure integrity of reused CPU tensors during async scheduling ( #24527 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: guoze.lin <guozelin@tencent.com > 
						
						
					 
					
						2025-09-10 21:15:14 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0b9a612fa3 
					 
					
						
						
							
							[BugFix][easy] Fix flaky test test_gpt_oss_multi_turn_chat ( #24549 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: lacora2017 <yehu@meta.com >
Co-authored-by: lacora2017 <yehu@meta.com > 
						
						
					 
					
						2025-09-10 21:14:55 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4c04eef706 
					 
					
						
						
							
							[BugFix][Multi Modal] Fix TensorSchema shape mismatch in Molmo ( #24559 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com > 
						
						
					 
					
						2025-09-10 06:14:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f36355abfd 
					 
					
						
						
							
							Move LoadConfig from config/__init__.py to config/load.py ( #24566 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-10 06:14:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9e3c3a7df2 
					 
					
						
						
							
							[LoRA]: Add LoRA support to Mistral's Voxtral models ( #24517 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yash Pratap Singh <yashsingh20001@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-10 06:12:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6cbd41909e 
					 
					
						
						
							
							Feature/vit attention unification# 23880 ( #23978 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-09-10 06:10:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						72d30108a0 
					 
					
						
						
							
							Support for NemotronH Nano VLM ( #23644 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com > 
						
						
					 
					
						2025-09-10 06:10:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8b83b93739 
					 
					
						
						
							
							[Docs] Document the extra memory footprint overhead when using EPLB ( #24537 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-09-10 06:09:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9dbefd88e9 
					 
					
						
						
							
							[Docs] Improve organisation of API Reference nav ( #24569 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-10 06:08:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7c195d43da 
					 
					
						
						
							
							[ROCm][Bugfix] Fix Aiter RMSNorm  ( #23412 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com > 
						
						
					 
					
						2025-09-10 21:08:03 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0ae43dbf8c 
					 
					
						
						
							
							[Attention] add DCP support for FLASH_ATTN_MLA backend ( #24453 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com > 
						
						
					 
					
						2025-09-10 17:19:26 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						267c80d31f 
					 
					
						
						
							
							[Model] Limit CPU threads for image transformations in InternVL to reduce cpu contention. ( #24519 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: li-jinpeng <3332126450@qq.com >
Co-authored-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-10 16:45:44 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						77f62613f9 
					 
					
						
						
							
							Consolidate rendering parameters into RenderConfig dataclass ( #24543 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: sfeng33 <4florafeng@gmail.com > 
						
						
					 
					
						2025-09-10 08:44:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						feaf202e93 
					 
					
						
						
							
							[Bugfix] Guard _may_reorder_batch for encoder-only models on CPU ( #24319 ) ( #24348 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Remy <eunhwan.shin@dtonic.io >
Co-authored-by: Li, Jiang <jiang1.li@intel.com > 
						
						
					 
					
						2025-09-10 14:24:42 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						91130ae376 
					 
					
						
						
							
							[docs] promo pytorch conf and ray summit ( #24562 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: simon-mo <simon.mo@hey.com > 
						
						
					 
					
						2025-09-09 23:24:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e40827280b 
					 
					
						
						
							
							[Docs] Enable relative links in examples to function when rendered in the docs ( #24041 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-09 21:40:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4377b1ae3b 
					 
					
						
						
							
							[Bugfix] Update Run:AI Model Streamer Loading Integration ( #23845 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai >
Signed-off-by: Peter Schuurman <psch@google.com >
Co-authored-by: Omer Dayan (SW-GPU) <omer@run.ai >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-09 21:37:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						009d689b0c 
					 
					
						
						
							
							[Core] Simplify and unify mm uuid handling & auto-generated mm hash overrides processing. ( #24271 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com > 
						
						
					 
					
						2025-09-09 21:36:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0efdb5c3ba 
					 
					
						
						
							
							[gpt-oss] Cache permute indices for faster MXFP4 MoE layer loading ( #24154 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Wei Wei <wwei6@meta.com > 
						
						
					 
					
						2025-09-10 04:27:53 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						53b42f4102 
					 
					
						
						
							
							[BugFix][Spec Decode] Fix out-of-range index triggered by eagle3; re-enable test for LlamaForCausalLMEagle3 ( #24392 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wwl2755 <wangwenlong2755@gmail.com > 
						
						
					 
					
						2025-09-09 21:24:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						309d7aa401 
					 
					
						
						
							
							[P/D] MultiConnector supports shutdown ( #24425 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-09-09 21:24:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b4a01aaf95 
					 
					
						
						
							
							[KV Connector] More async support for get_num_new_matched_tokens ( #23620 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ApostaC <yihua98@uchicago.edu > 
						
						
					 
					
						2025-09-09 21:23:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						83dd28aae4 
					 
					
						
						
							
							[CI] Adjust threshold for flaky ngram spec decoding test ( #24528 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-09 21:07:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f88e84016f 
					 
					
						
						
							
							[BugFix] Fix async core engine client finalizer ( #24540 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-09 21:07:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3c2156b3af 
					 
					
						
						
							
							[Hardware][Apple-CPU] Enable native bfloat16 on Apple Silicon (M2 and later) ( #24129 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ignaciosica <mignacio.sica@gmail.com > 
						
						
					 
					
						2025-09-10 03:50:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7e7db04310 
					 
					
						
						
							
							[CI] Retry flaky fp8 cutlass mla tests ( #24536 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-09 20:33:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						41f160b974 
					 
					
						
						
							
							Add @heheda12345 to CODEOWNERS of KVCacheManager related code ( #24546 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-09-10 03:30:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dc625ea6b8 
					 
					
						
						
							
							[Perf] Convert np array to torch tensor to index into block table for attn chunking ( #24474 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com > 
						
						
					 
					
						2025-09-09 20:01:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b23fb78623 
					 
					
						
						
							
							[Bugfix] Fix for 24530. Fix naive all2all shared expert overlap. ( #24538 )  
						
						 
						
						
						
						
					 
					
						2025-09-09 17:53:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						561f38dc3c 
					 
					
						
						
							
							[Bugfix] Improve EPLB config validation error message ( #24524 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-09-10 00:32:36 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						73e688cb79 
					 
					
						
						
							
							[ROCm][Feature] Enable Pipeline Parallelism with Ray Compiled Graph on ROCm ( #24275 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: charlifu <charlifu@amd.com > 
						
						
					 
					
						2025-09-09 23:27:35 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fb1a8f932a 
					 
					
						
						
							
							[Benchmark] Add option to skip oversampling in benchmark ( #24457 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com > 
						
						
					 
					
						2025-09-09 22:00:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0dc9cbb527 
					 
					
						
						
							
							[Benchmark] Update bench doc with mtbench, blazedit, spec bench ( #24450 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com > 
						
						
					 
					
						2025-09-09 21:15:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b5fb3005a8 
					 
					
						
						
							
							[Log] Use a relative path in debug-level logs to distinguish files with identical names ( #23846 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com > 
						
						
					 
					
						2025-09-09 16:46:35 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						15de5ff9ea 
					 
					
						
						
							
							[Feature] Disallow FlashMLA on Blackwell ( #24521 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-09-09 14:59:34 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b8a93076d3 
					 
					
						
						
							
							[CI] execute all piecewise compilation tests together ( #24502 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com > 
						
						
					 
					
						2025-09-09 11:05:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c3f9773b2c 
					 
					
						
						
							
							[TPU] Fix tpu structured decoding in mixed batches ( #24458 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chenyaaang <chenyangli@google.com > 
						
						
					 
					
						2025-09-09 11:04:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3707cb2505 
					 
					
						
						
							
							[Docs] Gemma3n transcriptions endpoint support ( #24512 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-09-09 11:03:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						920ed46b09 
					 
					
						
						
							
							[Misc] bump outlines_core to fix the version conflicts with outlines >= 1.2.0 ( #24368 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kazuhiro Serizawa <nserihiro@gmail.com >
Signed-off-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: Simon Mo <simon.mo@hey.com > 
						
						
					 
					
						2025-09-09 10:59:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						15cb047e25 
					 
					
						
						
							
							Extend renderer with embedding support and integrate completion endpoint ( #24405 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: sfeng33 <4florafeng@gmail.com > 
						
						
					 
					
						2025-09-10 01:46:46 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9ad0688e43 
					 
					
						
						
							
							[Bugfix] Fix hidden_size for multimodal classification model ( #24501 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-09-09 10:37:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b9a1c4c8a2 
					 
					
						
						
							
							[ROCm][CI/Build] Sync ROCm dockerfiles with the ROCm fork ( #24279 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-09-09 12:21:56 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1aa427fdc1 
					 
					
						
						
							
							[Kernels] Add Flash Linear Attention Kernels ( #24518 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-09-10 00:04:41 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1c63a16b65 
					 
					
						
						
							
							[Core] Run garbage collector after CUDA graph capture to fix throughput regression ( #24128 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-09-09 10:38:10 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						922d3b401b 
					 
					
						
						
							
							[Bugfix] Handle the edge case in detokenizer where processed tokens contain both stop str and eos token ( #23938 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: dtransposed <damian.bogunowicz@gmail.com > 
						
						
					 
					
						2025-09-09 07:30:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						19332c0479 
					 
					
						
						
							
							[Model] Systematic support for fp32 head, pooling models part ( #23810 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-09-09 07:29:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a55cf41a09 
					 
					
						
						
							
							[Compilation][WideEP] Enable Piecewise CUDAGraph for DeepEPHT ( #24123 )  
						
						 
						
						
						
						
					 
					
						2025-09-09 10:21:10 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6fb2788163 
					 
					
						
						
							
							[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency ( #24411 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com > 
						
						
					 
					
						2025-09-09 10:02:35 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3d2a2de8f7 
					 
					
						
						
							
							[RL] fast weight update with zmq + ipc handles ( #24295 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: huangweixiao <huangweixiao@msh.team >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-09-09 16:57:46 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1116590b16 
					 
					
						
						
							
							[gpt-oss] Validate gpt-oss python tool during initialization ( #23856 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-09-09 08:37:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ccb97338af 
					 
					
						
						
							
							[Misc] Add Codex settings to gitignore ( #24493 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me >
Co-authored-by: Roger Wang <hey@rogerw.me > 
						
						
					 
					
						2025-09-09 01:25:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						45c9cb5835 
					 
					
						
						
							
							[Misc] Add claude settings to gitignore ( #24492 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com > 
						
						
					 
					
						2025-09-09 01:14:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e283976f3a 
					 
					
						
						
							
							[Performance][MM] Building the inverse permutation in O(n) time in Qwen2_5_VisionTransformer ( #24443 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Junhong <liujunhong11@huawei.com >
Co-authored-by: Junhong <liujunhong11@huawei.com > 
						
						
					 
					
						2025-09-09 00:24:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						46876dff32 
					 
					
						
						
							
							[Doc]: fixing typos to improve docs ( #24480 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com > 
						
						
					 
					
						2025-09-08 23:06:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1823a00d67 
					 
					
						
						
							
							[Misc] Support bench serve long context ( #24373 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ming Yang <minos.future@gmail.com > 
						
						
					 
					
						2025-09-08 22:53:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ed16d0f26f 
					 
					
						
						
							
							[Doc] mention fpdb for multiprocess breakpoints ( #24452 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Mickael Seznec <mickael@mistral.ai > 
						
						
					 
					
						2025-09-08 21:46:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0cdd213641 
					 
					
						
						
							
							[Misc] Improve Worker process title and logging prefix ( #22205 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-09-08 21:43:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						948dd3443b 
					 
					
						
						
							
							[Bugfix] Fix Apertus HF repo name ( #24447 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-09-08 21:40:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b2f7745774 
					 
					
						
						
							
							Add data_parallel_size to VllmConfig string representation ( #24298 )  
						
						 
						
						... 
						
						
						
						Co-authored-by: Cong Chen <congc@meta.com > 
						
						
					 
					
						2025-09-08 21:35:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						82dfb12e52 
					 
					
						
						
							
							[Core] Use sha256 bytes instead of BlockHash to reduce GC overhead ( #23673 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: linzebing <linzebing1995@gmail.com > 
						
						
					 
					
						2025-09-08 21:34:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bba1042c6f 
					 
					
						
						
							
							[Flashinfer] Support Flashinfer TRTLLM FP8-qkv BF16/FP16-out Attention Kernel ( #23647 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com > 
						
						
					 
					
						2025-09-08 20:53:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b6fbc15634 
					 
					
						
						
							
							[BugFix][Model] Fix Ernie4.5-VL hanging on long inputs ( #24074 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wangyafeng <wangyafeng@baidu.com > 
						
						
					 
					
						2025-09-09 11:37:16 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3e0d4a3475 
					 
					
						
						
							
							Move KVTransferConfig from config/__init__.py to config/kv_transfer.py ( #24434 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-09-08 20:30:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						562663a044 
					 
					
						
						
							
							Bump actions/github-script from 7.0.1 to 8.0.0 ( #24413 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-09-09 03:12:44 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ed1623a88a 
					 
					
						
						
							
							Bump actions/stale from 9.1.0 to 10.0.0 ( #24412 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-09-09 03:11:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						13b89bd823 
					 
					
						
						
							
							[doc] update vllm serve cli args documentation ( #24329 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com > 
						
						
					 
					
						2025-09-09 03:07:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						22a0070530 
					 
					
						
						
							
							Bump actions/setup-python from 5.4.0 to 6.0.0 ( #24414 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-09-09 02:54:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						170129eb28 
					 
					
						
						
							
							[gpt-oss] Harmony changes with container tool support ( #23386 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zhiweiz <zhiweiz@fb.com >
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Signed-off-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
Co-authored-by: zhiweiz <zhiweiz@fb.com >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com > 
						
						
					 
					
						2025-09-08 19:03:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						955c624915 
					 
					
						
						
							
							[Bugfix][Wide EP] Fix redundant work when using DeepEP, TP Attn, and EP MoE ( #24134 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com > 
						
						
					 
					
						2025-09-08 19:01:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4f87abdcc6 
					 
					
						
						
							
							Update reviewers for modelopt related files ( #24468 )  
						
						 
						
						
						
						
					 
					
						2025-09-09 01:53:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6910b56da2 
					 
					
						
						
							
							[CI] Add nightly multiarch manifests to dockerhub ( #24102 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com >
Signed-off-by: Simon Mo <simon.mo@hey.com >
Signed-off-by: simon-mo <simon.mo@hey.com >
Co-authored-by: Simon Mo <simon.mo@hey.com > 
						
						
					 
					
						2025-09-09 01:18:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e10fef0883 
					 
					
						
						
							
							[Hardware][IBM Z] Fix Outlines Core issue for s390x ( #24034 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com > 
						
						
					 
					
						2025-09-08 16:50:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e680723eba 
					 
					
						
						
							
							[Bugfix] Disable the statslogger if the api_server_count is greater than 1 ( #22227 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-09-08 15:28:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						620db1fc58 
					 
					
						
						
							
							[Attention] FlashAttention MLA cudagraph support ( #23958 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com > 
						
						
					 
					
						2025-09-08 22:05:26 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						41183c1fe0 
					 
					
						
						
							
							[Spec Decode] Fix offline spec_decode.py ( #24257 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-09-08 20:44:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						43d9ad03ba 
					 
					
						
						
							
							[Model loader]: support multi-thread model weight loading ( #23928 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com >
Signed-off-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: Simon Mo <simon.mo@hey.com > 
						
						
					 
					
						2025-09-08 18:49:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7be141b2c5 
					 
					
						
						
							
							[CI] Enable encoder model compilation test (#24442)

						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
						
						
					 
					
						2025-09-08 11:48:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8d7f39b48c 
					 
					
						
						
							
							[Model] Remove quantized mixtral (#24437)

						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-08 11:02:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cd08636926 
					 
					
						
						
							
							[Spec Decode][Benchmark] Add Blitzedit dataset (#23605)

						Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
						Co-authored-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-09-08 10:32:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3feeeb9fea 
					 
					
						
						
							
							[Spec Decode][Benchmark] Add Spec Bench Dataset for benchmarking (#23563)

						Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
						
						
					 
					
						2025-09-08 10:32:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6f4a82f8b5 
					 
					
						
						
							
							[Model] Enable BNB support for qwen2_5_omni_thinker (#24420)

						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-08 09:37:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c44797a4d6 
					 
					
						
						
							
							[Docs]add eplb_config param use docs (#24213)

						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
						
						
					 
					
						2025-09-08 09:36:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						55be93baf5 
					 
					
						
						
							
							[Doc]: fix 2 hyperlinks leading to Ray site after they changed Ray's doc structure (#24438)

						Signed-off-by: Didier Durand <durand.didier@gmail.com>
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-08 09:36:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						717fc00e98 
					 
					
						
						
							
							[Docs] Move feature compatibility tables to README (#24431)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-08 06:45:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						01dfb5e982 
					 
					
						
						
							
							[Frontend] User-provided uuids for medias in chat. (RFC #22044) (#23449)

						Signed-off-by: Roger Wang <hey@rogerw.io>
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
						Signed-off-by: Roger Wang <hey@rogerw.me>
						Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						Co-authored-by: Roger Wang <hey@rogerw.io>
						Co-authored-by: Roger Wang <hey@rogerw.me>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-09-08 06:42:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						03dd652c16 
					 
					
						
						
							
							Move KVEventsConfig from config/__init__.py to config/kv_events.py (#24433)

						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-08 06:41:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9cd76b71ab 
					 
					
						
						
							
							[Misc] Terratorch related fixes (#24337)

						Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-09-08 06:40:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e041314184 
					 
					
						
						
							
							[Bugfix] Fix mamba2 prefill chunking (#23279)

						Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
						Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-08 11:42:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5e537f45b4 
					 
					
						
						
							
							[Bugfix] Fix get_quant_config when using modelscope (#24421)

						Signed-off-by: wangli <wangli858794774@gmail.com>
						
						
					 
					
						2025-09-08 11:03:02 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c2a8b08fcd 
					 
					
						
						
							
							[Doc] Fix issues in integrations/llamastack.md (#24428)

						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-09-08 02:28:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f4962a6d55 
					 
					
						
						
							
							[Doc]: fix typos in Python comments (#24417)

						Signed-off-by: Didier Durand <durand.didier@gmail.com>
						
						
					 
					
						2025-09-08 00:22:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f0b833a05 
					 
					
						
						
							
							[Docs] Fix a tip indentation and typo (#24419)

						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-09-08 00:19:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						425b04b8f4 
					 
					
						
						
							
							[gpt-oss][Responses API] Fix the function call id format (#24409)

						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-09-08 06:49:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						60f0843ef8 
					 
					
						
						
							
							[Model] Remove unnecessary CUDA sync of Qwen2VL image and video preprocess (#24334)

						Signed-off-by: Win <chatcharinsang@gmail.com>
						Co-authored-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-09-07 23:11:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8a46602606 
					 
					
						
						
							
							[Model] Remove unnecessary CUDA sync of GLM-4.1V image and video preprocess (#24332)

						Signed-off-by: Win <chatcharinsang@gmail.com>
						Co-authored-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-09-07 23:10:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						61aa4b2901 
					 
					
						
						
							
							[P/D] Add a shutdown method to the Connector API (#22699)

						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-09-07 23:07:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8c892b1831 
					 
					
						
						
							
							[Doc] Fix UTF-8 encoding issues in documentation generation on Windows (#24361)

						Signed-off-by: alekramelaheehridoy <aliqramalaheehridoy@gmail.com>
						Signed-off-by: alekramelaheehridoy <alekramelaheehridoy@gmail.com>
						Co-authored-by: alekramelaheehridoy <alekramelaheehridoy@gmail.com>
						
						
					 
					
						2025-09-07 22:33:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3bca396f79 
					 
					
						
						
							
							[CI/Build] Fix local image inputs in test_pixtral.py (#24401)

						Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
						Co-authored-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-09-08 03:31:35 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3a3e91bdfe 
					 
					
						
						
							
							[CI/Build] Disable flaky test_structured_output tests (#24404)

						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-09-08 02:51:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b3d7e3c845 
					 
					
						
						
							
							[Sampler] Support returning all prompt logprobs (#23868)

						Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
						Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-09-07 19:34:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						67841317d1 
					 
					
						
						
							
							[xpu] upgrade ipex/python3.12 for xpu (#23830)

						Signed-off-by: Yan Ma <yan.ma@intel.com>
						
						
					 
					
						2025-09-08 02:07:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						86173ad593 
					 
					
						
						
							
							[Kernel] Support decode context parallelism on Blackwell with CUTLASS MLA (#24385)

						Signed-off-by: Ming Yang <minos.future@gmail.com>
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						Co-authored-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-09-08 09:27:12 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						795b6951cd 
					 
					
						
						
							
							Add @luccafong to codeowner for spec decode (#24397)

						Signed-off-by: Lu Fang <fanglu@fb.com>
						
						
					 
					
						2025-09-08 08:30:27 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e5d21378d 
					 
					
						
						
							
							Skip MM Encoder for non-first PP ranks (#24387)

						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-07 09:38:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0661cb9df3 
					 
					
						
						
							
							Add renderer-based prompt processing for embedding and classification endpoints (#24356)

						Signed-off-by: sfeng33 <4florafeng@gmail.com>
						
						
					 
					
						2025-09-07 08:26:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						105d3d62ef 
					 
					
						
						
							
							[TPU] Remove TopKTopPSampler dependency for TPU sampler (#24391)

						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-07 01:12:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						62f66be1f7 
					 
					
						
						
							
							[Bugfix] Fix Qwen3-coder moe tuned config (#24072)

						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-07 05:19:46 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						81c53ef55c 
					 
					
						
						
							
							[Misc] collect flashinfer version in collect_env.py (#24378)

						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
						
						
					 
					
						2025-09-07 03:30:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						75334956c2 
					 
					
						
						
							
							QWEN3 Thinking Fused MoE kernels Optimization configs (#24330)

						Signed-off-by: Saman Keon <samanamp@outlook.com>
						
						
					 
					
						2025-09-07 03:18:54 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						77aec83b8c 
					 
					
						
						
							
							[Benchmark] add benchmark for custom activation op (#23908)

						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
						Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
						Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
						
						
					 
					
						2025-09-06 20:12:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e67597545b 
					 
					
						
						
							
							[CI][Fix] deterministic seed for flaky CI runs on structured outputs (#24380)

						Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
						
						
					 
					
						2025-09-07 11:10:40 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						37a6fa95fd 
					 
					
						
						
							
							Migrate Qwen2 inputs to TensorSchema (#23475)

						Signed-off-by: Benji Beck <benjibeck@meta.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-06 20:07:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						558f0907dc 
					 
					
						
						
							
							[attention][DCP] use AttentionImpl.need_to_return_lse_for_decode (#24372)

						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-09-07 01:18:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4172235ab7 
					 
					
						
						
							
							[V0 deprecation] Deprecate V0 Neuron backend (#21159)

						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-06 16:15:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						848562bd49 
					 
					
						
						
							
							break execute_model in gpu_model_runner into sub-functions for custom scopes (#24265)

						Co-authored-by: Bangsheng Tang <bangsheng@meta.com>
						
						
					 
					
						2025-09-06 14:02:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e68dc2f014 
					 
					
						
						
							
							[Bugfix] Fix unstable silu_mul+nvfp4 quant fusion test (#24370)

						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
						
						
					 
					
						2025-09-06 20:39:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a3645ed94d 
					 
					
						
						
							
							[Frontend][Responses API] Support reporting tool output tokens and fix reasoning token count (#24285)

						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
						
						
					 
					
						2025-09-06 13:27:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fb691ee4e7 
					 
					
						
						
							
							[Fix] [gpt-oss] fix non-tool calling path for chat completion (#24324)
						
						 
						
						
						
						
					 
					
						2025-09-06 19:10:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6024d115cd 
					 
					
						
						
							
							Lora bias(enable_lora_bias) deprecate warning (#24339)

						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-07 00:42:19 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7555d6b34a 
					 
					
						
						
							
							[Bugfix] Fix test_mixtral_moe (#24371)
						
						 
						
						
						
						
					 
					
						2025-09-06 09:32:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						00a4e56d8d 
					 
					
						
						
							
							[Bugfix] Fix broken deepseek fp8 TP weights loading (#24367)

						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-06 09:23:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0eadaeff7e 
					 
					
						
						
							
							[Bugfix] Avoid uninitialized usage of azp_val when AZP is false. (#24335)

						Signed-off-by: Mohan Kumar Kumar <mohan.cbein@gmail.com>
						Signed-off-by: mohankku <mohan.cbein@gmail.com>
						
						
					 
					
						2025-09-06 08:17:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0077c8634e 
					 
					
						
						
							
							Add @benchislett to codeowner for spec decode and structured outputs (#24362)

						Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
						
						
					 
					
						2025-09-06 22:03:35 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b121ca22ad 
					 
					
						
						
							
							[CI] Disable flaky structured output test from CI (#24366)

						Signed-off-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-09-06 13:31:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eddaafc1c7 
					 
					
						
						
							
							[Multimodal] Improve max video embedding length estimation in V1 (#24312)

						Signed-off-by: Roger Wang <hey@rogerw.me>
						Co-authored-by: Roger Wang <hey@rogerw.me>
						
						
					 
					
						2025-09-06 02:33:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						305a1cc0d2 
					 
					
						
						
							
							refactor: Turn GPUModelRunner.inputs_embeds to a CpuGpuBuffer (#24345)

						Signed-off-by: Andrew Sansom <andrew@protopia.ai>
						
						
					 
					
						2025-09-05 23:01:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6d6c6b05d3 
					 
					
						
						
							
							[New Model]: google/embeddinggemma-300m (#24318)

						Signed-off-by: wang.yuqi <noooop@126.com>
						
						
					 
					
						2025-09-05 22:58:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						53b19ccdd5 
					 
					
						
						
							
							[Core] Allow disabling TP sharding for parallel Linear layer (#23024)

						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-05 22:53:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6432739ef1 
					 
					
						
						
							
							[Bugfix] Catch and log invalid token ids in detokenizer (#24351)

						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-09-05 22:30:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ac201a0eaf 
					 
					
						
						
							
							[Feature] Support Decode Context Parallel (DCP) for MLA (#23734)

						Signed-off-by: hongchao <hongchao@msh.team>
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						Co-authored-by: hongchao <hongchao@msh.team>
						Co-authored-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-09-06 13:24:05 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3c529fc994 
					 
					
						
						
							
							[KV Sharing] Raise error if using eagle with fast prefill (#24350)

						Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
						
						
					 
					
						2025-09-05 20:22:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						35bf193864 
					 
					
						
						
							
							[Doc]: fix typos in Python comments (#24294)

						Signed-off-by: Didier Durand <durand.didier@gmail.com>
						Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
						
						
					 
					
						2025-09-05 19:41:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						35efa70297 
					 
					
						
						
							
							Add @22quinn as code reviewer for RL related components (#24346)
						
						 
						
						
						
						
					 
					
						2025-09-06 01:56:15 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cee182b297 
					 
					
						
						
							
							[Perf][V1] Fully overlap model execution (#23569)

						Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
						
						
					 
					
						2025-09-05 18:20:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c954c6629c 
					 
					
						
						
							
							[CI] Add timeouts to tests (#24260)

						Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
						Signed-off-by: Nick Hill <nhill@redhat.com>
						Co-authored-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-09-05 17:26:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9dfbeb41e5 
					 
					
						
						
							
							[RFC] allow cancelation after shutdown in blocking collective_rpc (#23390)

						Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
						
						
					 
					
						2025-09-05 14:14:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eedb2a2a10 
					 
					
						
						
							
							[Bugfix] Fix silu_mul+quant fusion test (#24341)

						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
						
						
					 
					
						2025-09-05 20:13:42 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						23a6c5280e 
					 
					
						
						
							
							[gpt-oss][Bugfix]Fix streamableparser for missing handling of certain token_ids (#24306)

						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-09-05 10:26:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7812bcf278 
					 
					
						
						
							
							[docs] add shenzhen meetup (#24326)

						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-09-05 22:48:42 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						006e7a34ae 
					 
					
						
						
							
							Adding int4 and int8 models for CPU benchmarking (#23709)

						Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
						
						
					 
					
						2025-09-05 20:08:50 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e599e2c65e 
					 
					
						
						
							
							[XPU][P/D] Add XPU support in NixlConnector (#22436)

						Signed-off-by: zhenwei <zhenwei.liu@intel.com>
						Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
						
						
					 
					
						2025-09-04 21:03:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c29fb540ff 
					 
					
						
						
							
							[gpt-oss] tool parser supports for /chat/completions [1/n] ( #22386 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
						Co-authored-by: Simon Mo <simon.mo@hey.com>
						
						
					 
					
						2025-09-04 20:39:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						65e038931d 
					 
					
						
						
							
							[Frontend] Skip unnecessary detokenization when token_id is requested ( #24236 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-09-04 23:04:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						886ccbe5ba 
					 
					
						
						
							
							[CI/Build] Reduce the number of redundant cases to test for LoRA ( #24276 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
						
						
					 
					
						2025-09-04 21:58:44 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						adc3ddb430 
					 
					
						
						
							
							[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files ( #23727 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
						Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
						Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
						
						
					 
					
						2025-09-04 14:25:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						60b755cbcb 
					 
					
						
						
							
							[Misc] Have AsyncLLM custom_stat_loggers extend default logger list ( #20952 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com>
						Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
						Co-authored-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-09-04 14:25:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						482e52f56c 
					 
					
						
						
							
							QWEN3 Coder Fused MoE kernels Optimization configs ( #24266 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Saman Keon <samanamp@outlook.com>
						
						
					 
					
						2025-09-04 20:33:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						78336a0c3e 
					 
					
						
						
							
							Upgrade FlashInfer to v0.3.0 ( #24086 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
						Co-authored-by: Simon Mo <simon.mo@hey.com>
						
						
					 
					
						2025-09-04 09:49:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						94866d7c93 
					 
					
						
						
							
							[Misc] Slight improve deepgemm print ( #24085 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-04 16:06:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						83609ca91d 
					 
					
						
						
							
							[Doc]: fix typos in Python comments ( #24173 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com>
						Co-authored-by: Russell Bryant <rbryant@redhat.com>
						Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
						
						
					 
					
						2025-09-04 08:52:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e41a0fa377 
					 
					
						
						
							
							[Perf] Freeze core engine proc heap after init ( #24008 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-09-04 22:55:23 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						37241077d5 
					 
					
						
						
							
							[Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp ( #23725 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Julien Lin <jullin@nvidia.com>
						Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
						Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
						
						
					 
					
						2025-09-04 09:25:40 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c9f7081f9c 
					 
					
						
						
							
							[LoRA]: Add lora support to qwen-2.5-omni ( #24231 )  
						
						 
						
						
						
						
					 
					
						2025-09-04 05:50:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						16ded21eeb 
					 
					
						
						
							
							[XPU] support Triton Attention backend on Intel GPU ( #24149 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
						
						
					 
					
						2025-09-04 20:41:08 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2b30afa442 
					 
					
						
						
							
							Use hidden_size_per_head as head_size fallback ( #24221 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
						
						
					 
					
						2025-09-04 12:59:16 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eafa8dcde6 
					 
					
						
						
							
							[Model] Add pp support for hunyuan ( #24212 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
						
						
					 
					
						2025-09-04 03:58:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6c7af8110a 
					 
					
						
						
							
							[Doc] Update vLLM Singapore Meetup info ( #24234 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
						
						
					 
					
						2025-09-04 02:58:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8f423e5f43 
					 
					
						
						
							
							[Feature][Response API] Add streaming support for non-harmony ( #23741 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com>
						
						
					 
					
						2025-09-04 17:49:06 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						369a079568 
					 
					
						
						
							
							[Hardware][Apple-CPU] Disable OneDNN build for Apple Silicon ( #24200 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
						Co-authored-by: Li, Jiang <jiang1.li@intel.com>
						
						
					 
					
						2025-09-04 02:48:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						402759d472 
					 
					
						
						
							
							[Attention] FlashAttn MLA ( #14258 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
						Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
						Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
						
						
					 
					
						2025-09-04 02:47:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2c301ee2eb 
					 
					
						
						
							
							[Bugfix] Fix Incremental Detokenization with tokenizers == 0.22.0 ( #24159 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Fanli Lin <fanli.lin@intel.com>
						Signed-off-by: Fanli Lin <fanli0116@gmail.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-04 02:47:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3efb9f4d95 
					 
					
						
						
							
							[Attention][Platform] Refactor MLA to support Custom Op ( #23332 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: whx-sjtu <2952154980@qq.com>
						
						
					 
					
						2025-09-04 02:46:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						04f3c35cff 
					 
					
						
						
							
							Improve flexibility of auto_tune.sh execution. ( #23766 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Anthony Su <50185138+anthonsu@users.noreply.github.com>
						Signed-off-by: anthonsu <50185138+anthonsu@users.noreply.github.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-04 09:41:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						51d5e9be7d 
					 
					
						
						
							
							[Core][Model] Terratorch backend integration ( #23513 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
						Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
						Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-09-04 00:22:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e7fc70016f 
					 
					
						
						
							
							[Model] Add MiDashengLM model support ( #23652 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chenbing8 <chenbing8@xiaomi.com>
						Signed-off-by: bingchen-mi <chenbing8@xiaomi.com>
						Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-04 00:08:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						12e1e63cc5 
					 
					
						
						
							
							[Misc] Enhance output readability of helper script ( #24214 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Weida Hong <wdhongtw@google.com>
						
						
					 
					
						2025-09-04 06:38:26 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						57b1ce94f7 
					 
					
						
						
							
							[CPU] Refactor CPU unquantized linear ( #24150 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com>
						
						
					 
					
						2025-09-04 14:28:45 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cb55ad86fe 
					 
					
						
						
							
							Migrate ultravox inputs to TensorSchema ( #23503 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						
						
					 
					
						2025-09-04 06:09:11 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						712b273f65 
					 
					
						
						
							
							[Refactor] Introduce basic Renderer for completion-style request ( #24010 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: sfeng33 <4florafeng@gmail.com>
						
						
					 
					
						2025-09-04 05:21:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e919d6f549 
					 
					
						
						
							
							[Kernel][Bugfix] Fix grouped topk cu ( #24146 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
						
						
					 
					
						2025-09-04 12:37:37 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a38f8bd54c 
					 
					
						
						
							
							[Feature][Responses API]Support MCP tools with streaming mode + background mode ( #23927 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wuhang <wuhang6@huawei.com>
						
						
					 
					
						2025-09-04 04:05:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b5ee1e3261 
					 
					
						
						
							
							Remove deprecated PyNcclConnector ( #24151 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
						
						
					 
					
						2025-09-03 22:49:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						36c260dad6 
					 
					
						
						
							
							[Feature][gpt-oss] Add support for num_cached_tokens and num_reasoning_tokens tracking ( #23460 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: George Nagy II <george.nagy0969@gmail.com>
						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-09-03 21:08:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a43a3f1770 
					 
					
						
						
							
							[Bugfix][DP] DP distribution does not require ray[default] ( #23822 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com>
						
						
					 
					
						2025-09-03 13:21:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6adaed42f4 
					 
					
						
						
							
							[Feature][P/D]: Optimize NIXL Connector xfer Launch ( #23887 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ycyaw66 <497410282@qq.com>
						Co-authored-by: ycyaw66 <497410282@qq.com>
						
						
					 
					
						2025-09-03 19:14:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a742322092 
					 
					
						
						
							
							[Attention] Blackwell FP8 MLA support with CUTLASS_MLA backend ( #23289 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
						
						
					 
					
						2025-09-03 14:05:24 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						731a6940e3 
					 
					
						
						
							
							Migrate whisper inputs to TensorSchema ( #23505 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						
						
					 
					
						2025-09-03 18:04:00 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e9b92dcd89 
					 
					
						
						
							
							[Kernels] Overlap shared experts with send/recv ( #23273 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com>
						
						
					 
					
						2025-09-03 12:35:18 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fa4311d85f 
					 
					
						
						
							
							[V1] v1 engine + full CUDA graph support for PLaMo2 ( #23998 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Hemmi Shinichi <shemmi@preferred.jp>
						Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
						Co-authored-by: Hemmi Shinichi <shemmi@preferred.jp>
						Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>
						
						
					 
					
						2025-09-03 08:24:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6d80ae83e1 
					 
					
						
						
							
							[Bugfix] Fixing division by zero in triton_attn if query_heads/kv_heads > 16 ( #23424 )
						
						 
						
						... 
						
						
						
						Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
						
						
					 
					
						2025-09-03 15:01:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4ba0c587ba 
					 
					
						
						
							
							FIX: Add libnuma-dev to Dockerfile for dev stage ( #20388 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: dongbo910220 <1275604947@qq.com>
						
						
					 
					
						2025-09-03 07:17:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6997a25ac6 
					 
					
						
						
							
							[Model] Remove useless code from MiniMax implementation ( #23982 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: QscQ <qscqesze@gmail.com>
						Signed-off-by: qingjun <qingjun@minimaxi.com>
						
						
					 
					
						2025-09-03 11:27:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						28f350e147 
					 
					
						
						
							
							Support add_generation_prompt in embeddings endpoint with chat request ( #23931 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: biba10 <jaksmid@seznam.cz>
						
						
					 
					
						2025-09-03 10:47:55 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						51383bd472 
					 
					
						
						
							
							[CI] Accelerate mteb test by setting SentenceTransformers mteb score to a constant ( #24088 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com>
						
						
					 
					
						2025-09-03 17:23:56 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9c99e4871f 
					 
					
						
						
							
							[Misc] Clean up deadcode for legacy processing pipeline ( #24153 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-03 08:34:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						70549c1245 
					 
					
						
						
							
							[CI/Build] Serve images used by multimodal tests through local HTTP Server ( #23907 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
						Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-09-03 16:13:11 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f0c503f66e 
					 
					
						
						
							
							[Nixl] Heterogeneous TP support FlashInfer ( #20189 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-09-03 15:19:54 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f38035c123 
					 
					
						
						
							
							[distributed][rl] remove nccl cumem env var override ( #24141 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-03 06:45:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						426cc8629f 
					 
					
						
						
							
							[BugFix] Fix routed_scaling_factor double mul for dots1 and glm4 MoE models ( #24132 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
						
						
					 
					
						2025-09-03 04:57:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e81d4e69c1 
					 
					
						
						
							
							[Misc] Add check for dual_chunk_attention ( #24070 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
						
						
					 
					
						2025-09-03 04:19:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						02d411fdb2 
					 
					
						
						
							
							[Doc]: fix typos in Python comments ( #24115 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com>
						
						
					 
					
						2025-09-02 21:14:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d7e1e59972 
					 
					
						
						
							
							[Doc]: fix typos in Python comments ( #24093 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com>
						
						
					 
					
						2025-09-02 21:05:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c4ed78b14f 
					 
					
						
						
							
							[Compile] Fix Compile Warning for w4a8_mm_entry.cu ( #23660 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
						
						
					 
					
						2025-09-02 20:45:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1bd007f234 
					 
					
						
						
							
							fix some typos ( #24071 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: co63oc <co63oc@users.noreply.github.com>
						
						
					 
					
						2025-09-02 20:44:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						136d853e65 
					 
					
						
						
							
							[V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing ( #23656 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andrew Feldman <afeldman@redhat.com>
						
						
					 
					
						2025-09-03 02:52:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e32a0e8678 
					 
					
						
						
							
							Upgrade xgrammar to 0.1.23 ( #22988 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-09-03 02:32:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						42dc59dbac 
					 
					
						
						
							
							Update release pipeline post PyTorch 2.8.0 update ( #24073 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Huy Do <huydhn@gmail.com>
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						Co-authored-by: Huy Do <huydhn@gmail.com>
						
						
					 
					
						2025-09-03 10:09:19 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						862f2ef893 
					 
					
						
						
							
							[XPU] Fix the bug of LoRA logits on the XPU platform ( #24081 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chzhang <chaojun.zhang@intel.com>
						
						
					 
					
						2025-09-03 08:21:18 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2fd1a40a54 
					 
					
						
						
							
							[CI/Build] Disable SiluMul NVFP4 quant fusion tests ( #24121 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
						
						
					 
					
						2025-09-02 16:50:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						930a24144c 
					 
					
						
						
							
							[Bug] R1 Accuracy: Fix routed_scaling_factor Double Mul Issue ( #24119 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-09-02 22:22:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						457e471971 
					 
					
						
						
							
							[AMD][Kernel][Bugfix] Cast offsets tensor bn to tl.int64 to avoid GPU segfault ( #23692 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Randall Smith <Randall.Smith@amd.com>
						
						
					 
					
						2025-09-02 22:13:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d328f7894f 
					 
					
						
						
							
							[CI] Enable all hf transformers baselines in test_hybrid ( #23936 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						
						
					 
					
						2025-09-02 20:15:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						98aee612aa 
					 
					
						
						
							
							[Log] Only Print Profiler Results on Rank 0 ( #23370 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-09-02 18:53:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						598bd74cf8 
					 
					
						
						
							
							Fix weights loading for Apertus ( #24100 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nathan Ranchin <nranchin@student.ethz.ch>
						
						
					 
					
						2025-09-02 18:34:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2417798471 
					 
					
						
						
							
							[Metrics] Deprecate TPOT in favor of ITL ( #24110 )  
						
						 
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com>
						
						
					 
					
						2025-09-02 18:10:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9480ae24e3 
					 
					
						
						
							
							[Bugfix] Fix packed_factor missing attribute error ( #23902 )  
						
						 
						
						Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
						
						
					 
					
						2025-09-02 10:56:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f399182e8c 
					 
					
						
						
							
							Run ruff format on a few files. ( #24075 )  
						
						 
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
						
						
					 
					
						2025-09-02 17:55:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1c41310584 
					 
					
						
						
							
							[Bugfix] Fix transform_config parsing in Compressed Tensors ( #23945 )  
						
						 
						
						Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
						
						
					 
					
						2025-09-02 13:54:10 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c83c4ff815 
					 
					
						
						
							
							[Benchmark] Add support for local hf dataset path in benchmark ( #23999 )  
						
						 
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
						
						
					 
					
						2025-09-02 17:49:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0e1759cd54 
					 
					
						
						
							
							[docs] add SYS_NICE cap & security-opt for docker/k8s ( #24017 )  
						
						 
						
						Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
						Signed-off-by: Peter Pan <peter.pan@daocloud.io>
						Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
						Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-02 17:27:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e66ed3e675 
					 
					
						
						
							
							[CI Failure] Skip failing nvfp4 silu test ( #23959 )  
						
						 
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
						
						
					 
					
						2025-09-02 13:18:15 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e0653f6c0b 
					 
					
						
						
							
							[Model] Classification models support logit_bias / sigmoid_normalize ( #24031 )  
						
						 
						
						Signed-off-by: wang.yuqi <noooop@126.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-09-02 16:48:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						38ba061f6f 
					 
					
						
						
							
							[BugFix] Fix EXAONE4 rotary embeddings ( #23918 )  
						
						 
						
						Signed-off-by: lkm2835 <lkm2835@gmail.com>
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-02 14:40:55 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0a74e9d0f2 
					 
					
						
						
							
							[Gemma3n] Fix audio batching ( #24052 )  
						
						 
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-09-02 22:23:35 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8bd5844989 
					 
					
						
						
							
							correct LWS deployment yaml ( #23104 )  
						
						 
						
						Signed-off-by: cberge908 <42270330+cberge908@users.noreply.github.com>
						
						
					 
					
						2025-09-02 12:04:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ce30dca5c4 
					 
					
						
						
							
							[CI]: reduce HTTP calls inside entrypoints openai tests ( #23646 )  
						
						 
						
						Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
						Signed-off-by: Aziz <azizbenothman76@gmail.com>
						Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-02 10:49:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f0bab3f26 
					 
					
						
						
							
							[Model] Support dp on ViT on GLM-4.5V ( #23168 )  
						
						 
						
						Signed-off-by: David Chen <530634352@qq.com>
						
						
					 
					
						2025-09-02 10:48:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fad73be1a5 
					 
					
						
						
							
							[Doc]: fix typos in Python comments ( #24077 )  
						
						 
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com>
						
						
					 
					
						2025-09-02 02:38:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						56d04089ef 
					 
					
						
						
							
							Migrate Interns1 inputs to TensorSchema ( #23510 )  
						
						 
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						
						
					 
					
						2025-09-02 04:35:45 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7be0cb8e9e 
					 
					
						
						
							
							[XPU][Feature] fp8 online quantization support for XPU ( #23148 )  
						
						 
						
						Signed-off-by: Yan Ma <yan.ma@intel.com>
						Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com>
						
						
					 
					
						2025-09-02 04:06:53 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1fa1d6a9a0 
					 
					
						
						
							
							Migrate OvisImagePatchInputs to TensorSchema ( #22024 )  
						
						 
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						
						
					 
					
						2025-09-02 12:01:36 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d59c986444 
					 
					
						
						
							
							Remove runtime checks based on pooling params ( #24051 )  
						
						 
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
						
						
					 
					
						2025-09-02 11:54:37 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						04d0c60770 
					 
					
						
						
							
							[Bugfix] Fix the issue that Blip2ForConditionalGeneration' object has… ( #24028 )  
						
						 
						
						Signed-off-by: Dazhi Jiang <dazhi_jiang@163.com>
						
						
					 
					
						2025-09-02 11:54:20 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2b41cbbf03 
					 
					
						
						
							
							[V1][Mamba1] - FP32 SSM Kernel Support ( #23506 )  
						
						 
						
						Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
						
						
					 
					
						2025-09-01 20:53:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0235103cbb 
					 
					
						
						
							
							[Doc]: fix typos in Python comments ( #24042 )  
						
						 
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com>
						Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-09-01 19:07:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a344a5aa0a 
					 
					
						
						
							
							[bugfix]fix MTP hidden states ( #24056 )  
						
						 
						
						Signed-off-by: Lu Fang <fanglu@fb.com>
						
						
					 
					
						2025-09-01 21:09:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5685370271 
					 
					
						
						
							
							[Chore][V0 Deprecation] Move LogProb to a separate file ( #24055 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-01 12:07:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a0e0efd6bd 
					 
					
						
						
							
							[Model] Support DP for ViT on Kimi-VL-A3B-Thinking-2506 ( #23817 )  
						
						 
						
						Signed-off-by: Junhong <liujunhong11@huawei.com>
						Signed-off-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
						Co-authored-by: Junhong <liujunhong11@huawei.com>
						Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
						Co-authored-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-09-01 16:56:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cf91a89dd2 
					 
					
						
						
							
							[docs][misc] IOProcessor plugins fixes ( #24046 )  
						
						 
						
						Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
						
						
					 
					
						2025-09-01 09:17:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						39a22dcaac 
					 
					
						
						
							
							[Misc] Minor code simplification for spec decode ( #24053 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-01 08:54:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						41c80698b3 
					 
					
						
						
							
							Document multi-proc method selection for profiling ( #23802 )  
						
						 
						
						Signed-off-by: jdebache <jdebache@nvidia.com>
						
						
					 
					
						2025-09-01 06:28:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7c8271cd1e 
					 
					
						
						
							
							[Model]: support KeyeVL-1_5-8B ( #23838 )  
						
						 
						
						Signed-off-by: wangruitao <wangruitao@kuaishou.com>
						Co-authored-by: wangruitao <wangruitao@kuaishou.com>
						
						
					 
					
						2025-09-01 03:50:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3e330fcb21 
					 
					
						
						
							
							[Doc]: Fix CPU install docs: force torch-backend=cpu to avoid GPU torchvision errors ( #24033 )  
						
						 
						
						Signed-off-by: Kay Yan <kay.yan@daocloud.io>
						
						
					 
					
						2025-09-01 03:34:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d46934b229 
					 
					
						
						
							
							[Frontend] Gemma3n audio transcriptions/translations endpoint ( #23735 )  
						
						 
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-09-01 18:07:46 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						107284959a 
					 
					
						
						
							
							[Doc]: fix typos in Python comments ( #24026 )  
						
						 
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com>
						
						
					 
					
						2025-09-01 09:38:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dc1a53186d 
					 
					
						
						
							
							[Kernel] Update DeepGEMM to latest commit ( #23915 )  
						
						 
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
						
						
					 
					
						2025-09-01 02:38:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						55602bb2e6 
					 
					
						
						
							
							[Frontend] Update the warning log when using VLLM_ALLOW_LONG_MAX_MODEL_LEN ( #20904 )  
						
						 
						
						Signed-off-by: wang.yuqi <noooop@126.com>
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-09-01 08:50:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d7fbc6ddac 
					 
					
						
						
							
							[Misc] Enable V1 FP16 inference on pre-Ampere GPUs ( #24022 )  
						
						 
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-01 08:12:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5438967fbc 
					 
					
						
						
							
							[Misc] add hash_function doc string ( #24014 )  
						
						 
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-08-31 23:11:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						422e793fa6 
					 
					
						
						
							
							[Bugfix] Add support for <tool_call> format in streaming mode for XLAM Tool Parser ( #22769 )  
						
						 
						
						Signed-off-by: Devon Peroutky <devon@kindo.ai>
						
						
					 
					
						2025-09-01 14:07:54 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1cb39dbcdd 
					 
					
						
						
							
							[Misc] IO Processor plugins for pooling models ( #22820 )  
						
						 
						
						Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
						Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
						
						
					 
					
						2025-08-31 23:07:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						437c3ce026 
					 
					
						
						
							
							Migrate Phi4 inputs to TensorSchema ( #23471 )  
						
						 
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						
						
					 
					
						2025-09-01 14:05:59 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						499b074bfd 
					 
					
						
						
							
							[Misc] refactor code by import as for torch._inductor.config ( #23677 )  
						
						 
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-09-01 14:05:42 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ff0e59d83a 
					 
					
						
						
							
							[CI/Build] Improve Tensor Schema tests speed by avoid engine core initialization ( #23357 )  
						
						 
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-08-31 22:52:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b55713683c 
					 
					
						
						
							
							[Misc] Move fast prefill logic to separate method ( #24013 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-01 05:40:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						acc1a6e10a 
					 
					
						
						
							
							Fix the bug related to loading GPTP INT3 weights. ( #23328 )  
						
						 
						
						Signed-off-by: JunHowie <JunHowie@aliyun.com>
						Co-authored-by: JunHowie <JunHowie@aliyun.com>
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-01 05:39:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8c742a66d1 
					 
					
						
						
							
							[Misc] Avoid redundant copy for encoder-only models ( #24012 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-09-01 04:02:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						183a70967a 
					 
					
						
						
							
							[BUGFIX] GPTQ quantization compatibility for Qwen3 MOE models (AutoGPTQ and AutoRound-GPTQ) ( #23994 )  
						
						 
						
						Signed-off-by: JartX <sagformas@epdcenter.es>
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-09-01 03:33:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						14b4326b94 
					 
					
						
						
							
							v1: Support KV events from connectors ( #19737 )  
						
						 
						
						Signed-off-by: Or Ozeri <oro@il.ibm.com>
						
						
					 
					
						2025-09-01 01:13:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						752d2e1c36 
					 
					
						
						
							
							[Minor] Fix some random typos in comments ( #24009 )  
						
						 
						
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-08-31 16:42:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						81eea3d348 
					 
					
						
						
							
							vllm fix check on max vocab size ( #22471 )  
						
						 
						
						Signed-off-by: Roger Wang <hey@rogerw.io>
						Signed-off-by: Roger Wang <hey@rogerw.me>
						Co-authored-by: Roger Wang <hey@rogerw.io>
						Co-authored-by: Roger Wang <hey@rogerw.me>
						
						
					 
					
						2025-08-31 20:57:05 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9701352e4b 
					 
					
						
						
							
							[Doc]: fix typos in Python comments ( #24001 )  
						
						 
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com>
						
						
					 
					
						2025-08-31 08:21:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						749be00a98 
					 
					
						
						
							
							[Core][Multimodal] Allow passing multi_modal_uuids as multimodal identifiers. ( #23394 )  
						
						 
						
						Signed-off-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-08-30 18:01:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5b8077b8ac 
					 
					
						
						
							
							Fix wrong truncate_prompt_tokens type hint ( #22761 )  
						
						 
						
						Signed-off-by: Gabriel Marinho <gmarinho@ibm.com>
						Signed-off-by: Gabriel Marinho <104592062+gmarinho2@users.noreply.github.com>
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
						
						
					 
					
						2025-08-30 20:39:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						038e9be4eb 
					 
					
						
						
							
							[LoRA] Much faster startup when LoRA is enabled ( #23777 )  
						
						 
						
						Signed-off-by: Andy Lo <andy@mistral.ai>
						Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-08-30 15:37:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						68a349114f 
					 
					
						
						
							
							[Misc] enhance type hint for rearrange return value ( #23519 )  
						
						 
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-08-30 06:43:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e80bca309e 
					 
					
						
						
							
							[Refactor] refactor freezing_value/cuda_event initialize outside try finally ( #23758 )  
						
						 
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-08-30 06:42:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fb4983e112 
					 
					
						
						
							
							[Misc] add reorder_batch AttentionMetadataBuilder ( #23798 )  
						
						 
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-08-30 06:41:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						379ea2823a 
					 
					
						
						
							
							Add LoRA support for DeepSeek models (V2, V3, R1-0528) ( #23971 )  
						
						 
						
						Signed-off-by: sadeghja1070 <sadegh.ja1070@gmail.com>
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						Co-authored-by: Claude <noreply@anthropic.com>
						Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-08-30 06:40:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3a6acad431 
					 
					
						
						
							
							[Model] Enable encoder DP for MiniCPM-V ( #23948 )  
						
						 
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
						Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-08-30 06:31:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5490d633ce 
					 
					
						
						
							
							[UT] fix unify_kv_cache_configs when kv cache config needs sort ( #23843 )  
						
						 
						
						
						
						
					 
					
						2025-08-30 11:22:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						628d00cd7b 
					 
					
						
						
							
							[Bugfix] Fix test_lora_resolvers.py ( #23984 )  
						
						 
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-08-30 11:16:11 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4071c76cf3 
					 
					
						
						
							
							[V1] [Hybrid] Move MiniMaxLinearAttention into layers/mamba ( #23831 )  
						
						 
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-08-30 00:16:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f1bddbd852 
					 
					
						
						
							
							[Core] Cleanup TPU model runner for MM ( #23894 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-08-30 00:14:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9748c5198b 
					 
					
						
						
							
							[CI] Fix broken compile tests due to unsupported SiluMul+Nvfp4Quant fusion ( #23973 )  
						
						 
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
						Co-authored-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-08-30 00:14:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ee52a32705 
					 
					
						
						
							
							[CI] Move testing image from remote URL to S3 ( #23980 )  
						
						 
						
						Signed-off-by: Roger Wang <hey@rogerw.io>
						
						
					 
					
						2025-08-29 21:41:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8fb85b7bb6 
					 
					
						
						
							
							Add routed_scaling_factor to MoE grouped topk ( #23123 )  
						
						 
						
						Signed-off-by: Xin Yang <xyangx@amazon.com>
						Co-authored-by: Michael Goin <mgoin64@gmail.com>
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-08-29 21:36:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5b31cb1781 
					 
					
						
						
							
							[Bugfix] Fix --config arg expansion called from api_server.py ( #23944 )  
						
						 
						
						Signed-off-by: Jean-Francois Dube <dubejf+gh@gmail.com>
						Co-authored-by: Jean-Francois Dube <dubejf+gh@gmail.com>
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-08-29 21:36:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d660c98c1b 
					 
					
						
						
							
							[CI] Fix unavailable image remote URL ( #23966 )  
						
						 
						
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-08-29 15:40:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5674a40366 
					 
					
						
						
							
							[Misc] Make download_weights_from_hf more reliable ( #23863 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-29 12:37:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8c3e199998 
					 
					
						
						
							
							Revert gemma3n fast prefill changes ( #23897 )  
						
						 
						
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com > 
						
						
					 
					
						2025-08-29 12:16:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1c26b42296 
					 
					
						
						
							
							[Docs] [V1] [Hybrid] Add new documentation re: contributing mamba-based models  ( #23824 )  
						
						 
						
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-29 18:47:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b7adf94c4a 
					 
					
						
						
							
							Tuned H100/H200 triton fp8 block configs for fused_qkv_a_proj ( #23939 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-29 10:28:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4d7fe40fc0 
					 
					
						
						
							
							[RL][BugFix] Fix missing tokenizer error for token-in-token-out ( #23904 )  
						
						 
						
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-08-30 01:09:55 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0dc9532065 
					 
					
						
						
							
							[BUGFIX ] fix undefined silu_and_mul_nvfp4_quant ( #23929 )  
						
						 
						
						
						
						
						Signed-off-by: hongchao <hongchao@msh.team >
Signed-off-by: Richard Zou <zou3519@gmail.com >
Co-authored-by: hongchao <hongchao@msh.team >
Co-authored-by: Richard Zou <zou3519@gmail.com >
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com > 
						
						
					 
					
						2025-08-29 09:36:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						72a69132dc 
					 
					
						
						
							
							[CI]  Add aiter to matching list of issue auto labeller for rocm tag ( #23942 )  
						
						 
						
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com > 
						
						
					 
					
						2025-08-29 15:29:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d90d8eb674 
					 
					
						
						
							
							[BugFix] Async scheduling and PP compatibility with DP ( #23770 )  
						
						 
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-29 08:17:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0a2f4c0793 
					 
					
						
						
							
							[Models] Use in-place adds in Idefics2Vision ( #23932 )  
						
						 
						
						
						
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com > 
						
						
					 
					
						2025-08-29 07:42:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1cf3753b90 
					 
					
						
						
							
							[MODEL] Apertus and XIELU ( #23068 )  
						
						 
						
						
						
						
						Signed-off-by: EduardDurech <39579228+EduardDurech@users.noreply.github.com >
Co-authored-by: AllenHaoHuang <allenhuangdd@gmail.com > 
						
						
					 
					
						2025-08-29 20:29:18 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4f7cde7272 
					 
					
						
						
							
							Adds json_count_leaves utility function  ( #23899 )  
						
						 
						
						
						
						
						Signed-off-by: aditchawdhary <aditxy@hotmail.com > 
						
						
					 
					
						2025-08-29 05:28:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						67c14906aa 
					 
					
						
						
							
							Update PyTorch to 2.8.0 ( #20358 )  
						
						 
						
						
						
						
						Signed-off-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-29 18:57:35 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						69f46359dd 
					 
					
						
						
							
							[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec ( #23779 )  
						
						 
						
						
						
						
						Signed-off-by: sfeng33 <4florafeng@gmail.com > 
						
						
					 
					
						2025-08-29 18:36:57 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d9e00dbd1f 
					 
					
						
						
							
							[Performance] V1 Classify Models E2E Performance Optimization ( #23541 )  
						
						 
						
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-08-29 03:12:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ad39106b16 
					 
					
						
						
							
							[CPU] Enable data parallel for CPU backend ( #23903 )  
						
						 
						
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-08-29 02:19:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2554b27baa 
					 
					
						
						
							
							[V0 Deprecation] Remove pooling model support in V0  ( #23434 )  
						
						 
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-29 00:04:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						934bebf192 
					 
					
						
						
							
							Better errors for Transformers backend missing features ( #23759 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-29 07:01:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						885ca6d31d 
					 
					
						
						
							
							[Misc] Fix warnings for mistral model ( #23552 )  
						
						 
						
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com > 
						
						
					 
					
						2025-08-29 06:58:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2d0afcc9dc 
					 
					
						
						
							
							[mrope][Qwen2-VL] Fix edge case where getting index of image/video token can potentially throw in default vl mrope implementation.  ( #23895 )  
						
						 
						
						
						
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com > 
						
						
					 
					
						2025-08-28 23:29:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b4f9e9631c 
					 
					
						
						
							
							[CI/Build] Clean up LoRA test ( #23890 )  
						
						 
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-28 23:28:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						05d839c19e 
					 
					
						
						
							
							Fix(async): Add support for truncate_prompt_tokens in AsyncLLM ( #23800 )  
						
						 
						
						
						
						
					 
					
						2025-08-28 22:55:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6597d7a456 
					 
					
						
						
							
							[Platform] import activation_quant_fusion for CUDA only ( #23882 )  
						
						 
						
						
						
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com > 
						
						
					 
					
						2025-08-28 22:54:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5264015d74 
					 
					
						
						
							
							[BugFix][AMD][Deepseek] fix a dtype mismatch error for deepseek running on AMD ( #23864 )  
						
						 
						
						
						
						
						Signed-off-by: Jinghui Zhang <jinghuizhang0804@gmail.com > 
						
						
					 
					
						2025-08-28 22:54:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						98ac0cb32d 
					 
					
						
						
							
							[Bugfix] Use ReplicatedLinear for SequenceClassification head ( #23836 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-29 04:41:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c8b3b299c9 
					 
					
						
						
							
							[tests] Improve speed and reliability of test_transcription_api_correctness ( #23854 )  
						
						 
						
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-08-29 04:25:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						006477e60b 
					 
					
						
						
							
							[ROCm][Fix] Fix rocm build caused by  #23791  ( #23847 )  
						
						 
						
						
						
						
						Signed-off-by: charlifu <charlifu@amd.com > 
						
						
					 
					
						2025-08-28 19:52:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						de533ab2a1 
					 
					
						
						
							
							[Models] Improve iteration over layers ( #19497 )  
						
						 
						
						
						
						
						Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com > 
						
						
					 
					
						2025-08-29 09:26:34 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						235c9db8a7 
					 
					
						
						
							
							[XPU] support data parallel for MoE models on XPU ( #22887 )  
						
						 
						
						
						
						
						Signed-off-by: chzhang <chaojun.zhang@intel.com > 
						
						
					 
					
						2025-08-29 09:23:04 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b668055a11 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 Samplers test ( #23862 )  
						
						 
						
						
						
						
					 
					
						2025-08-28 18:05:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d3d2aad5a2 
					 
					
						
						
							
							[Log] Use Debug Once for DeepGEMM E8M0 When not Enabled ( #23858 )  
						
						 
						
						
						
						
					 
					
						2025-08-28 22:18:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cb293f6a79 
					 
					
						
						
							
							[V1] Enable prefill optimization for Gemma3n ( #22628 )  
						
						 
						
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com > 
						
						
					 
					
						2025-08-28 14:54:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7ffbf27239 
					 
					
						
						
							
							[BugFix][FlashInfer] Fix potential race condition for paged_kv_indptr_cpu ( #23737 )  
						
						 
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-28 14:22:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						27e88cee74 
					 
					
						
						
							
							chore: build release image by default ( #23852 )  
						
						 
						
						
						
						
						Signed-off-by: Codex <codex@openai.com > 
						
						
					 
					
						2025-08-28 13:17:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						16a45b3a28 
					 
					
						
						
							
							[NVIDIA] Support SiluMul + NVFP4 quant fusion ( #23671 )  
						
						 
						
						
						
						
						Signed-off-by: jindih <jindih@nvidia.com >
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
Co-authored-by: jindih <jindih@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Luka Govedic <lgovedic@redhat.com > 
						
						
					 
					
						2025-08-28 19:36:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						57d4ede520 
					 
					
						
						
							
							[bugfix] [spec-decoding] fix data race in sample_recovered_tokens_kernel (vLLM v1) ( #23829 )  
						
						 
						
						
						
						
						Signed-off-by: He-Jingkai <he-jingkai@outlook.com > 
						
						
					 
					
						2025-08-28 19:05:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						04d1dd7f4a 
					 
					
						
						
							
							[ROCm][Aiter] Add triton fp8 bmm kernel for mla ( #23264 )  
						
						 
						
						
						
						
						Signed-off-by: Divakar Verma <divakar.verma@amd.com >
Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com > 
						
						
					 
					
						2025-08-28 18:18:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f32a5bc505 
					 
					
						
						
							
							Migrate Llama4ImagePatchInputs to TensorSchema ( #22021 )  
						
						 
						
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-28 17:29:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8805ad9fa9 
					 
					
						
						
							
							Add scale_config.yml file for Meta autoscalers for GH Actions ( #23840 )  
						
						 
						
						
						
						
						Signed-off-by: Jean Schmidt <contato@jschmidt.me > 
						
						
					 
					
						2025-08-28 09:31:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0583578f42 
					 
					
						
						
							
							[ci] breaks down V1 Test into 3 groups of approx 30 minutes runtime ( #23757 )  
						
						 
						
						
						
						
						Signed-off-by: Jean Schmidt <contato@jschmidt.me > 
						
						
					 
					
						2025-08-28 08:59:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						db74d60490 
					 
					
						
						
							
							[Bugfix] Add fake mode around passes ( #23349 )  
						
						 
						
						
						
						
						Signed-off-by: angelayi <yiangela7@gmail.com > 
						
						
					 
					
						2025-08-28 11:25:56 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						95089607fa 
					 
					
						
						
							
							[Model][gpt-oss] Support DP+EP for GPT-OSS with FlashInfer trtllm-gen MoE ( #23819 )  
						
						 
						
						
						
						
						Signed-off-by: Po-Han Huang <pohanh@nvidia.com > 
						
						
					 
					
						2025-08-28 06:56:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1f096f9b95 
					 
					
						
						
							
							[CI] Fix linting error on main ( #23835 )  
						
						 
						
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-28 06:52:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						66548f6603 
					 
					
						
						
							
							[Bugfix] Fix benchmark_moe.py for blockwise fp8. ( #23823 )  
						
						 
						
						
						
						
						Signed-off-by: crischeng <420985011@qq.com >
Co-authored-by: cris <grace@guisenbindeMacBook-Pro.local > 
						
						
					 
					
						2025-08-28 21:44:09 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d3da2eea54 
					 
					
						
						
							
							[Doc]: fix typos in Python scripts ( #23828 )  
						
						 
						
						
						
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com > 
						
						
					 
					
						2025-08-28 05:37:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bfab219648 
					 
					
						
						
							
							[Model] [gpt-oss] fix gpt-oss pp support ( #23815 )  
						
						 
						
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com > 
						
						
					 
					
						2025-08-28 05:36:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a3432f18fd 
					 
					
						
						
							
							[BugFix][Spec Decode] Use float64 for uniform_probs ( #23803 )  
						
						 
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-28 12:26:45 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						67cee40da0 
					 
					
						
						
							
							[CI/Build][Bugfix] Fix Qwen VL tests on CPU ( #23818 )  
						
						 
						
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-08-28 11:57:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d99c3a4f7b 
					 
					
						
						
							
							[Doc]: fix typos in .md files (including those of  #23751 ) ( #23825 )  
						
						 
						
						
						
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com > 
						
						
					 
					
						2025-08-28 04:38:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3462c1c522 
					 
					
						
						
							
							[FIXBUG] Add return_success parameter to moe_wna16_weight_loader function ( #22797 )  
						
						 
						
						
						
						
						Signed-off-by: JartX <sagformas@epdcenter.es >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-28 09:03:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c5d004aaaf 
					 
					
						
						
							
							[Model] Add PP support and VLM backbone compatability for GPT-OSS ( #23680 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-28 16:03:28 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						11a7fafaa8 
					 
					
						
						
							
							[New Model]: Support GteNewModelForSequenceClassification ( #23524 )  
						
						 
						
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-08-28 15:36:42 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						186aced5ff 
					 
					
						
						
							
							[Kernel] cuda kernels for upcoming decode context parallel feature ( #23791 )  
						
						 
						
						
						
						
						Co-authored-by: hongchao <hongchao@msh.team > 
						
						
					 
					
						2025-08-28 15:29:11 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						daa1273b14 
					 
					
						
						
							
							[Bugfix] when set offline model running error ( #23711 )  
						
						 
						
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-08-28 07:27:45 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c07a73317d 
					 
					
						
						
							
							[CI] enable idefics3 and fuyu-8b test in multimodal test ( #23790 )  
						
						 
						
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com > 
						
						
					 
					
						2025-08-28 14:51:24 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						22feac8e95 
					 
					
						
						
							
							[Transform] [Quantization] Add transforms to compressed tensors ( #22486 )  
						
						 
						
						
						
						
					 
					
						2025-08-28 02:43:48 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c8851a4723 
					 
					
						
						
							
							Add deprecation warning for lora_extra_vocab_size ( #23635 )  
						
						 
						
						
						
						
						Signed-off-by: Jinheng Li <ahengljh@gmail.com > 
						
						
					 
					
						2025-08-27 22:34:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f48a9af892 
					 
					
						
						
							
							[CI] make all multi-gpu weight loading tests run nightly ( #23792 )  
						
						 
						
						
						
						
						Signed-off-by: Alex Yun <alexyun04@gmail.com > 
						
						
					 
					
						2025-08-27 21:27:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a11adafdca 
					 
					
						
						
							
							Gracefully handle edge cases in harmony utils ( #23155 )  
						
						 
						
						
						
						
						Signed-off-by: Jan Kessler <jakessle@uni-mainz.de >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-27 20:14:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a781e84ec2 
					 
					
						
						
							
							[Perf] Tune configs for triton block fp8 gemm H100/H200 ( #23748 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-28 11:12:53 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1b7b161a09 
					 
					
						
						
							
							[Feature] models: pass layer prefix to replace_linear_class for per-layer quantization routing. Addresses  #23239  ( #23556 )  
						
						 
						
						
						
						
						Signed-off-by: Shrey Gupta <shreyg1303@gmail.com > 
						
						
					 
					
						2025-08-27 20:12:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a69693e38f 
					 
					
						
						
							
							Migrate Qwen inputs to TensorSchema ( #23473 )  
						
						 
						
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-28 10:43:26 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5da4f5d857 
					 
					
						
						
							
							[Bugfix] Fix for V1 priority scheduling crashes at preemption ( #23713 )  
						
						 
						
						
						
						
						Signed-off-by: Hanchenli <lihanc2002@gmail.com > 
						
						
					 
					
						2025-08-28 00:44:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						321938e9ac 
					 
					
						
						
							
							[Feature] Add VLLM_DISABLE_PAD_FOR_CUDAGRAPH to Avoid Hang Issue ( #23595 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-27 21:52:24 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f9ca2b40a0 
					 
					
						
						
							
							[Bugfix] Fix Marlin NVFP4 for modelopt ( #23659 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-27 17:48:16 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						082cc07ef8 
					 
					
						
						
							
							DP/EP Support for gpt-oss with deepep-ht comm kernel on SM100 ( #23608 )  
						
						 
						
						
						
						
					 
					
						2025-08-27 17:33:21 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						853c371fc3 
					 
					
						
						
							
							[V1][Mamba] - Enable V1 by default for Mamba Models ( #23650 )  
						
						 
						
						
						
						
						Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com > 
						
						
					 
					
						2025-08-27 20:53:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8bf6266a17 
					 
					
						
						
							
							[Multimodal] Generate mm_hash based on request metadata when caching is turned off ( #23690 )  
						
						 
						
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-08-27 20:24:31 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0585a9e73c 
					 
					
						
						
							
							Disable torch.compile for dynamic rope models in Transformers backend ( #23738 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-27 19:03:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3c0ef769ba 
					 
					
						
						
							
							ci: Add arm64 docker build to release pipeline ( #23210 )  
						
						 
						
						
						
						
						Signed-off-by: Eli Uriegas <eliuriegas@meta.com >
Signed-off-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com > 
						
						
					 
					
						2025-08-27 10:41:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4e4d017b6f 
					 
					
						
						
							
							[Docs] Fix warnings in mkdocs build (continued) ( #23743 )  
						
						 
						
						
						
						
						Signed-off-by: Zerohertz <ohg3417@gmail.com >
Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com > 
						
						
					 
					
						2025-08-27 17:17:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dd58932280 
					 
					
						
						
							
							[V1] [Hybrid] Enable compile and piecewise CUDA graph for MiniMax-Text models ( #22589 )  
						
						 
						
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-27 10:05:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						52883ed084 
					 
					
						
						
							
							[Model] Merge SupportsMultiModalWithRawInput with SupportsMultiModal ( #23749 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-27 10:01:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4f35be10a9 
					 
					
						
						
							
							[BugFix] Fix topk_softmax assert ( #19764 )  
						
						 
						
						
						
						
						Signed-off-by: Luka Govedic <lgovedic@redhat.com > 
						
						
					 
					
						2025-08-27 09:47:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2b61d2e22f 
					 
					
						
						
							
							[Docs] Remove in-tree Gaudi install instructions ( #23628 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-27 09:22:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3ce8285d6d 
					 
					
						
						
							
							[LogitsProcs] Deduplicate built-in LP implementation logic ( #23362 )  
						
						 
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-27 23:11:33 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						83f555f637 
					 
					
						
						
							
							[Doc]: upgrade version of crate-ci tool for improved typo detection ( #23755 )  
						
						 
						
						
						
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com > 
						
						
					 
					
						2025-08-27 07:59:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						841490434a 
					 
					
						
						
							
							[Model] Enable native HF format InternVL support ( #23742 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-27 14:45:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3af47c3cc6 
					 
					
						
						
							
							[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt ( #23666 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-08-27 14:09:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						513c1fe255 
					 
					
						
						
							
							Only run get_attr_docs if generating help text ( #23723 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-27 13:55:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fe8d7b6f03 
					 
					
						
						
							
							[Model] Interface to enable batch-level DP support ( #23733 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-27 06:41:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						16dc4052b0 
					 
					
						
						
							
							Fix pre-commit on main ( #23747 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-27 06:39:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8dd2baa597 
					 
					
						
						
							
							Add vLLM Korea Meetup in the README.md and meetups.md ( #23746 )  
						
						 
						
						
						
						
						Signed-off-by: rebel-hongseok <hongseok@rebellions.ai >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-27 06:25:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5eeef1b908 
					 
					
						
						
							
							[Model] Explicit default_pooling_type interface ( #23736 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-27 13:24:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						704432af3c 
					 
					
						
						
							
							[V1] [Hybrid] Disable prefix caching by default for hybrid or mamba-based models ( #23716 )  
						
						 
						
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-27 12:51:54 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a403d0fa41 
					 
					
						
						
							
							[Misc] Remove unnecessary _send_reconfig_message() in core_client.py ( #23127 )  
						
						 
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-27 05:50:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8c13820f0b 
					 
					
						
						
							
							[Bugfix] Fix task field initialization when PYTHONOPTIMIZE is enabled ( #23718 )  
						
						 
						
						
						
						
						Signed-off-by: cndoit18 <cndoit18@outlook.com > 
						
						
					 
					
						2025-08-27 12:42:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9d30de4469 
					 
					
						
						
							
							[model] Support MiniCPM-V 4.5 ( #23586 )  
						
						 
						
						
						
						
						Signed-off-by: tc-mb <caitianchi@modelbest.cn >
Signed-off-by: Xin Yang <xyangx@amazon.com >
Signed-off-by: Abatom <abzhonghua@gmail.com >
Signed-off-by: chzhang <chaojun.zhang@intel.com >
Signed-off-by: Pate Motter <patemotter@google.com >
Signed-off-by: Terrencezzj <terrence@cohere.ai >
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com >
Signed-off-by: siyuanf <siyuanf@nvidia.com >
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
Signed-off-by: Zijing Liu <liuzijing2014@users.noreply.github.com >
Signed-off-by: jiabin.00 <jiabin.00@bytedance.com >
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: tc-mb <157115220+tc-mb@users.noreply.github.com >
Signed-off-by: Roger Wang <hey@rogerw.me >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Huy Do <huydhn@gmail.com >
Signed-off-by: Matúš Námešný <matus.namesny@ameria.com >
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: oye93 <en.ouyang93@outlook.com >
Signed-off-by: Julien Lin <jullin@nvidia.com >
Signed-off-by: Didier Durand <durand.didier@gmail.com >
Signed-off-by: Tianyu Li <tianyu.li@arm.com >
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com >
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: Zerohertz <ohg3417@gmail.com >
Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com >
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com >
Signed-off-by: Federico <65908512+coval3nte@users.noreply.github.com >
Signed-off-by: Zixuan Zhang <zixuanzhang@bytedance.com >
Signed-off-by: wuhang <wuhang6@huawei.com >
Signed-off-by: czhu-cohere <conway.zhu@cohere.com >
Signed-off-by: Wei Wei <wwei6@meta.com >
Signed-off-by: Yiheng Xu <charlesyihengxu@gmail.com >
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Zhonghua Deng <abzhonghua@gmail.com >
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com >
Co-authored-by: Pate Motter <p@temotter.com >
Co-authored-by: Terrence Zhao <32208165+Terrencezzj@users.noreply.github.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: weiliang <weiliangl@nvidia.com >
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com >
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com >
Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com >
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Raghavan <oneraghavan@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.me >
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Co-authored-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Matúš Námešný <matus@namesny.com >
Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: En Ouyang <en.ouyang93@outlook.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
Co-authored-by: nvjullin <jullin@nvidia.com >
Co-authored-by: Didier Durand <2927957+didier-durand@users.noreply.github.com >
Co-authored-by: TianyuLi0 <116711075+TianyuLi0@users.noreply.github.com >
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com >
Co-authored-by: Yuekai Zhang <zhangyuekai@foxmail.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: Hyogeun Oh (오효근) <ohg3417@gmail.com >
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Huzaifa Sidhpurwala <huzaifas@redhat.com >
Co-authored-by: Federico <65908512+coval3nte@users.noreply.github.com >
Co-authored-by: zixuanzhang226 <zixuanzhang@bytedance.com >
Co-authored-by: wuhang <wuhang6@huawei.com >
Co-authored-by: yzds <41983536+youzhedian@users.noreply.github.com >
Co-authored-by: hongchao <hongchao@msh.team >
Co-authored-by: czhu-cohere <conway.zhu@cohere.com >
Co-authored-by: Wei <weiweinpu@gmail.com >
Co-authored-by: Yiheng Xu <charlesyihengxu@gmail.com >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: Chenheli Hua <huachenheli@outlook.com >
Co-authored-by: CSWYF3634076 <58356743+CSWYF3634076@users.noreply.github.com > 
						
						
					 
					
						2025-08-27 05:38:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1f7a9c95e4 
					 
					
						
						
							
							[Docs] Fix a 1-2-3 list and style issues in tpu.md ( #23729 )  
						
						 
						
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-08-27 05:37:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8f0d7eaea8 
					 
					
						
						
							
							[XPU] Fix OOM issue for data parallel with Ray backend ( #22500 )  
						
						 
						
						
						
						
						Signed-off-by: Fanli Lin <fanli.lin@intel.com >
Signed-off-by: Fanli Lin <fanli0116@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-08-27 19:57:38 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e03940762b 
					 
					
						
						
							
							[CI/Build] Reduce LoRA layer test cases ( #23721 )  
						
						 
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-27 10:59:35 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						11eddf02f0 
					 
					
						
						
							
							[FlashInfer] Cache hyper params in metadata builder ( #23732 )  
						
						 
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-27 03:45:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						04ff1e43fb 
					 
					
						
						
							
							[Misc] Move CpuGpuBuffer to vllm/v1/utils.py ( #23728 )  
						
						 
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-27 03:25:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6578e87365 
					 
					
						
						
							
							Optimize input preparation for FlashInfer [2/N] ( #23174 )  
						
						 
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-27 02:52:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5bd9f84158 
					 
					
						
						
							
							[Docs] Fix an admonition important ( #23726 )  
						
						 
						
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-08-27 02:50:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						91e382c935 
					 
					
						
						
							
							[CI/Build] Remove redundant register in model init tests ( #23715 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-27 08:11:15 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6446677839 
					 
					
						
						
							
							[XPU]fix cuda event used in XPU model runner ( #23708 )  
						
						 
						
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-08-27 07:27:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						69244e67e6 
					 
					
						
						
							
							[Core] Use key-only cache for BaseMultiModalProcessor ( #23018 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-27 14:19:13 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8dbf6ed7be 
					 
					
						
						
							
							[Bugfix] fix when config.yaml config value is list parse error ( #23528 )  
						
						 
						
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-08-27 05:54:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9de25c294b 
					 
					
						
						
							
							[CI/Build] Remove redundant LoRA model tests ( #23706 )  
						
						 
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-27 05:51:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fce10dbed5 
					 
					
						
						
							
							[XPU] Add xpu torch.compile support ( #22609 )  
						
						 
						
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-08-27 05:33:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d272415e57 
					 
					
						
						
							
							[Quantization] Expand compressed-tensors MoE matching logic to support NFP4 + FP8 MoEs ( #22674 )  
						
						 
						
						
						
						
						Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Signed-off-by: Dipika <dipikasikka1@gmail.com > 
						
						
					 
					
						2025-08-27 05:00:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						142ac08030 
					 
					
						
						
							
							[Frontend] Optimize beam search performance by limiting concurrency ( #23599 )  
						
						 
						
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-27 04:59:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3210264421 
					 
					
						
						
							
							[Frontend] Add --log-error-stack to print stack trace for error response ( #22960 )  
						
						 
						
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-27 04:58:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						644d57d531 
					 
					
						
						
							
							[Model] Add Ernie4.5 VL Model Support ( #22514 )  
						
						 
						
						
						
						
						Signed-off-by: wangyafeng <wangyafeng@baidu.com > 
						
						
					 
					
						2025-08-26 21:02:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c905684cfe 
					 
					
						
						
							
							[Core] Asynchronous h2d in merge_multimodal_embeddings via pinned memory. ( #23686 )  
						
						 
						
						
						
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Co-authored-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-08-26 20:05:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						786835807b 
					 
					
						
						
							
							[Bugfix]: Qwen3 Coder Tool Parser ( #23099 )  
						
						 
						
						
						
						
						Signed-off-by: Yiheng Xu <charlesyihengxu@gmail.com >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz > 
						
						
					 
					
						2025-08-26 19:58:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fecbb7c782 
					 
					
						
						
							
							[Bugfix][gpt-oss] passing the cache config in gpt-oss ( #23613 )  
						
						 
						
						
						
						
						Signed-off-by: Wei Wei <wwei6@meta.com > 
						
						
					 
					
						2025-08-27 02:54:23 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6dab89b8ec 
					 
					
						
						
							
							[Docs] Fix math rendering in docs ( #23676 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-26 18:47:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						de02b07db4 
					 
					
						
						
							
							[Bugfix] Lazy import gpt_oss_triton_kernels_moe for mxfp4 ( #23678 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-27 09:34:57 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eb1995167e 
					 
					
						
						
							
							[gpt-oss] Enable unit test for response API harmony integration ( #23533 )  
						
						 
						
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-26 18:23:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2c2b140ae8 
					 
					
						
						
							
							[quantization] use channel scales for w4a8 + misc fixes ( #23570 )  
						
						 
						
						
						
						
						Signed-off-by: czhu-cohere <conway.zhu@cohere.com > 
						
						
					 
					
						2025-08-26 18:23:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c7c80af084 
					 
					
						
						
							
							fix pynccl reduce_scatter ( #23648 )  
						
						 
						
						
						
						
						Co-authored-by: hongchao <hongchao@msh.team > 
						
						
					 
					
						2025-08-26 18:21:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6891205b16 
					 
					
						
						
							
							[Feature][Responses API] Support MCP tool in background mode ( #23494 )  
						
						 
						
						
						
						
						Signed-off-by: wuhang <wuhang6@huawei.com > 
						
						
					 
					
						2025-08-27 01:06:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b1625dbe9c 
					 
					
						
						
							
							feat: add triton fused moe config for GLM-4.5-Air-FP8 on B200 ( #23695 )  
						
						 
						
						
						
						
						Signed-off-by: Zixuan Zhang <zixuanzhang@bytedance.com > 
						
						
					 
					
						2025-08-26 18:06:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						585e0bde36 
					 
					
						
						
							
							[Bugfix] UnboundLocalError when GptOss reasoning specified ( #23054 )  
						
						 
						
						
						
						
						Signed-off-by: Federico <65908512+coval3nte@users.noreply.github.com > 
						
						
					 
					
						2025-08-27 00:29:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						714872f1a9 
					 
					
						
						
							
							[Compile] Fix Cmake Warning ( #23689 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-26 23:48:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5f1af97f86 
					 
					
						
						
							
							[V1] [Hybrid] Enable Full CUDA graph by default for hybrid models in V1 ( #22594 )  
						
						 
						
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-26 23:28:55 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c3b0fd1ee6 
					 
					
						
						
							
							[V1][P/D]P2pNcclConnector supports flashinfer ( #23536 )  
						
						 
						
						
						
						
						Signed-off-by: Abatom <abzhonghua@gmail.com >
Co-authored-by: Simon Mo <simon.mo@hey.com > 
						
						
					 
					
						2025-08-26 22:56:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6421b66bf4 
					 
					
						
						
							
							[Docs] Move quant supported hardware table to README ( #23663 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-26 22:26:46 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f13319f47 
					 
					
						
						
							
							Enhance the pre-notification policy ( #23532 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com > 
						
						
					 
					
						2025-08-26 20:41:36 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d696f86e7b 
					 
					
						
						
							
							[doc] Hybrid KV Cache Manager design doc ( #22688 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-26 20:19:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9816b81f5f 
					 
					
						
						
							
							[Model] Enable video support for InternVL3.5 models ( #23658 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-26 19:46:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c37c0af990 
					 
					
						
						
							
							[Misc] Fix comments in tests/kernels/quantization ( #23675 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com > 
						
						
					 
					
						2025-08-26 19:31:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9715f7bb0f 
					 
					
						
						
							
							[Bugfix] Fix incorrect original shape in hashing ( #23672 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com > 
						
						
					 
					
						2025-08-26 19:01:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						98aa16ff41 
					 
					
						
						
							
							[v1] Add cross-attention KV cache support for encoder-decoder models ( #23664 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-08-26 18:49:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						227e231b55 
					 
					
						
						
							
							[Docs] [V1] [Hybrid] Update docs to remove FlashInfer constraint for hybrid models ( #23665 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-26 18:33:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						730d0ac8b9 
					 
					
						
						
							
							[Docs] Fix warnings in mkdocs build ( #23649 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zerohertz <ohg3417@gmail.com >
Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-26 18:19:23 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9b0187003e 
					 
					
						
						
							
							[Bugfix] Fix cuda event usage with CPU model runner ( #23643 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-08-26 17:10:42 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						44ac25eae2 
					 
					
						
						
							
							[CI] [Doc]: Add GH Action for auto labeling issues with rocm tag ( #20988 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-26 16:20:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7ea22e42d5 
					 
					
						
						
							
							[Misc] Add override for allreduce fusion thresholds ( #23639 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Julien Lin <jullin@nvidia.com > 
						
						
					 
					
						2025-08-26 15:53:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9d4183dd2e 
					 
					
						
						
							
							[model] support qwen2audio embedding input ( #23625 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-26 23:48:08 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						513298f1b4 
					 
					
						
						
							
							[Bugfix] fix bf16 multimodal model hash ( #23623 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-26 23:47:50 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						379f828fba 
					 
					
						
						
							
							[Docs] Reduce requirements for docs build ( #23651 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-26 15:43:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1fdc732419 
					 
					
						
						
							
							[ROCm] Starting to add AMD code reviewers for ROCm components ( #23496 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Hongxia Yang <hongxia.yang@amd.com > 
						
						
					 
					
						2025-08-26 07:32:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f58675bfb3 
					 
					
						
						
							
							[CPU] add cpu fused moe pytorch native implementation ( #23146 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tianyu Li <tianyu.li@arm.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com > 
						
						
					 
					
						2025-08-26 14:09:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7c04779afa 
					 
					
						
						
							
							[Doc]: fix various spelling issues in multiple files ( #23636 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com > 
						
						
					 
					
						2025-08-26 14:05:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f66673a39d 
					 
					
						
						
							
							[Kernel] Added flashinfer fp8 per-tensor gemms ( #22895 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Julien Lin <jullin@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-26 06:54:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b78bed1bc5 
					 
					
						
						
							
							[Hardware][Mac] Fix the installation fail for Apple Silicon (CPU) ( #23565 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: oye93 <en.ouyang93@outlook.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com > 
						
						
					 
					
						2025-08-26 13:04:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						164b2273c8 
					 
					
						
						
							
							[Docs] Fix broken links to docs/api/summary.md ( #23637 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-26 13:00:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2b4fc9bd9b 
					 
					
						
						
							
							Support FlashAttention Backend for Hybrid SSM Models ( #23299 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-26 12:41:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ebd5a77bb5 
					 
					
						
						
							
							feat: add usage to TranscriptionResponse (text and json response_format) ( #23576 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com > 
						
						
					 
					
						2025-08-26 05:26:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						384dd1b0a8 
					 
					
						
						
							
							[Bugfix] Add missing enable_log_outputs parameter to init_app_state function ( #23634 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Matúš Námešný <matus.namesny@ameria.com > 
						
						
					 
					
						2025-08-26 12:13:15 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fdeb3dac13 
					 
					
						
						
							
							[Model] fix DeepSeek e_score_correction_bias dtype to fp32 ( #23640 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-26 20:09:47 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d52358c1e0 
					 
					
						
						
							
							[Perf] Remove duplicated NVFP4 blockscales to save memory ( #23379 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-26 19:16:33 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6ace2f72b0 
					 
					
						
						
							
							Fix writing benchmark results with tuple keys ( #23633 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Huy Do <huydhn@gmail.com > 
						
						
					 
					
						2025-08-26 19:16:09 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b00e69f8ca 
					 
					
						
						
							
							Fix nits from #20059 ( #23548 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-26 03:27:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						50fede6634 
					 
					
						
						
							
							[V1] Enable V1 for compute capability < 8.0 + FP32 ( #23614 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-26 03:00:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b5d34af328 
					 
					
						
						
							
							[Bugfix] Fix scheduling when repeated images in one request ( #23544 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.me >
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com > 
						
						
					 
					
						2025-08-26 09:46:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9b5f64238f 
					 
					
						
						
							
							[Bugfix] Fix Qwen25VL packed_modules_mapping ( #23604 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-26 01:09:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ff77764f86 
					 
					
						
						
							
							Fix CLI parameter documentation inconsistency in pooling_models.md ( #23630 )  
						
						 
						
						
						
						
					 
					
						2025-08-26 01:05:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bfc1edc9f5 
					 
					
						
						
							
							[Docs] Fix titles for multi-file examples that are rendered in the docs ( #23573 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-26 00:16:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3ecbb14b81 
					 
					
						
						
							
							[Benchmarks] add benchmark for embedding models ( #23000 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com > 
						
						
					 
					
						2025-08-25 23:57:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7d67a9d9f9 
					 
					
						
						
							
							[mypy] Fix incorrect type hint for EAGLE3 support ( #23617 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-25 23:50:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						959783fb99 
					 
					
						
						
							
							[fix] fix seed-oss-parser ( #23560 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiabin.00 <jiabin.00@bytedance.com > 
						
						
					 
					
						2025-08-25 23:16:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ce0e9dbd43 
					 
					
						
						
							
							[CI/Build] Fix typo in #23561 ( #23616 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-25 23:13:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b395b3b0a3 
					 
					
						
						
							
							[Disagg][Perf] Use CUDA event sync instead of blocking tolist to avoid unintentional copy ops blocking across different CUDA streams, improving disagg TTIT/TTFT ( #22760 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
Signed-off-by: Zijing Liu <liuzijing2014@users.noreply.github.com > 
						
						
					 
					
						2025-08-25 21:06:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6fad29b11b 
					 
					
						
						
							
							Remove graph_pool as member of VllmBackend and argument to CUDAGraphWrapper ( #23385 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com > 
						
						
					 
					
						2025-08-25 19:34:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6fd45e7b8a 
					 
					
						
						
							
							[CI/Build] Use vLLM client's user agent to fetch images ( #23561 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-25 19:34:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						56dcf4e7e9 
					 
					
						
						
							
							[Bug] Fix DeepGEMM Env Control ( #23591 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-25 18:41:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ae067888d6 
					 
					
						
						
							
							Update Flashinfer to 0.2.14.post1 ( #23537 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Siyuan Fu <siyuanf@nvidia.com >
Signed-off-by: siyuanf <siyuanf@nvidia.com >
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-25 18:30:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						906e461ed6 
					 
					
						
						
							
							[CI Fix] Pin deepep and pplx tags in tools/ep_kernels/, gate multigpu tests ( #23568 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-25 18:29:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2a97ffc33d 
					 
					
						
						
							
							[Misc] Add release note draft to PR template ( #23598 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: simon-mo <simon.mo@hey.com > 
						
						
					 
					
						2025-08-25 16:44:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						efc88cf64a 
					 
					
						
						
							
							[Misc] Simplify FlashInfer attention metadata ( #23585 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai > 
						
						
					 
					
						2025-08-25 15:42:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7b6a837275 
					 
					
						
						
							
							[Docs] Update Documentation of Cohere Command-A Models ( #23584 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Terrencezzj <terrence@cohere.ai >
Signed-off-by: Abatom <abzhonghua@gmail.com >
Co-authored-by: Zhonghua Deng <abzhonghua@gmail.com > 
						
						
					 
					
						2025-08-25 21:53:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c34c82b7fe 
					 
					
						
						
							
							[TPU][Bugfix] Fixes prompt_token_ids error in tpu tests. ( #23574 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Pate Motter <patemotter@google.com > 
						
						
					 
					
						2025-08-25 14:29:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8a044754bd 
					 
					
						
						
							
							[XPU] Delay BF16 check to worker init for spawn compatibility ( #22979 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chzhang <chaojun.zhang@intel.com > 
						
						
					 
					
						2025-08-25 13:09:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9188ae7cb5 
					 
					
						
						
							
							[Bugfix][V1][P/D]Fix the issue where repeated requests for the same input produce abnormal outputs for P2pNcclConnector ( #23403 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Abatom <abzhonghua@gmail.com > 
						
						
					 
					
						2025-08-25 12:57:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8a3cd90af5 
					 
					
						
						
							
							[Kernel] Add fused grouped_topk kernel for MoE ( #23274 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Xin Yang <xyangx@amazon.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com > 
						
						
					 
					
						2025-08-25 11:47:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2a167b2eeb 
					 
					
						
						
							
							[test][RL] Add sleep level 2 test and fix reload with sleep mode ( #23521 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-08-26 00:25:52 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0ff902f3b4 
					 
					
						
						
							
							[Refactor] Refactor persistent buffers with CpuGpuBuffer ( #23515 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-25 08:44:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a9082a4d14 
					 
					
						
						
							
							[Bugfix] Fix Qwen3 MoE GPTQ inference ( #23490 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-25 06:40:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e0329ed4b4 
					 
					
						
						
							
							Updates to Flex + VLLm integration ( #21416 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: drisspg <drisspguessous@gmail.com > 
						
						
					 
					
						2025-08-25 09:32:42 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6879cd80ae 
					 
					
						
						
							
							[Refactor] Pass tokenizer explicitly instead of binding to prompt update ( #23542 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-25 06:31:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e269be2ba2 
					 
					
						
						
							
							[Doc] Add caution for API server scale-out ( #23550 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-25 06:14:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5c4b6e66fe 
					 
					
						
						
							
							[Attention] Unify mamba and attention backend selection ( #23171 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com > 
						
						
					 
					
						2025-08-25 09:09:36 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d0a4a3f645 
					 
					
						
						
							
							[misc] add shanghai meetup ( #23535 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-08-25 17:00:03 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ebafb0936d 
					 
					
						
						
							
							[Bugfix] Allow dynamic number of patches for llava_onevision ( #23525 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-25 08:34:54 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0cb7b065c3 
					 
					
						
						
							
							Feature/benchmark/random mm data/images ( #23119 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: breno.skuk <breno.skuk@hcompany.ai > 
						
						
					 
					
						2025-08-25 01:28:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2da02dd0d8 
					 
					
						
						
							
							[Fix] DeepSeek V3.1 tool parser error message ( #23492 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com > 
						
						
					 
					
						2025-08-25 00:56:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d765cf01fe 
					 
					
						
						
							
							[Core][Multimodal] Track encode cache entries by mm_hash and enable embedding sharing between requests ( #22711 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io > 
						
						
					 
					
						2025-08-25 00:41:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						712d0f88d8 
					 
					
						
						
							
							[Refactor] Dynamic target and content for prompt updates ( #23411 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-24 23:39:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						49ab23b3cc 
					 
					
						
						
							
							[gpt-oss] use reasoning channel for reasoning text in serving_chat ( #22920 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yu Guo <yuguo@meta.com > 
						
						
					 
					
						2025-08-25 06:29:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c9abb10489 
					 
					
						
						
							
							[Bugfix] Fix Dense module loading for sentence-transformers embedding models (simplified V2) ( #23408 )  
						
						 
						
						
						
						
						Signed-off-by: FFFfff1FFFfff <yifanli0919@gmail.com > 
						
						
					 
					
						2025-08-25 05:39:24 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						787cdb3829 
					 
					
						
						
							
							Migrate DonutImagePixelInputs to TensorSchema ( #23509 )  
						
						 
						
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-25 05:02:15 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a5203d04df 
					 
					
						
						
							
							Migrate skyworkr1v inputs to TensorSchema ( #23499 )  
						
						 
						
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-25 04:43:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						99f8094400 
					 
					
						
						
							
							Migrate tarsier inputs to TensorSchema ( #23500 )  
						
						 
						
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-25 04:42:36 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						170e8ea9ea 
					 
					
						
						
							
							[Misc] Unified linear print info ( #23516 )  
						
						 
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-24 20:13:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a71e4765cc 
					 
					
						
						
							
							[Bugfix] Fix Qwen2.5-VL quantized model weights loading ( #23512 )  
						
						 
						
						
						
						
						Signed-off-by: Zifei Tong <zifeitong@gmail.com > 
						
						
					 
					
						2025-08-25 10:40:22 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						39971db3aa 
					 
					
						
						
							
							Frontend: Adding LM Format Enforcer support to V1 engine ( #22564 )  
						
						 
						
						
						
						
						Signed-off-by: Noam Gat <noamgat@gmail.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-24 19:31:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						504d914314 
					 
					
						
						
							
							[Perf] Add Triton config for DeepSeek V3 FP8 EP32 H200 ( #23504 )  
						
						 
						
						
						
						
						Signed-off-by: Ming Yang <minos.future@gmail.com > 
						
						
					 
					
						2025-08-24 18:06:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						47455c424f 
					 
					
						
						
							
							[Doc: ]fix various typos in multiple files ( #23487 )  
						
						 
						
						
						
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-25 00:04:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c7fc6b1354 
					 
					
						
						
							
							fix incompatibililty with non cuda platform for nvfp4 ( #23478 )  
						
						 
						
						
						
						
						Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com > 
						
						
					 
					
						2025-08-24 15:35:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ad78868450 
					 
					
						
						
							
							[Misc] Remove unused slot_mapping buffer ( #23502 )  
						
						 
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-24 14:03:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e2db1164a1 
					 
					
						
						
							
							[Model] Enable BLOOM on V1 ( #23488 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-24 13:30:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						416f05929a 
					 
					
						
						
							
							[New Model]Donut model ( #23229 )  
						
						 
						
						
						
						
						Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com > 
						
						
					 
					
						2025-08-24 12:52:24 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5e021b4981 
					 
					
						
						
							
							(Misc): add missing test for zero truncation size. ( #23457 )  
						
						 
						
						
						
						
						Signed-off-by: teekenl <teekenlau@gmail.com > 
						
						
					 
					
						2025-08-24 18:12:47 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1b9b16649c 
					 
					
						
						
							
							[Misc] update dict parse to EPLBConfig from json dumps to dict unpacking ( #23305 )  
						
						 
						
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-08-24 08:06:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e76e233540 
					 
					
						
						
							
							[kernel] Support W4A8 on Hopper ( #23198 )  
						
						 
						
						
						
						
						Signed-off-by: czhu-cohere <conway.zhu@cohere.com > 
						
						
					 
					
						2025-08-24 06:18:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a75277285b 
					 
					
						
						
							
							Migrate Paligemma inputs to TensorSchema ( #23470 )  
						
						 
						
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-24 04:56:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9dc30b7068 
					 
					
						
						
							
							[Bugfix] Add strong reference to CUDA pluggable allocator callbacks ( #23477 )  
						
						 
						
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Eric Marcus <eric.marcus@kaiko.ai >
Co-authored-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-08-24 12:56:17 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						053278a5dc 
					 
					
						
						
							
							Migrate Pixtral inputs to TensorSchema ( #23472 )  
						
						 
						
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-24 04:55:53 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c55c028998 
					 
					
						
						
							
							[gpt-oss] Streaming Output for Python Tool ( #23409 )  
						
						 
						
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com > 
						
						
					 
					
						2025-08-24 04:42:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						65197a5fb3 
					 
					
						
						
							
							[Misc] Modify CacheConfig import ( #23459 )  
						
						 
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-23 06:05:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b8f17f5d98 
					 
					
						
						
							
							Support DeepSeek-V3.1 tool call ( #23454 )  
						
						 
						
						
						
						
						Signed-off-by: Xu Wenqing <xuwq1993@qq.com > 
						
						
					 
					
						2025-08-23 05:50:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d9a55204ba 
					 
					
						
						
							
							fix(tests): Correct unreachable assertion in truncation test ( #23425 )  
						
						 
						
						
						
						
						Signed-off-by: AzizCode92 <azizbenothman76@gmail.com > 
						
						
					 
					
						2025-08-23 05:23:54 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b4e9fd811f 
					 
					
						
						
							
							Revert "[PERF] Use faster way of decode in tokenizer: avoid useless list-to-list conversion ( #20000 )" ( #23396 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-23 04:16:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						308fa287a8 
					 
					
						
						
							
							Add glm4.5v tp2,4 fp8 config on H100_80GB ( #23443 )  
						
						 
						
						
						
						
						Co-authored-by: Chenxi Yang <cxyang@meta.com > 
						
						
					 
					
						2025-08-23 02:54:19 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fa78de9dc3 
					 
					
						
						
							
							Quantization: support FP4 quantized models on AMD CDNA2/CDNA3 GPUs ( #22527 )  
						
						 
						
						
						
						
						Signed-off-by: feng <fengli1702@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-22 20:53:21 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f6818a92cb 
					 
					
						
						
							
							[UX] Move Dockerfile DeepGEMM install to tools/install_deepgemm.sh ( #23360 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-22 20:52:50 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						23c939fd30 
					 
					
						
						
							
							[Model] Support DP for ViT on MiniCPM-V-4 ( #23327 )  
						
						 
						
						
						
						
						Signed-off-by: ycyaw66 <497410282@qq.com >
Co-authored-by: ycyaw66 <497410282@qq.com > 
						
						
					 
					
						2025-08-23 02:14:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						add1adfec7 
					 
					
						
						
							
							[BugFix] Fix MinPLogitsProcessor.update_states() ( #23401 )  
						
						 
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-23 08:22:11 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c80c53a30f 
					 
					
						
						
							
							[BugFix] Fix batch updates for pooling models ( #23398 )  
						
						 
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-23 08:20:41 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						24d0c9e6ed 
					 
					
						
						
							
							[NVIDIA][torch.compile] Support Flashinfer TRTLLM FP8-q/kv NVFP4-out Attention Kernel ( #22703 )  
						
						 
						
						
						
						
						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com > 
						
						
					 
					
						2025-08-22 22:09:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cc7ae5e7ca 
					 
					
						
						
							
							[BugFix][AMD][Quantization] Fix torch.compile issue where wvSplitKQ not being called when it should when using quantized FP8 model ( #22281 )  
						
						 
						
						
						
						
						Signed-off-by: Randall Smith <Randall.Smith@amd.com > 
						
						
					 
					
						2025-08-22 21:47:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0313cf854d 
					 
					
						
						
							
							[PERF] PyTorch Symmetric Memory All-Reduce ( #20759 )  
						
						 
						
						
						
						
						Signed-off-by: ilmarkov <imarkov@redhat.com >
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: ilmarkov <imarkov@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-22 15:39:08 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0483fabc74 
					 
					
						
						
							
							[CI/Build] add EP dependencies to docker ( #21976 )  
						
						 
						
						
						
						
						Co-authored-by: Simon Mo <simon.mo@hey.com > 
						
						
					 
					
						2025-08-22 13:34:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						da65bec309 
					 
					
						
						
							
							add an env var for path to pre-downloaded flashinfer cubin files ( #22675 )  
						
						 
						
						
						
						
					 
					
						2025-08-22 19:25:45 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4645024d3a 
					 
					
						
						
							
							[Quantization] Allow GGUF quantization to skip unquantized layer ( #23188 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-22 13:04:22 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cd7a3df26f 
					 
					
						
						
							
							[Bugfix] Fix broken Florence-2 model ( #23426 )  
						
						 
						
						
						
						
						Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: 汪志鹏 <wangzhipeng628@gmail.com > 
						
						
					 
					
						2025-08-22 17:50:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						32d2b4064f 
					 
					
						
						
							
							[Model] Add Ovis2.5 PP support ( #23405 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-22 17:46:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						22cf679aad 
					 
					
						
						
							
							[Doc]: fix various typos in multiple files ( #23179 )  
						
						 
						
						
						
						
						Signed-off-by: Didier Durand <durand.didier@gmail.com > 
						
						
					 
					
						2025-08-22 10:38:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b6d7d34fc6 
					 
					
						
						
							
							Add unit tests for batched guided and non-guided requests ( #23389 )  
						
						 
						
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com > 
						
						
					 
					
						2025-08-22 10:31:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						341923b982 
					 
					
						
						
							
							fix(tests): Ensure reliable CUDA cache clearing in MoE test ( #23416 )  
						
						 
						
						
						
						
						Signed-off-by: AzizCode92 <azizbenothman76@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-22 17:20:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						424fb7a5d2 
					 
					
						
						
							
							[BugFix] Fix the issue where image embeddings were incorrectly split.… ( #23366 )  
						
						 
						
						
						
						
						Signed-off-by: bppps <bpppsaka@gmail.com >
Co-authored-by: zouyu.zzx <zouyu.zzx@alibaba-inc.com >
Co-authored-by: bppps <bpppsaka@gmail.com > 
						
						
					 
					
						2025-08-22 16:56:46 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						88491c1b6b 
					 
					
						
						
							
							[Speculators][Speculative Decoding] Fix Qwen 2 Eagle3 Support ( #23337 )  
						
						 
						
						
						
						
					 
					
						2025-08-22 16:39:19 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						613a23b57f 
					 
					
						
						
							
							[Bugfix]: Installing dev environment due to pydantic incompatible version ( #23353 )  
						
						 
						
						
						
						
						Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com > 
						
						
					 
					
						2025-08-22 16:22:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						51a215300b 
					 
					
						
						
							
							[Fix] Bump triton version in rocm-build requirements ( #21630 )  
						
						 
						
						
						
						
						Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com > 
						
						
					 
					
						2025-08-22 15:13:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ebe14621e3 
					 
					
						
						
							
							[Bug fix] Dynamically setting the backend variable for genai_perf_tests in the run-nightly-benchmark script ( #23375 )  
						
						 
						
						
						
						
						Signed-off-by: Naman Lalit <nl2688@nyu.edu > 
						
						
					 
					
						2025-08-22 15:12:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						325aa3dee9 
					 
					
						
						
							
							[Misc] local import code clean ( #23420 )  
						
						 
						
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-08-22 14:01:35 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a073be6d87 
					 
					
						
						
							
							[Doc] Update the doc for log probs + prefix caching ( #23399 )  
						
						 
						
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-22 13:20:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						695e7adcd2 
					 
					
						
						
							
							[misc] Remove outdate comment about runai_model_streamer ( #23421 )  
						
						 
						
						
						
						
						Signed-off-by: carlory <baofa.fan@daocloud.io > 
						
						
					 
					
						2025-08-22 13:08:53 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						281710ef9a 
					 
					
						
						
							
							[Attention] Allow V1 flash_attn to support cross-attention ( #23297 )  
						
						 
						
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-08-22 12:10:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						808d2e9aa0 
					 
					
						
						
							
							[Misc] Move M-RoPE init logic to _init_mrope_positions ( #23422 )  
						
						 
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-22 03:07:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						285178b3b8 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 LoRA test ( #23418 )  
						
						 
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-22 09:56:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						88016c372a 
					 
					
						
						
							
							[Bugfix] Fix pooling models on CPU backend ( #23392 )  
						
						 
						
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-08-22 09:47:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						998720859c 
					 
					
						
						
							
							Migrate MiniCPMOAudioInputs to TensorSchema ( #21847 )  
						
						 
						
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-22 16:43:29 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0ba1b54ac6 
					 
					
						
						
							
							[gpt-oss] add input/output usage in responses api when harmony context is leveraged ( #22667 )  
						
						 
						
						
						
						
						Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com > 
						
						
					 
					
						2025-08-22 08:32:24 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						53415653ff 
					 
					
						
						
							
							[P/D][Nixl] Make kv cache register compatible with hybrid memory allocator ( #23079 )  
						
						 
						
						
						
						
						Signed-off-by: sfeng33 <4florafeng@gmail.com > 
						
						
					 
					
						2025-08-21 22:30:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						17373dcd93 
					 
					
						
						
							
							[Attention] Refactor AttentionMetadata Preparation for Encoder-only Models ( #23154 )  
						
						 
						
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-22 05:05:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5964069367 
					 
					
						
						
							
							[New Model] Add Seed-Oss model ( #23241 )  
						
						 
						
						
						
						
						Signed-off-by: jiabin.00 <jiabin.00@bytedance.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-22 04:58:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						de9c085e17 
					 
					
						
						
							
							[Misc] Add gemma3 chat template with pythonic-style function calling ( #17149 )  
						
						 
						
						
						
						
						Signed-off-by: Philip Chung <philip.f.chung@gmail.com > 
						
						
					 
					
						2025-08-21 21:06:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						111692bb8c 
					 
					
						
						
							
							[CI] Add end-to-end V1 min_tokens test coverage ( #22495 )  
						
						 
						
						
						
						
						Signed-off-by: Arjun Reddy <189282188+arjunbreddy22@users.noreply.github.com >
Co-authored-by: Arjun Reddy <189282188+arjunbreddy22@users.noreply.github.com > 
						
						
					 
					
						2025-08-21 22:04:07 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						394591e343 
					 
					
						
						
							
							[Feature] Enable DeepGEMM Linear on B200; 1.5% E2E throughput improvement ( #23351 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-21 21:01:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3ac849665d 
					 
					
						
						
							
							[CI/Build] Skip Idefics3 and SmolVLM generation test again ( #23356 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-22 03:39:46 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0b9cc56fac 
					 
					
						
						
							
							Migrate MllamaImagePixelInputs to TensorSchema ( #22020 )  
						
						 
						
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-22 11:28:49 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8896eb72eb 
					 
					
						
						
							
							[Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed ( #18800 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-22 10:56:57 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						19fe1a0510 
					 
					
						
						
							
							[Kernel] Add FP8 support with FlashMLA backend ( #22668 )  
						
						 
						
						
						
						
						Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com > 
						
						
					 
					
						2025-08-22 02:26:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						480bdf5a7b 
					 
					
						
						
							
							[Core] Support custom executor qualname ( #23314 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-08-22 09:40:54 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5368f76855 
					 
					
						
						
							
							[Feature][Responses API] Support logprobs(non-stream) ( #23319 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com > 
						
						
					 
					
						2025-08-21 23:09:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8ef6b8a38c 
					 
					
						
						
							
							Always use cache mounts when installing vllm to avoid populating pip cache in the image. Also remove apt cache. ( #23270 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Valentyn Tymofieiev <valentyn@google.com > 
						
						
					 
					
						2025-08-21 18:01:03 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3bbe11cc13 
					 
					
						
						
							
							[Perf] Small optimizations for silu_mul_fp8_quant_deep_gemm ( #23265 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-21 17:56:15 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c5041f899f 
					 
					
						
						
							
							[CI] improve pr comments bot ( #23380 )  
						
						 
						
						
						
						
					 
					
						2025-08-21 14:49:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8b5fe6eb51 
					 
					
						
						
							
							[CI] Clean up actions: remove helm, publish workflows and improve pr … ( #23377 )  
						
						 
						
						
						
						
					 
					
						2025-08-21 14:29:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						800349c2a5 
					 
					
						
						
							
							[Structured Outputs] Refactor bitmask construction into get_grammar_bitmask ( #23361 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-21 20:53:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						044931f97b 
					 
					
						
						
							
							Make sure that vectorize_with_alignment produced vectorized global loads ( #23182 )  
						
						 
						
						
						
						
					 
					
						2025-08-21 20:06:54 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1d353b6352 
					 
					
						
						
							
							[Core] Always use tensor cores for Flashinfer Decode Wrapper ( #23214 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Pavani Majety <pmajety@nvidia.com > 
						
						
					 
					
						2025-08-21 16:02:11 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3496274663 
					 
					
						
						
							
							[Misc] Convert VLLM_TORCH_PROFILER_DIR path to absolute ( #23191 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-08-21 15:49:09 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8a19303173 
					 
					
						
						
							
							[BugFix][gpt-oss] Fix Chat Completion with Multiple Output Message ( #23318 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-21 10:31:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						603fbbbce0 
					 
					
						
						
							
							[Misc] Misc code cleanup/simplification ( #23304 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-21 17:22:55 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						10f535c086 
					 
					
						
						
							
							[Bugfix] Fix port conflict by obtaining a list of open ports upfront ( #21894 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ming Yang <minos.future@gmail.com > 
						
						
					 
					
						2025-08-21 10:22:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						48bfb0c9b7 
					 
					
						
						
							
							[Bug] Fix R1 Accuracy 0 Bug ( #23294 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-21 13:11:28 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f8ce022948 
					 
					
						
						
							
							add tg-mxfp4-moe-test ( #22540 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: siyuanf <siyuanf@nvidia.com >
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-21 17:05:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0278f1ac3a 
					 
					
						
						
							
							Fix nvfp4 swizzling ( #23140 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yiliu30 <yi4.liu@intel.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com > 
						
						
					 
					
						2025-08-21 16:54:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a482e4e769 
					 
					
						
						
							
							Migrate MolmoImageInputs to TensorSchema ( #22022 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-21 16:54:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e0b056e443 
					 
					
						
						
							
							[ci/build] Fix abi tag for aarch64 ( #23329 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-08-21 23:32:55 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						79f05e4436 
					 
					
						
						
							
							[Multimodal] Always enable hashing mm data ( #23308 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-21 07:23:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f8daddcc4c 
					 
					
						
						
							
							[Bugfix] set system_message in phi4mini chat template ( #23309 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zhuangqh <zhuangqhc@gmail.com > 
						
						
					 
					
						2025-08-21 14:22:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c8e33c72c6 
					 
					
						
						
							
							[V1] Remove unnecessary check for main thread ( #23298 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com > 
						
						
					 
					
						2025-08-21 14:08:35 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d70a16625d 
					 
					
						
						
							
							[Performance] V1 Pooling Models E2E Performance Optimization ( #23162 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-08-21 13:26:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5cc54f7c5b 
					 
					
						
						
							
							[Doc] Fix batch-level DP example ( #23325 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-08-21 06:16:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0c6e40bbaa 
					 
					
						
						
							
							[Refactor] Simplify code for MM budget ( #23310 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-21 08:00:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e2000f352 
					 
					
						
						
							
							[Model] Add LFM2 architecture ( #22845 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Paul Pak <paulpak58@gmail.com > 
						
						
					 
					
						2025-08-21 09:35:07 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						31282401b6 
					 
					
						
						
							
							[BugFix] Fix Python 3.9 Support ( #23306 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jared O'Connell <46976761+jaredoconnell@users.noreply.github.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-08-20 23:23:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0c31e28e95 
					 
					
						
						
							
							[Bugfix] Fix extra whitespace in strings caused by newline ( #23272 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-20 22:03:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f571ff8eb6 
					 
					
						
						
							
							[Sampler] Support returning final logprobs ( #22387 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-20 21:28:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f64ee61d9e 
					 
					
						
						
							
							[CI] Block the cu126 wheel build while broken ( #23285 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-21 04:21:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8993073dc1 
					 
					
						
						
							
							[CI] Delete images older than 24h. ( #23291 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com > 
						
						
					 
					
						2025-08-20 21:15:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						655a09f653 
					 
					
						
						
							
							[Model][VLM] Support R-4B Model ( #23246 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yannqi <yannqi@qq.com >
Signed-off-by: 杨奇(yann qi) <51905299+yannqi@users.noreply.github.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: yannqiyang <yannqiyang@tencent.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-08-21 04:08:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f94bf9b924 
					 
					
						
						
							
							[Compile] Fix Compile Warning SM100 Cutlass MLA ( #23287 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-21 03:09:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3663870c72 
					 
					
						
						
							
							[V1][Mamba1] - Full CUDA and Piecewise CUDA Graphs Support ( #23035 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: asafg <asafg@ai21.com >
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
Co-authored-by: asafg <asafg@ai21.com > 
						
						
					 
					
						2025-08-20 20:08:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2461d9e562 
					 
					
						
						
							
							[CI/Build] Split out mm processor tests ( #23260 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-20 20:05:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7be5d113d8 
					 
					
						
						
							
							[CPU] Refactor CPU W8A8 scaled_mm ( #23071 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-08-21 09:34:24 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b029de9902 
					 
					
						
						
							
							[Optimization] Make new_block_ids None if empty ( #23262 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai > 
						
						
					 
					
						2025-08-20 18:25:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bbea1cefdd 
					 
					
						
						
							
							[CI Bugfix] Fix CI by fully removing --enable-prompt-adapter ( #23284 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-20 17:18:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f5aa307d77 
					 
					
						
						
							
							Remove duplicate entry in vllm.attention.__all__ ( #23296 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-08-20 17:14:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4b795020ed 
					 
					
						
						
							
							[EP] Add logging for experts map ( #22685 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Simon Mo <simon.mo@hey.com > 
						
						
					 
					
						2025-08-20 23:46:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c86af22f31 
					 
					
						
						
							
							[Fix] remove is_marlin param in benchmark_moe ( #23286 )  
						
						 
						
						
						
						
					 
					
						2025-08-20 22:04:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						10cc12ba66 
					 
					
						
						
							
							Feature/mla tests ( #23195 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com > 
						
						
					 
					
						2025-08-20 21:46:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a4fbb32fab 
					 
					
						
						
							
							Remove chunked_prefill_enabled flag in V1 MLA ( #23183 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com > 
						
						
					 
					
						2025-08-20 21:43:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1b125004be 
					 
					
						
						
							
							[misc] fix multiple arch wheels for the nightly index ( #23110 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-08-20 14:15:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4fbda0b20c 
					 
					
						
						
							
							[Feature] use --eplb_config to set eplb param ( #20562 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: rongfu.leng <lenronfu@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-20 14:07:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4e51fa8cba 
					 
					
						
						
							
							Do not use eval() to convert unknown types ( #23266 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-08-20 13:28:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bf7c99dfc4 
					 
					
						
						
							
							[Perf] Speed up function _convert_tokens_to_string_with_added_encoders by 13.7x ( #20413 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Saurabh Misra <misra.saurabh1@gmail.com >
Signed-off-by: Aseem Saxena <aseem.bits@gmail.com >
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: Aseem Saxena <aseem.bits@gmail.com > 
						
						
					 
					
						2025-08-20 13:17:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b95697d731 
					 
					
						
						
							
							[Frontend] improve error logging of chat completion ( #22957 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-20 13:03:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						582bbe6bd7 
					 
					
						
						
							
							[Fix] correct tool_id for kimi-k2 when use tool_choice=required ( #21259 )  
						
						 
						
						... 
						
						
						
						Co-authored-by: wangzhengtao <wangzhengtao@msh.team > 
						
						
					 
					
						2025-08-20 12:59:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0cdbf5e61c 
					 
					
						
						
							
							[Kernel/Quant] Remove the original marlin format and qqq ( #23204 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-20 15:13:36 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ebe56a0064 
					 
					
						
						
							
							Small fix for Command-A-Vision ( #23268 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: donglu <donglu@cohere.com > 
						
						
					 
					
						2025-08-20 18:15:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f77a0802b7 
					 
					
						
						
							
							Limit HTTP header count and size ( #23267 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Taneem Ibrahim <taneem.ibrahim@gmail.com > 
						
						
					 
					
						2025-08-20 17:57:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c4477f55e5 
					 
					
						
						
							
							Migrate Mistral3ImagePixelInputs to TensorSchema ( #21945 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-20 17:37:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dfd2382039 
					 
					
						
						
							
							[torch.compile] Support conditional torch.compile per module ( #22269 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com > 
						
						
					 
					
						2025-08-20 16:52:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3b11b26b50 
					 
					
						
						
							
							[FIXBUG ] Allow disabling rocm_aiter_fa backend for ROCm GPUs not compatible with AITER ( #22795 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: JartX <sagformas@epdcenter.es >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-08-20 09:08:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d6d13bd49e 
					 
					
						
						
							
							[Misc] Add max_seq_len to CommonAttentionMetadata ( #23216 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-20 09:05:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5efd6905bc 
					 
					
						
						
							
							[CLI][Doc] Formalize --mm-encoder-tp-mode ( #23190 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-20 23:42:28 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b17109beea 
					 
					
						
						
							
							[Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute ( #23045 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Shixian Cui <shixian@amazon.com > 
						
						
					 
					
						2025-08-20 10:35:26 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4449235843 
					 
					
						
						
							
							[Bugfix] Ensure correctness of HCXVision processing ( #23254 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-20 14:19:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						38217877aa 
					 
					
						
						
							
							[Fix] fix offline env use local mode path ( #22526 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-08-20 13:34:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c6d80a7a96 
					 
					
						
						
							
							[Model] Improve olmo and olmo2 ( #23228 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-20 12:47:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7cd17e22d7 
					 
					
						
						
							
							[Model][V1] Support Ernie MTP ( #22169 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zhouchong <zhouchong03@baidu.com >
Co-authored-by: zhouchong <zhouchong03@baidu.com > 
						
						
					 
					
						2025-08-20 20:41:55 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						50df09fe13 
					 
					
						
						
							
							Update to flashinfer-python==0.2.12 and disable AOT compile for non-release image ( #23129 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-20 08:05:54 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						68fcd3fa73 
					 
					
						
						
							
							[Bugfix] Ensure correctness of Cohere2Vision processing ( #23245 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-20 11:09:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						83e69a09d6 
					 
					
						
						
							
							[Model] Support deepseek with eagle ( #21086 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Xin Yang <xyangx@amazon.com > 
						
						
					 
					
						2025-08-20 19:01:31 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3aa8c10038 
					 
					
						
						
							
							Fix missing quotes ( #23242 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Shiming Zhang <wzshiming@hotmail.com > 
						
						
					 
					
						2025-08-20 10:46:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						103f1ec8d3 
					 
					
						
						
							
							[Model] use autoWeightsLoader for gptoss ( #22446 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: calvin chen <wen.chen@dynamia.ai > 
						
						
					 
					
						2025-08-20 10:16:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d983769c41 
					 
					
						
						
							
							fix cuda graph ( #22721 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: fsx950223 <fsx950223@outlook.com > 
						
						
					 
					
						2025-08-20 06:24:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8fd920924c 
					 
					
						
						
							
							[BugFix] Fix stuck stats/metrics after requests are aborted ( #22995 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-20 13:50:29 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						de7b67a023 
					 
					
						
						
							
							[CI/Build] Sync multimodal tests ( #23181 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-20 05:06:42 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f729023272 
					 
					
						
						
							
							[CI/Build] Also check DP in benchmarks throughput script ( #23038 )  
						
						 
						
						... 
						
						
						
						Co-authored-by: Simon Mo <simon.mo@hey.com > 
						
						
					 
					
						2025-08-20 04:09:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1a3079a15e 
					 
					
						
						
							
							chore: support pytorch format in lora  ( #22790 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jaeeun.kil <rha3122@naver.com >
						Signed-off-by: 길재은 <rha3122@naver.com > 
						
						
					 
					
						2025-08-20 04:02:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						941f56858a 
					 
					
						
						
							
							Fix a performance comparison issue in Benchmark Suite ( #23047 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
						Signed-off-by: Louie Tsai <louie.tsai@intel.com >
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						Co-authored-by: Li, Jiang <bigpyj64@gmail.com > 
						
						
					 
					
						2025-08-20 03:14:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a634733f67 
					 
					
						
						
							
							[Attention] Optimize make_local_attention_virtual_batches for Flash Attention ( #23185 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: linzebing <linzebing1995@gmail.com > 
						
						
					 
					
						2025-08-20 02:57:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						64ab3c7253 
					 
					
						
						
							
							[Doc] Update V1 status of various pooling models ( #23189 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-20 10:33:41 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e58c5a9768 
					 
					
						
						
							
							[Core] Add torch profiler CPU traces for AsyncLLM. ( #21794 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com > 
						
						
					 
					
						2025-08-20 02:32:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d46d417b58 
					 
					
						
						
							
							[CI Perf] Only test bfloat16 for tests/compile/test_fusion_all_reduce.py ( #23132 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-19 20:18:52 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0167efe20d 
					 
					
						
						
							
							[Core] Optimize scheduler request removal for single completions ( #21917 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chiliu <chiliu@paypal.com >
						Signed-off-by: chiliu <cliu_whu@yeah.net >
						Co-authored-by: chiliu <chiliu@paypal.com > 
						
						
					 
					
						2025-08-19 18:25:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c32e6ad1f6 
					 
					
						
						
							
							[Quantization] Bump Compressed Tensors Version ( #23202 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
						Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com >
						Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-20 00:39:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1630cc8d0f 
					 
					
						
						
							
							[Benchmarks] Add video inputs to ShareGPTDataset.  ( #23199 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com > 
						
						
					 
					
						2025-08-19 23:42:31 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						14e2b0730b 
					 
					
						
						
							
							[BugFix] fix CUTLASS MLA full cudagraph  ( #23200 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-08-19 22:17:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0f4f0191d8 
					 
					
						
						
							
							[CI/Build] Replace lm-eval gsm8k tests with faster implementation ( #23002 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-19 15:07:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a38b8af4c3 
					 
					
						
						
							
							[NVIDIA] Add SM100 Flashinfer Cutlass MoE fp8 backend ( #22357 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com > 
						
						
					 
					
						2025-08-19 18:01:53 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						21dce80ea9 
					 
					
						
						
							
							[CI/Build] Add support for Python 3.13 ( #13164 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <michael@neuralmagic.com >
						Signed-off-by: mgoin <mgoin64@gmail.com >
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-19 13:49:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e61bac87ee 
					 
					
						
						
							
							[Misc] Minor refactoring for FlashInfer backend ( #23147 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-19 13:11:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						80141bbf2f 
					 
					
						
						
							
							fix: use cache_salt for gpt-oss ( #23186 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com > 
						
						
					 
					
						2025-08-19 18:12:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b94faf9d50 
					 
					
						
						
							
							[Bugfix] Fix accuracy issue when using flashinfer cutlass moe, TP=1 and modelopt. ( #23125 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com >
						Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-19 14:00:51 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5b5f350d67 
					 
					
						
						
							
							[Misc] Enable yapf for FlashInfer backend ( #23193 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-19 10:33:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f7cf5b512e 
					 
					
						
						
							
							[Frontend] Add /collective_rpc API endpoint ( #23075 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-08-19 17:29:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						03d4235fd2 
					 
					
						
						
							
							[Misc] Fix the benchmark's README and improve the error messages for the benchmark's argument checks ( #22654 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: tanruixiang <tanruixiang0104@gmail.com > 
						
						
					 
					
						2025-08-19 10:18:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d6a1a20973 
					 
					
						
						
							
							[CI/Build] Update transformers to v4.55.2 ( #23093 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-19 10:06:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a70d0bd0a3 
					 
					
						
						
							
							Migrate LlavaOnevisionMultiInputs to TensorSchema ( #21844 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-19 17:02:02 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						24f4d1a224 
					 
					
						
						
							
							Add return_token_ids parameter to OpenAI API endpoints ( #22587 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yuge Zhang <scottyugochang@gmail.com >
						Co-authored-by: Claude <noreply@anthropic.com >
						Co-authored-by: Simon Mo <simon.mo@hey.com > 
						
						
					 
					
						2025-08-19 09:48:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4f510bc2a1 
					 
					
						
						
							
							[Model] Removes redundant all-reduce operation in Qwen3MoeSparseMoeBlock ( #23169 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com > 
						
						
					 
					
						2025-08-19 16:18:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1298c67795 
					 
					
						
						
							
							[FEAT] [Performance] Enable DP for ViT in Qwen2.5VL ( #22742 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
						Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-08-19 15:25:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4d9c61993a 
					 
					
						
						
							
							[Bugfix] Fix benchmark_moe.py  ( #23177 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-19 13:39:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b87cb97a53 
					 
					
						
						
							
							[Model] support new model ovis2.5 ( #23084 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: myselvess <244285088@qq.com >
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
						Co-authored-by: Isotr0py <2037008807@qq.com >
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-19 13:12:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f856c33ce9 
					 
					
						
						
							
							[Model] Add multi_label_classification support ( #23173 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-08-19 12:54:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						03752dba8f 
					 
					
						
						
							
							[NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel ( #21716 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
						Co-authored-by: Michael Goin <mgoin64@gmail.com >
						Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com > 
						
						
					 
					
						2025-08-19 08:22:15 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						40f26734b9 
					 
					
						
						
							
							[Misc] Fix seq_lens for graph capture ( #23175 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-19 03:58:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2c3f557f08 
					 
					
						
						
							
							[Doc] use power of 2 ( #23172 )  
						
						 
						
						
						
						
					 
					
						2025-08-19 03:16:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						21bcc8263f 
					 
					
						
						
							
							[Misc] Avoid accessing req_ids inside a loop ( #23159 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-19 09:39:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5bfe0dea7a 
					 
					
						
						
							
							[bug fix] Fix llama4 spec decoding ( #22691 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: qizixi <qizixi@meta.com >
						Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com > 
						
						
					 
					
						2025-08-19 08:53:24 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						31fd3265c8 
					 
					
						
						
							
							[Bugfix] Fix broken Minimax-01-VL model ( #22116 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com >
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-19 08:49:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						31436e8b4f 
					 
					
						
						
							
							[Misc] Add request_id into benchmark_serve.py ( #23065 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yangxia <yangxiast@gmail.com > 
						
						
					 
					
						2025-08-19 08:32:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4efd43e9b4 
					 
					
						
						
							
							Fix GLM-4.5V-FP8 numerical issue ( #22949 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: qizixi <qizixi@meta.com >
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-19 07:56:31 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3c8a787247 
					 
					
						
						
							
							[Benchmark] Add flag --served-model-name to benchmark_serving_multi_turn ( #22889 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: daniels <daniels@pliops.com > 
						
						
					 
					
						2025-08-19 07:48:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						01a08739e0 
					 
					
						
						
							
							[misc] split engine_model into json file for nsys profile tool ( #23117 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Grace Ho <grho@nvidia.com >
						Signed-off-by: Grace Ho <146482179+gracehonv@users.noreply.github.com >
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-19 15:44:53 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fda9537c5e 
					 
					
						
						
							
							[Model] Support Pipeline Parallelism for moonshotai/Kimi-VL-A3B-Thinking-2506 ( #23114 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-19 14:24:31 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						90bbe0a5ad 
					 
					
						
						
							
							[Log] Warning Once for Cutlass MLA  ( #23137 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-18 23:24:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e75f342261 
					 
					
						
						
							
							Migrate InternVLImagePixelInputs (in nemotron_vl.py) to TensorSchema ( #22023 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com >
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-19 13:48:26 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						78dba404ad 
					 
					
						
						
							
							[Hardware][IBM Z]Enable v1 for s390x and s390x dockerfile fixes ( #22725 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nikhil Suryawanshi <suryawanshin74@gmail.com > 
						
						
					 
					
						2025-08-19 04:40:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e9d6a3db69 
					 
					
						
						
							
							[TPU] make ptxla not imported when using tpu_commons ( #23081 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@gmail.com >
						Signed-off-by: Chengji Yao <chengjiyao@google.com >
						Co-authored-by: Chengji Yao <chengjiyao@gmail.com > 
						
						
					 
					
						2025-08-19 11:46:42 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a4454e9401 
					 
					
						
						
							
							chore: disable enable_cpp_symbolic_shape_guards ( #23048 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Xiao Liu <xiszishu@gmail.com > 
						
						
					 
					
						2025-08-18 23:08:05 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						14006840ea 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 FlashInfer attention backend ( #22776 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-18 19:54:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6603288736 
					 
					
						
						
							
							[CI][V0 Deprecation] Removed V0 Only Chunked Prefill and Prefix Caching Tests ( #22871 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Robert Shaw <robshaw@redhat.com >
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
						Co-authored-by: Robert Shaw <robshaw@redhat.com >
						Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-18 17:39:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						95e3095136 
					 
					
						
						
							
							[Misc] Add @tdoublep as a maintainer of hybrid model and Triton-attention related code ( #23122 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-19 08:31:38 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c9b38be8aa 
					 
					
						
						
							
							[Spec Decode] Make propose_draft_token_ids non-blocking for lower TTFT ( #23041 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-18 17:20:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0dd3f4f5ab 
					 
					
						
						
							
							[Misc] Minor refactoring for prepare_inputs ( #23116 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-18 16:58:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						498259ccce 
					 
					
						
						
							
							Install tpu_info==0.4.0 to fix core dump for TPU ( #23135 )  
						
						 
						
						
						
						
					 
					
						2025-08-18 16:23:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6d25e3fd6e 
					 
					
						
						
							
							Use Blackwell FlashInfer MXFP4 MoE by default if available  ( #23008 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-18 15:25:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ac6eb49de3 
					 
					
						
						
							
							fix: OpenAI SDK compat (ResponseTextConfig) ( #23126 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: breno.skuk <breno.skuk@hcompany.ai >
						Signed-off-by: Breno Baldas Skuk <breno.skuk@hcompany.ai >
						Signed-off-by: mgoin <mgoin64@gmail.com >
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-18 15:22:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bf756321c7 
					 
					
						
						
							
							[CI Bugfix] Pin openai<1.100 to unblock CI ( #23118 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-18 12:14:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0e3bb543f0 
					 
					
						
						
							
							[Bugfix] Support compile for Transformers multimodal ( #23095 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: raushan <raushan@huggingface.co > 
						
						
					 
					
						2025-08-18 13:35:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						569aefd134 
					 
					
						
						
							
							chore: remove unnecessary patch_padding_side for the chatglm model ( #23090 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: carlory <baofa.fan@daocloud.io >
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
						Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-18 12:32:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d3f71f1224 
					 
					
						
						
							
							[Refactor] Get prompt updates earlier ( #23097 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-18 12:31:53 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5a30bd10d8 
					 
					
						
						
							
							[Bugfix] fix IntermediateTensors equal method ( #23027 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-08-18 02:58:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						27e8d1ea3e 
					 
					
						
						
							
							[Refactor] Define MultiModalKwargsItems separate from MultiModalKwargs ( #23053 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-18 09:52:00 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5c79b0d648 
					 
					
						
						
							
							[XPU][CI]add xpu env vars in CI scripts ( #22946 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-08-18 09:47:03 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5f5664b3e4 
					 
					
						
						
							
							[XPU] Fix compile size for xpu ( #23069 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-08-18 00:04:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						89657a557c 
					 
					
						
						
							
							[Misc] Fix backward compatibility from  #23030  ( #23070 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me >
						Co-authored-by: Roger Wang <hey@rogerw.me > 
						
						
					 
					
						2025-08-17 23:33:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						08d5f7113a 
					 
					
						
						
							
							[Misc] refactor function name (#23029)
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-08-17 22:16:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b2fd0b81e0 
					 
					
						
						
							
							[Bugfix][CI] Machete kernels: deterministic ordering for more cache hits (#23055)
						Signed-off-by: Andy Lo <andy@mistral.ai>
						
						
					 
					
						2025-08-17 22:10:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9f1c642254 
					 
					
						
						
							
							[Bugfix] fix Qwen2.5-Omni processor output mapping (#23058)
						Signed-off-by: double7 <33449816+DoubleVII@users.noreply.github.com>
Co-authored-by: 杨森 <yangsen.double7@bytedance.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-08-17 22:09:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7be3a59d8e 
					 
					
						
						
							
							[Misc] enhance static type hint (#23059)
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-08-17 22:09:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8ea0c2753a 
					 
					
						
						
							
							[Misc] Minor code cleanup for _get_prompt_logprobs_dict (#23064)
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-08-17 18:16:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0fc8fa751a 
					 
					
						
						
							
							fix: gptq marlin weight loading failure (#23066)
						
						 
						
						
						
						
					 
					
						2025-08-17 15:56:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						21e39436c8 
					 
					
						
						
							
							[XPU] fix xpu to set cudagraph batch sizes (#23044)
						Signed-off-by: calvin chen <wen.chen@dynamia.ai>
						
						
					 
					
						2025-08-17 21:45:42 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6d243efeda 
					 
					
						
						
							
							[Misc] Convert use_structured_output property into constant (#23060)
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-08-17 12:41:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c55bc1db26 
					 
					
						
						
							
							[Misc] Remove dead return (#23061)
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-08-17 10:36:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						292084e72a 
					 
					
						
						
							
							[BugFix] Fix for IMA in FA3 varlen combine (#22967)
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						
						
					 
					
						2025-08-17 08:52:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						16bff144be 
					 
					
						
						
							
							[Misc] fix typo in the multimodal doc (#23051)
						
						 
						
						
						
						
					 
					
						2025-08-17 01:56:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fe0411fc6f 
					 
					
						
						
							
							[Bugfix] should use stack instead of concat (#22972)
						Signed-off-by: 947132885 <947132885@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-08-17 08:46:36 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4d4061b6e7 
					 
					
						
						
							
							[Kernel] Add cuda kernel for gpt_oss activation (#22951)
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-08-17 05:03:24 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						87f48623a5 
					 
					
						
						
							
							[Misc] method name typo fix (#23042)
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-08-16 21:49:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5c32143b9d 
					 
					
						
						
							
							[Refactor] Defer tensor data construction in MultiModalKwargs (#23030)
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-08-16 21:05:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						94096a47c9 
					 
					
						
						
							
							[UX] Separate marlin moe config logic from triton moe (#23006)
						
						 
						
						
						
						
					 
					
						2025-08-16 22:16:42 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a258ad8bcc 
					 
					
						
						
							
							[Bugfix] fix qwen3 moe fp8 accuracy issue (#23031)
						Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
						
						
					 
					
						2025-08-16 17:41:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bf7f470b22 
					 
					
						
						
							
							[V1] Logits processors extensibility (#19912)
						Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeld2012@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-08-16 12:59:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4fc722eca4 
					 
					
						
						
							
							[Kernel/Quant] Remove AQLM (#22943)
						Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
						
						
					 
					
						2025-08-16 19:38:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3253ae765e 
					 
					
						
						
							
							[Flaky CI] Increase timeout tolerance for test_mp_crash_detection+test_default_mm_lora_chat_completions (#23028)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-08-16 18:33:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						000cceca8c 
					 
					
						
						
							
							[Bugfix gpt-oss] Fix float32 convert for flashinfer sink support (#23016)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-08-16 11:16:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						68373d3126 
					 
					
						
						
							
							[Frontend] Added support for HermesToolParser for models without special tokens (#16890)
						Signed-off-by: minpeter <kali2005611@gmail.com>
						
						
					 
					
						2025-08-16 17:38:42 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						52ce1420e9 
					 
					
						
						
							
							Fix handling of max_num_batched_tokens for pooling tasks (#23004)
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
						
						
					 
					
						2025-08-16 17:36:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						829bbd7882 
					 
					
						
						
							
							[New Model]mBART model (#22883)
						Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
						
						
					 
					
						2025-08-16 12:16:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4dff91c93d 
					 
					
						
						
							
							[Refactor] Allow optional MultiModalKwargsItem in IPC (#23022)
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-08-16 11:30:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						de9cb61763 
					 
					
						
						
							
							Add docs for PrefixRepetitionDataset + enable usage with vllm bench throughput (#23012)
						Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
						
						
					 
					
						2025-08-16 10:21:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2dbccce8a6 
					 
					
						
						
							
							[CI][Bugfix] Skip Ovis2 generation test because of broken remote code (#22954)
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-08-16 09:44:19 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						933f45334a 
					 
					
						
						
							
							[Core] Make cudagraph check cuda platform only (#23005)
						Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
						
						
					 
					
						2025-08-16 07:46:00 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cc826a202b 
					 
					
						
						
							
							[Multimodal] Update Tensor schema test to cover arbitrary shape mm inputs (#22867)
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-08-16 00:44:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6d3da472bc 
					 
					
						
						
							
							[Misc] Add --save-dir option to benchmark_moe (#23020)
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-08-16 07:26:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						78863f8c5c 
					 
					
						
						
							
							[BugFix] Add support for loading prompt embeds tensors serialized on unavailable devices and sparse tensors (#22962)
						Signed-off-by: Andrew Sansom <andrew@protopia.ai>
						
						
					 
					
						2025-08-16 06:25:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5157827cfc 
					 
					
						
						
							
							[Build] Env var to disable sccache (#22968)
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						
						
					 
					
						2025-08-16 05:36:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7caec10e7b 
					 
					
						
						
							
							[XPU]avoid circular import during XPU init (#23017)
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
						
						
					 
					
						2025-08-16 05:16:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1f83e7d849 
					 
					
						
						
							
							[misc] nsys profile output kernel classifier and visualizer (#22971)
						Signed-off-by: Grace Ho <grho@nvidia.com>
						
						
					 
					
						2025-08-16 02:52:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e4e37ded56 
					 
					
						
						
							
							[V1] support min_tokens for detokener (#22014)
						Signed-off-by: calvin chen <wen.chen@dynamia.ai>
Co-authored-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-08-16 02:28:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f6b5040590 
					 
					
						
						
							
							[Frontend] Avoid list copies in serving_chat.py (#22947)
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-08-16 02:06:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fbd88728b3 
					 
					
						
						
							
							[Bugfix] Fix DeepSeek MTP (#22934)
						Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
						
						
					 
					
						2025-08-16 01:25:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						070da660c1 
					 
					
						
						
							
							[Kernel] Simplify get_kv_cache_layout and cache use_trtllm_attention env-dependent bit (#22735)
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-08-16 00:14:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ad0297d113 
					 
					
						
						
							
							[Misc] Support passing multiple request ids at once to AsyncLLM.abort() (#22944)
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-08-15 17:00:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						236b864e4f 
					 
					
						
						
							
							[BugFix] Make run_once thread-safe (#22978)
						Signed-off-by: <wenji.yyc@alibaba-inc.com>
Signed-off-by: Yichen Yan <wenji.yyc@alibaba-inc.com>
						
						
					 
					
						2025-08-15 16:56:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3e2f7985a2 
					 
					
						
						
							
							Support multiple attention groups for KV sharing (#22672)
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
						
						
					 
					
						2025-08-15 16:54:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c280066f9d 
					 
					
						
						
							
							[v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728)
						Signed-off-by: Or Ozeri <oro@il.ibm.com>
						
						
					 
					
						2025-08-15 16:52:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b9dc9d2607 
					 
					
						
						
							
							[BugFix] Handle case where async utility call is cancelled (#22996)
						Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yinghai Lu <yinghai@thinkingmachines.ai>
						
						
					 
					
						2025-08-15 17:38:42 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1fc375dc05 
					 
					
						
						
							
							[Structured Outputs] [Bug] Fix misalignment in apply_grammar_bitmask causing unintended masking and NaN logits (#22963)
						Signed-off-by: rishitdholakia13 <rishit+github@cohere.com>
						
						
					 
					
						2025-08-15 23:25:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						76144adf76 
					 
					
						
						
							
							ci: Add CUDA + arm64 release builds (#21201)
						Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
						
						
					 
					
						2025-08-15 23:16:23 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f5d412bafb 
					 
					
						
						
							
							[BugFix] Fix regression caused by mamba state dtype PR (#22998)
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						
						
					 
					
						2025-08-15 22:55:26 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						177e55e3bd 
					 
					
						
						
							
							[Attention] FA3 Attention Sinks Perf Boost (#22478)
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						
						
					 
					
						2025-08-15 17:41:07 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1723ef1aae 
					 
					
						
						
							
							minor: zero workspace buffer init for flashinfer trtllm-gen attn (#22603)
						
						 
						
						
						
						
					 
					
						2025-08-15 21:38:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						00d6cba0cf 
					 
					
						
						
							
							Add PrefixRepetitionRandomDataset to vllm bench serve datasets (#20638)
						Signed-off-by: Seiji Eicher <seiji@anyscale.com>
						
						
					 
					
						2025-08-15 14:09:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7f89ed248f 
					 
					
						
						
							
							[Fix] enable swap_ab for pplx problem size computation (#22991)
						Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
						
						
					 
					
						2025-08-15 14:02:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8a87cd27d9 
					 
					
						
						
							
							[CI] Speed up Whisper tests by reusing server (#22859)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-08-15 16:56:31 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a344a1a7da 
					 
					
						
						
							
							Use regex in convert-results-json-to-markdown.py (#22989)
						Signed-off-by: Michael Goin <mgoin64@gmail.com>
						
						
					 
					
						2025-08-15 20:54:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						79899b63f6 
					 
					
						
						
							
							[Bugfix] Added more env vars to hash (#22449)
						Signed-off-by: Julien Lin <jullin@nvidia.com>
						
						
					 
					
						2025-08-15 20:08:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6e670778cd 
					 
					
						
						
							
							[Core] direct indexing on self.block_table_np in compute_slot_mapping (#22940)
						Signed-off-by: linzebing <linzebing1995@gmail.com>
						
						
					 
					
						2025-08-15 12:12:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						df5afa82e5 
					 
					
						
						
							
							[Log] Debug Once for Randomizing dummy data for DP Rank (#22860)
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-08-15 11:51:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6cd69f51bf 
					 
					
						
						
							
							[Model] Granite-4 support loading quantized checkpoint (#22925)
						Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
						
						
					 
					
						2025-08-15 18:47:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8ad7285ea2 
					 
					
						
						
							
							[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035)
						Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
						
						
					 
					
						2025-08-15 14:46:00 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						48b01fd4d4 
					 
					
						
						
							
							[Structured Output] Make the output of structured output example more complete (#22481)
						Signed-off-by: shen-shanshan <467638484@qq.com>
						
						
					 
					
						2025-08-15 18:29:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						993d3d122b 
					 
					
						
						
							
							[Benchmarks] Include image data when ShareGPT4V dataset is used. (#22955)
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
						
						
					 
					
						2025-08-15 18:23:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						68af77e51c 
					 
					
						
						
							
							[FIXBUG] Correctly Apply Grammar Bitmask in Mixed Batches (#22896)
						Signed-off-by: JartX <sagformas@epdcenter.es>
						
						
					 
					
						2025-08-15 17:42:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6b04039a72 
					 
					
						
						
							
							[BugFix] Skip the Q component for QKVParallelLinear in the case of QKVCrossParallelLinear since its width is 0 (#22369)
						Signed-off-by: sstamenk <sstamenk@amd.com>
						
						
					 
					
						2025-08-15 17:17:31 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1c859a1387 
					 
					
						
						
							
							[V0 Deprecation] Remove advance_step (#22969)
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-08-15 08:22:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						74f441f4b5 
					 
					
						
						
							
							[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)
						Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
						
						
					 
					
						2025-08-15 10:01:39 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a0632a3e03 
					 
					
						
						
							
							[Frontend] Expose do_log_stats interval to env (#22905)
						Signed-off-by: Csrayz <jover@cmbchina.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-08-15 13:00:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e8b40c7fa2 
					 
					
						
						
							
							[CI] Remove duplicated docs build from buildkite (#22924)
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-08-15 05:58:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						48f4636927 
					 
					
						
						
							
							[Misc] Ignore ep_kernels_workspace (#22807)
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-08-15 05:58:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						75531a6c13 
					 
					
						
						
							
							[V1] [Hybrid] Support using float32 for state in Hybrid Models (Mamba2, Mamba1, Minimax) ( #22928 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com >
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Daniel Afrimi <danielafrimi8@gmail.com >
Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-15 12:57:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						22341b996e 
					 
					
						
						
							
							Improve multimodal hasher performance for re-used Image prompts ( #22825 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Staszek Pasko <staszek@gmail.com > 
						
						
					 
					
						2025-08-15 12:32:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						49252cf59e 
					 
					
						
						
							
							[MM] Allow skipping memory profiling for multimodal models. ( #22950 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-15 11:41:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3e6dd40016 
					 
					
						
						
							
							[Bugfix] fix cuda 12.6 and 11.8 build ( #22952 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com > 
						
						
					 
					
						2025-08-15 10:10:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						aa300c438d 
					 
					
						
						
							
							[Bugfix] Unquote file uri before reading image ( #22912 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sayandip Dutta <sayandip199309@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-08-15 09:28:00 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fe91ce9591 
					 
					
						
						
							
							[V1] - Split Prefill and Decode for Mamba1 models ( #22653 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: amirk <amirk@ai21.com >
Signed-off-by: asafg <asafg@ai21.com >
Co-authored-by: asafg <asafg@ai21.com >
Co-authored-by: Asaf Joseph Gardin <39553475+Josephasafg@users.noreply.github.com > 
						
						
					 
					
						2025-08-15 08:59:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5406ebf5c9 
					 
					
						
						
							
							[CI] Pooling models mteb test uses enforce_eager ( #22878 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-08-15 01:16:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b2c06509e5 
					 
					
						
						
							
							[P/D] Provide bucket algorithm rate limiter for proxy_server ( #22643 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: frankie-ys <yongshengwang@cmbchina.com >
Signed-off-by: frankie <wangyongsheng686@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Kuntai Du <kuntai@uchicago.edu > 
						
						
					 
					
						2025-08-15 07:01:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b2f6c247a9 
					 
					
						
						
							
							Revert "[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module." ( #22956 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com > 
						
						
					 
					
						2025-08-15 06:39:19 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3d232dbd19 
					 
					
						
						
							
							[Mamba] - refactor: Renamed mamba_attn to mamba2_attn ( #22818 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: asafg <asafg@ai21.com >
Co-authored-by: asafg <asafg@ai21.com > 
						
						
					 
					
						2025-08-15 06:38:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5c3fbfe46b 
					 
					
						
						
							
							[Feature] Full Cuda Graph Support for Cutlass MLA and 6% E2E Throughput Improvement ( #22763 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-15 06:27:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b4cef5e6c7 
					 
					
						
						
							
							refactor: Change scaling factors calculation for flashinfer FusedMoE ( #22812 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-15 06:19:31 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0fe85087a9 
					 
					
						
						
							
							[CI Perf] Prune tests in tests/kernels/attention/ ( #22936 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-14 21:34:53 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d2b0e97ea6 
					 
					
						
						
							
							[CI Perf] Prune tests in tests/kernels/moe/ ( #22939 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-14 21:33:42 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						590bddbfc5 
					 
					
						
						
							
							[CI Perf] Prune tests in tests/kernels/quantization/ ( #22942 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-14 21:25:34 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ae05a6d83d 
					 
					
						
						
							
							[BugFix] Fix port lookup in internal DP LB tests ( #22252 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-15 11:17:11 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0933f9d518 
					 
					
						
						
							
							[BugFix][KVConn] Fix use of get_required_kvcache_layout ( #22734 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-15 01:39:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f1f0d2fab8 
					 
					
						
						
							
							Revert "[Kernel] Add cuda kernel for gpt_oss activation" ( #22948 )  
						
						 
						
						
						
						
					 
					
						2025-08-14 17:38:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						81f4b96481 
					 
					
						
						
							
							[Kernel] Add cuda kernel for gpt_oss activation ( #22538 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-14 17:21:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						39cd09dc86 
					 
					
						
						
							
							[Bugfix] use flash attn on sm90 ( #22933 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-14 16:37:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						919234fe17 
					 
					
						
						
							
							[BugFix] Fix initial DP request load imbalance ( #22910 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-14 15:20:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ebcce2cd36 
					 
					
						
						
							
							[Core] Return final response for aborted requests from AsyncLLM.generate ( #22283 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-14 14:49:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4121de512e 
					 
					
						
						
							
							[Quantization]: Support compressed-tensors mixed-precision model loading ( #22468 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com > 
						
						
					 
					
						2025-08-14 17:32:09 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						279a5f31b3 
					 
					
						
						
							
							[Kernel] Add nvfp4 gemm flashinfer backends ( #22346 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Julien Lin <jullin@nvidia.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-14 16:03:55 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b8ff05361a 
					 
					
						
						
							
							[CI] Temporarily disable flaky test ( #22930 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-08-14 19:59:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						637093ae26 
					 
					
						
						
							
							docs: update fastsafetensors usage instructions ( #22891 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nir Levy <bhr166@gmail.com > 
						
						
					 
					
						2025-08-14 19:56:54 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						33c63e9547 
					 
					
						
						
							
							[Kernel] [Quantization] Add MXFP4 and bias support for marlin kernel ( #22428 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com >
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Animesh Jain <anijain@umich.edu >
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: kf <kuanfu.liu@embeddedllm.com >
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: tjtanaavllm <tunjian.tan@amd.com >
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com >
Signed-off-by: Roger Wang <hey@rogerw.me >
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai >
Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: yan <yan.ma@intel.com >
Signed-off-by: Yan Ma <yan.ma@intel.com >
Signed-off-by: Xiao Liu <xiszishu@gmail.com >
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es >
Signed-off-by: Andy Xie <andy.xning@gmail.com >
Signed-off-by: Haibin Lin <haibin.lin@bytedance.com >
Signed-off-by: David Ben-David <davidb@pliops.com >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Signed-off-by: Abirdcfly <fp544037857@gmail.com >
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: huangweixiao <huangweixiao@msh.team >
Signed-off-by: alyosha-swamy <raghav@arcee.ai >
Signed-off-by: Eric Hanley <ericehanley@google.com >
Signed-off-by: Abatom <abzhonghua@gmail.com >
Signed-off-by: CLFutureX <775523362@qq.com >
Signed-off-by: Linkun Chen <github@lkchen.net >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: tlipoca9 <tlipoca9@gmail.com >
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com >
Signed-off-by: mgoin <michael@neuralmagic.com >
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Benji Beck <benjibeck@meta.com >
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
Signed-off-by: isotr0py <2037008807@qq.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: simon-mo <xmo@berkeley.edu >
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Zhang Jason <ning.zhang2@amd.com >
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Signed-off-by: asafg <asafg@ai21.com >
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com >
Signed-off-by: Lain <fusiyuan2000@hotmail.com >
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: QscQ <qscqesze@gmail.com >
Signed-off-by: qingjun <qingjun@minimaxi.com >
Signed-off-by: Syed Muhammad Bin Asif <syedmba7@connect.hku.hk >
Signed-off-by: Lionel Villard <villard@us.ibm.com >
Signed-off-by: ycyaw66 <497410282@qq.com >
Signed-off-by: David Chen <530634352@qq.com >
Signed-off-by: Linkun <github@lkchen.net >
Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com >
Signed-off-by: Ming Yang <minos.future@gmail.com >
Signed-off-by: Adrian Garcia <adrian.garcia@inceptionai.ai >
Signed-off-by: shaojunqi <shaojunqi.sjq@alibaba-inc.com >
Signed-off-by: Ricardo Decal <rdecal@anyscale.com >
Signed-off-by: Andrew Chan <andrewkchan.akc@gmail.com >
Signed-off-by: Felix Marty <Felix.Marty@amd.com >
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com >
Signed-off-by: Shu Wang <shuw@nvidia.com >
Signed-off-by: Po-Han Huang <pohanh@nvidia.com >
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Signed-off-by: XIn Li <xinli@nvidia.com >
Signed-off-by: Junhao Li <junhao@ubicloud.com >
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com >
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com >
Signed-off-by: <zyy1102000@gmail.com >
Signed-off-by: Guy Stone <guys@spotify.com >
Signed-off-by: <yyweiss@gmail.com >
Signed-off-by: yyw <yyweiss@gmail.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com >
Signed-off-by: Pradyun92 <142861237+Pradyun92@users.noreply.github.com >
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com >
Co-authored-by: rongfu.leng <rongfu.leng@daocloud.io >
Co-authored-by: Huzaifa Sidhpurwala <huzaifas@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Animesh Jain <jainanimesh2305@yahoo.com >
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com >
Co-authored-by: XiongfeiWei <isaacwxf23@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: JartX <sagformas@gmail.com >
Co-authored-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: kf <kuanfu.liu@embeddedllm.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com >
Co-authored-by: Yong Hoon Shin <48474650+sarckk@users.noreply.github.com >
Co-authored-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.me >
Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com >
Co-authored-by: Yuxuan Zhang <2448370773@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Yan Ma <yan.ma@intel.com >
Co-authored-by: Xiao <xiszishu@gmail.com >
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com >
Co-authored-by: Ning Xie <andy.xning@gmail.com >
Co-authored-by: H <linhaibin.eric@gmail.com >
Co-authored-by: David Ben-David <sdavidbd@gmail.com >
Co-authored-by: David Ben-David <davidb@pliops.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
Co-authored-by: TankNee <nee@tanknee.cn >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com >
Co-authored-by: ZiTian.Zhao <zitian.zhao@tencentmusic.com >
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Abirdcfly <fp544037857@gmail.com >
Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com >
Co-authored-by: Chenxi Yang <cxyang@cs.utexas.edu >
Co-authored-by: Chenxi Yang <cxyang@meta.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Weixiao Huang <hwx.simle@gmail.com >
Co-authored-by: Raghav Ravishankar <113712354+alyosha-swamy@users.noreply.github.com >
Co-authored-by: ericehanley <ericehanley@google.com >
Co-authored-by: Zhonghua Deng <abzhonghua@gmail.com >
Co-authored-by: Po-Han Huang (NVIDIA) <53919306+nvpohanh@users.noreply.github.com >
Co-authored-by: PiteXChen <44110731+CLFutureX@users.noreply.github.com >
Co-authored-by: lkchen <github@lkchen.net >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com >
Co-authored-by: tlipoca9 <160737620+tlipoca9@users.noreply.github.com >
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com >
Co-authored-by: wang.yuqi <noooop@126.com >
Co-authored-by: Benji Beck <benjibeck@meta.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Siyuan Liu <lsiyuan@google.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com >
Co-authored-by: simon-mo <xmo@berkeley.edu >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com >
Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com >
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Zhang Jason <ning.zhang2@amd.com >
Co-authored-by: Asaf Joseph Gardin <39553475+Josephasafg@users.noreply.github.com >
Co-authored-by: asafg <asafg@ai21.com >
Co-authored-by: Lain <siyuanf@nvidia.com >
Co-authored-by: tc-mb <157115220+tc-mb@users.noreply.github.com >
Co-authored-by: imning3 <hbning@pku.edu.cn >
Co-authored-by: Maximilien de Bayser <mbayser@br.ibm.com >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: Tao He <linzhu.ht@alibaba-inc.com >
Co-authored-by: qscqesze <qingjun@minimaxi.com >
Co-authored-by: Syed Muhammad Bin Asif <92625830+syedmba@users.noreply.github.com >
Co-authored-by: Lionel Villard <villard@us.ibm.com >
Co-authored-by: WeiQing Chen <40507679+david6666666@users.noreply.github.com >
Co-authored-by: ycyaw66 <497410282@qq.com >
Co-authored-by: Moritz Sanft <58110325+msanft@users.noreply.github.com >
Co-authored-by: Ming Yang <minos.future@gmail.com >
Co-authored-by: Adrián García García <adrigarvk8@gmail.com >
Co-authored-by: Michael Goin <mgoin@redhat.com >
Co-authored-by: JaceyShao <65159281+JaceyShao@users.noreply.github.com >
Co-authored-by: shaojunqi <shaojunqi.sjq@alibaba-inc.com >
Co-authored-by: Ricardo Decal <crypdick@users.noreply.github.com >
Co-authored-by: Andrew Chan <andrewkchan.akc@gmail.com >
Co-authored-by: fxmarty-amd <felmarty@amd.com >
Co-authored-by: Andrew Sansom <andrew@protopia.ai >
Co-authored-by: Zhiyu <zhiyuc@nvidia.com >
Co-authored-by: Shu Wang <shuw@nvidia.com >
Co-authored-by: XIn Li <xinli@nvidia.com >
Co-authored-by: Junhao Li <streaver91@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: iAmir97 <71513472+iAmir97@users.noreply.github.com >
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com >
Co-authored-by: Hong Hanh <hanh.usth@gmail.com >
Co-authored-by: Daniel Serebrenik <74646983+pliops-daniels@users.noreply.github.com >
Co-authored-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Guy Stone <guys@spotify.com >
Co-authored-by: yyweiss <70619747+yyweiss@users.noreply.github.com >
Co-authored-by: Pradyun92 <142861237+Pradyun92@users.noreply.github.com >
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com > 
						
						
					 
					
						2025-08-14 11:23:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ab9f2cfd19 
					 
					
						
						
							
							[CI] [Hybrid] Bump min transformers version for Bamba and Jamba ( #22908 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-14 11:01:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dbe298046c 
					 
					
						
						
							
							[Bugfix] Fix parsing of --disable-mm-preprocessor-cache ( #22909 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-14 08:09:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						625ccd1c4d 
					 
					
						
						
							
							[Bugfix] Replace custom Encoding class with BatchEncoding in MistralTokenizer ( #22786 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zjy0516 <riverclouds.zhu@qq.com > 
						
						
					 
					
						2025-08-14 08:09:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						92ff41abea 
					 
					
						
						
							
							[Model] Modify the gate implementation of glm4_moe ( #22832 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-14 05:28:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						829b9a62d0 
					 
					
						
						
							
							[Perf] Dont create unnecessary pooling params ( #22876 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-08-14 05:28:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						540d54ca8d 
					 
					
						
						
							
							[CI] Re-enable transcriptions test_long_audio_request ( #22890 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-08-14 11:34:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0783f13960 
					 
					
						
						
							
							[Doc] fix dead link ( #22898 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com > 
						
						
					 
					
						2025-08-14 04:06:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7655dc3e45 
					 
					
						
						
							
							[Bugfix] Add reset prefix cache for online serving ( #22726 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com >
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com >
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-14 04:04:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f4efda821d 
					 
					
						
						
							
							Remove Phi 4 Flash configuration workaround ( #22723 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-14 04:03:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eb08487b18 
					 
					
						
						
							
							[BugFix] Threadsafe close async zmq sockets ( #22877 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-14 03:44:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7c3a0741c6 
					 
					
						
						
							
							[Bugfix] Fix PixtralHFImagePixelInputs dynamic shape check ( #22827 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-14 02:35:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						00e3f9da46 
					 
					
						
						
							
							vLLM Benchmark suite improvement ( #22119 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
Signed-off-by: Louie Tsai <louie.tsai@intel.com >
Co-authored-by: Li, Jiang <bigpyj64@gmail.com > 
						
						
					 
					
						2025-08-14 07:12:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a353bd083d 
					 
					
						
						
							
							[CI] remove flaky v0 test ( #22864 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com > 
						
						
					 
					
						2025-08-13 21:41:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1d20c34717 
					 
					
						
						
							
							[CI] Fix tests/distributed/test_ca_buffer_sharing.py ( #22849 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ilmarkov <imarkov@redhat.com >
Co-authored-by: ilmarkov <imarkov@redhat.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com > 
						
						
					 
					
						2025-08-13 20:09:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b6af24fba7 
					 
					
						
						
							
							[CI][Entrypoints]: add filter to generation to filter out invalid tool calls ( #22826 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Will Eaton <weaton@redhat.com > 
						
						
					 
					
						2025-08-13 20:09:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0ca2393b47 
					 
					
						
						
							
							[CI/Build] Increase pooling tolerance to pass CI ( #22844 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-13 18:52:48 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						31a500c86f 
					 
					
						
						
							
							[Core] [N-gram SD Optimization][1/n] Propose tokens with a single KMP ( #22437 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com > 
						
						
					 
					
						2025-08-13 14:44:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4e8614e88b 
					 
					
						
						
							
							Move checklist in PR template ( #22852 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Luka Govedic <lgovedic@redhat.com > 
						
						
					 
					
						2025-08-13 21:38:35 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c6cd5ca3d3 
					 
					
						
						
							
							[ROCm][Bugfix] Fix compilation error in topk softmax fused kernel ( #22819 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com > 
						
						
					 
					
						2025-08-13 13:45:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						df0e0f023e 
					 
					
						
						
							
							[CI/Build] Skip gpt_big model test because of broken HF model ( #22848 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-13 20:36:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b4b78d6317 
					 
					
						
						
							
							[CI/Build] Fix param mismatch in test_eagle_correctness ( #22847 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-13 10:55:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						12817a8ac7 
					 
					
						
						
							
							[CI] Fix tests/v1/e2e/test_kv_sharing_fast_prefill.py import on test ( #22815 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-08-13 10:35:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c9232d41f4 
					 
					
						
						
							
							[CI/Build] Update VLM common tests ( #22841 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-13 10:03:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9bd9294f0e 
					 
					
						
						
							
							[Bugfix] Fix MiniCPMV Image input inference failed ( #22813 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: HWH <67449739+jio-H@users.noreply.github.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-08-13 09:41:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						da2705198f 
					 
					
						
						
							
							[Misc] clear and separate error messages for input too long and input + max-tokens too long ( #22803 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me > 
						
						
					 
					
						2025-08-13 07:22:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						19b927e52d 
					 
					
						
						
							
							[Core] Use individual MM items in P0/P1 cache and model runner ( #22570 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-13 07:18:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						20d65aa755 
					 
					
						
						
							
							[Frontend] Multithreaded async multimodal load_bytes ( #22710 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com >
Co-authored-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com > 
						
						
					 
					
						2025-08-13 06:09:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b159c0a67a 
					 
					
						
						
							
							Fix GGUF loader for Qwen3 MoE. ( #22785 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gh0u1L5 <Gh0u1L5@outlook.com > 
						
						
					 
					
						2025-08-13 06:08:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6772bb0f7d 
					 
					
						
						
							
							Remove unnecessary CUDA sync of qwen image and video preprocess ( #22792 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: cyy <cyyever@outlook.com >
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-08-13 06:07:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fceafaf582 
					 
					
						
						
							
							[Bugfix][mamba] Fix type annotation of Mamba2Metadata ( #22787 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-13 06:07:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6b794c756c 
					 
					
						
						
							
							[Nixl][CI] Fix tests ( #22806 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-08-13 06:03:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						98deac3879 
					 
					
						
						
							
							[FEATURE] support custom vllm tuned config path for fused moe triton kernels ( #22791 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com > 
						
						
					 
					
						2025-08-13 20:27:25 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						653124bd46 
					 
					
						
						
							
							[Frontend] Add chunked processing to handle long inputs in embedding models ( #22280 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: x22x22 <wadeking@qq.com >
Signed-off-by: Kdump <rootshellexp@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-13 04:14:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0b1bdac6af 
					 
					
						
						
							
							[Platform] Custom ops support for FusedMoe ( #22509 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com > 
						
						
					 
					
						2025-08-13 04:12:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d94e3026de 
					 
					
						
						
							
							[V1] Add tree drafting tests for eagle spec decoding ( #22705 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Giancarlo Delfin <gdelfin@meta.com > 
						
						
					 
					
						2025-08-13 04:11:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3f52738dce 
					 
					
						
						
							
							[Doc] Add max_lora_rank configuration guide ( #22782 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chiliu <cliu_whu@yeah.net > 
						
						
					 
					
						2025-08-13 04:10:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a01e0018b5 
					 
					
						
						
							
							[Bugfix] Fix Nemotron VL image processing ( #22739 )  
						
						 
						
						... 
						
						
						
						Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp > 
						
						
					 
					
						2025-08-13 03:11:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9e7e5baaa8 
					 
					
						
						
							
							[Model] Add missing prefix to glm4_1v ( #22716 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com > 
						
						
					 
					
						2025-08-13 01:23:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d16aa3dae4 
					 
					
						
						
							
							[Model] Add option to run Step3VisionEncoder in DP ( #22697 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zzh142857 <chaorenzhaozhenghao@gmail.com > 
						
						
					 
					
						2025-08-13 00:09:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6807af8f46 
					 
					
						
						
							
							[gpt-oss] upgrade gpt-oss to v0.0.3 and add version check ( #22768 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-12 21:37:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4c558cf62e 
					 
					
						
						
							
							[Perf] Support topk softmax fused kernel for broader num_experts ( #22211 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Shixian Cui <shixian@amazon.com >
Co-authored-by: Shixian Cui <shixian@amazon.com > 
						
						
					 
					
						2025-08-12 21:34:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						77a6bf07ae 
					 
					
						
						
							
							[Bug] Fix Unexpected Keyword Argument 'w1_bias' ( #22757 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-12 21:31:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4082338a25 
					 
					
						
						
							
							Remove unneeded ROCm platform import when using CUDA ( #22765 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-12 21:26:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c6b928798e 
					 
					
						
						
							
							Force TRTLLM attention for gpt-oss on SM100 ( #22678 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-12 21:22:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b1361c7273 
					 
					
						
						
							
							[Bugfix] Fix default enable for CUTLASS MLA on SM100 ( #22738 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-12 21:22:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4f0f844b16 
					 
					
						
						
							
							Fix cuda illegal mem access with Llama4 TP8 + rms_norm custom op ( #22701 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Po-Han Huang <pohanh@nvidia.com > 
						
						
					 
					
						2025-08-12 21:21:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c5830381af 
					 
					
						
						
							
							[V0 Deprecation] Remove args for multi-step scheduling ( #22779 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai > 
						
						
					 
					
						2025-08-12 20:38:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d31f97cf57 
					 
					
						
						
							
							[Misc] Remove tests/multi_step/__init__.py ( #22778 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai > 
						
						
					 
					
						2025-08-12 20:21:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						71683ca6f6 
					 
					
						
						
							
							[V0 Deprecation] Remove multi-step scheduling ( #22138 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai > 
						
						
					 
					
						2025-08-12 20:18:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e18859298d 
					 
					
						
						
							
							Add hardware plugins to installation doc ( #22732 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-12 17:14:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fde0b611a3 
					 
					
						
						
							
							[Model] Decouple glm4v ( #22751 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-12 17:13:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d0a6301588 
					 
					
						
						
							
							Fix Transformers backend tensor parallel for multimodal models ( #22673 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-12 17:12:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						45c3936e94 
					 
					
						
						
							
							[Docs] Hide the navigation and toc sidebars on home page ( #22749 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-12 17:12:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ba81acbdc1 
					 
					
						
						
							
							[Bugfix] Bump DeepGEMM Version to Fix SMXX Layout Issues ( #22606 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: frankwang28 <frank.wbb@hotmail.com > 
						
						
					 
					
						2025-08-12 15:43:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						53c730286c 
					 
					
						
						
							
							[Misc] parametrize 'dtype' in test_flash_mla ( #22641 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: RUTHLESS-BOT <wujiafeng@cmbchina.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-12 16:31:48 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6534d2fc97 
					 
					
						
						
							
							Fix torch version check for SM100 mxfp4 ( #22535 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zifei Tong <zifeitong@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-12 12:54:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						422f22e012 
					 
					
						
						
							
							[CI][Nixl] Check kv cache layout during handshake ( #22745 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-08-12 12:53:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6bd8ebf026 
					 
					
						
						
							
							[Kernel][AMD] Avoid D2H copy and cumsum kernel ( #22683 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Xiaozhu <mxz297@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-12 12:53:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dab4f9f764 
					 
					
						
						
							
							[Chore] Update CODEOWNERS to include @yewentao256 for CUDA kernels, attention backends, quantization, and related tests ( #22741 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-13 00:50:31 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c42fe0b63a 
					 
					
						
						
							
							Add more test scenario for tensor schema ( #22733 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: teekenl <teekenlau@gmail.com > 
						
						
					 
					
						2025-08-12 16:34:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5a4b4b3729 
					 
					
						
						
							
							Add: SupportsEagle3 interface for explicit EAGLE3 support ( #22642 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Rahul Tuli <rtuli@redhat.com > 
						
						
					 
					
						2025-08-12 09:24:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e5d3d63c42 
					 
					
						
						
							
							[Benchmark] Fix terminal colors in benchmark_serving_multi_turn (python 3.12) ( #22730 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: daniels <daniels@pliops.com > 
						
						
					 
					
						2025-08-12 14:41:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3d9d40efde 
					 
					
						
						
							
							[Bugfix][CI] Fix test_remote_decode_lifecycle.py::test_short_prompt_lifecycle ( #22727 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-08-12 07:30:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						67c153b88a 
					 
					
						
						
							
							Fix Llama4 FlashInfer FP4 MoE issues ( #22511 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Po-Han Huang <pohanh@nvidia.com > 
						
						
					 
					
						2025-08-12 05:50:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f7ad6a1eb3 
					 
					
						
						
							
							[CI Failure] fix tests/entrypoints/openai/test_skip_tokenizer.py ( #22708 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-08-12 05:42:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						80bb1e8afe 
					 
					
						
						
							
							Officially support SmolLM3 using the Transformers backend ( #22665 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-12 05:38:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d030b01548 
					 
					
						
						
							
							[BugFix][Nixl][PD] Fix heterogenous TP ( #22663 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-12 05:37:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						767e63b860 
					 
					
						
						
							
							[Docs] Improve docs navigation ( #22720 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-12 04:25:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						007dd90859 
					 
					
						
						
							
							[gpt-oss] Enable gpt-oss on ampere ( #22714 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yongye Zhu <zyy1102000@gmail.com > 
						
						
					 
					
						2025-08-12 03:21:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b8a9d0e429 
					 
					
						
						
							
							[Misc] remove GH discussions link ( #22722 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-12 03:15:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						50f2aae1b4 
					 
					
						
						
							
							[LMCache][Example] Align the PYTHONHASHSEED for prefillers and decoders for KV chunks hashing ( #21161 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zejunchen-zejun <zejun.chen@amd.com > 
						
						
					 
					
						2025-08-12 02:05:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						46ae7f6666 
					 
					
						
						
							
							[Bugfix] Mamba2 SSD varlen bug fix initstates decay, improve test, assert chunk pwr 2 ( #21783 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Rishi Astra <40644327+RishiAstra@users.noreply.github.com > 
						
						
					 
					
						2025-08-12 02:04:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1ece7f30ba 
					 
					
						
						
							
							Fix: AWQ Marlin get_quant_method does not recognize "modules_to_not_convert" ( #21888 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: JunHowie <JunHowie@aliyun.com >
Co-authored-by: JunHowie <JunHowie@aliyun.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-08-12 02:03:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bc8372efc3 
					 
					
						
						
							
							[Bugfix] Fix erroneous randomly generated cases in bad word testing ( #22170 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: phantomlei <phantomlei3@gmail.com > 
						
						
					 
					
						2025-08-12 02:03:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8d17fa633e 
					 
					
						
						
							
							[V0] Correct CUDA Graph capture for encoder-decoder models ( #22630 )  
						
						 
						
						
						
						
					 
					
						2025-08-12 02:01:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9f909b8996 
					 
					
						
						
							
							[New Model] Support Command-A-Vision ( #22660 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: donglu <donglu@cohere.com > 
						
						
					 
					
						2025-08-12 01:39:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						59f3b93636 
					 
					
						
						
							
							[DOC] update v1_guide with INTEL HW ( #22679 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chendi.Xue <chendi.xue@intel.com > 
						
						
					 
					
						2025-08-12 01:22:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						78077d5417 
					 
					
						
						
							
							Move SchedulerConfig from config/__init__.py to config/scheduler.py ( #22626 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-12 00:23:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6d729c43fb 
					 
					
						
						
							
							[Bugfix] Fix ModernBert load & Enable sliding window attention for bidirectional attention. ( #22637 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com>
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
						Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
						
						
					 
					
						2025-08-12 00:23:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f4657952b 
					 
					
						
						
							
							[doc] Update x86 CPU-inference installation doc to reflect optionality of AVX512f  ( #22707 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sooraj S <94284954+sooraj-satheesh@users.noreply.github.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
						
						
					 
					
						2025-08-12 00:21:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3a7e3bbdd2 
					 
					
						
						
							
							[Doc] Added unmentioned required option "method" in the usage of EAGLE-3 based models ( #21737 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Dilute-l <dilu2333@163.com>
						Co-authored-by: Dilute-l <dilu2333@163.com>
						
						
					 
					
						2025-08-12 00:14:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4fbd8bb597 
					 
					
						
						
							
							Fix passing SpeculativeConfig from the CLI ( #22652 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-11 22:13:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ad344ef552 
					 
					
						
						
							
							[gpt-oss] Small bug fixes for frontend ( #22512 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-11 22:04:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bbaf9e9cb1 
					 
					
						
						
							
							[gpt-oss] Fix mxfp4 support ( #22700 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-11 21:22:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4678503476 
					 
					
						
						
							
							Migrate MiniCPMVImageInputs to TensorSchema ( #21939 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-11 20:43:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						93d0652433 
					 
					
						
						
							
							[CI] Increase timeout for test_completion_with_image_embeds ( #22670 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-11 20:31:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ea1292ad3e 
					 
					
						
						
							
							[CI Failure] Use float32 for tests/entrypoints/openai/test_audio.py ( #22686 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-11 20:20:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dc5e4a653c 
					 
					
						
						
							
							Upgrade FlashInfer to v0.2.11 ( #22613 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
						Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-08-11 19:58:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						839ab00349 
					 
					
						
						
							
							Re-enable Xet on TPU tests now that hf_xet has been updated ( #22666 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-11 19:54:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9b94d6ec8f 
					 
					
						
						
							
							Enable 4bit bnb prequant MOE ( #21548 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-08-11 19:02:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1891a265d3 
					 
					
						
						
							
							[gpt-oss] Add test for response API + harmony (but skipped) ( #22554 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-11 17:47:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						95a935fc48 
					 
					
						
						
							
							[gpt-oss] Support streaming in response API ( #22431 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-11 17:46:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						458e74eb90 
					 
					
						
						
							
							Support more parallel styles in Transformers backend TP ( #22651 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-11 10:42:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						65abe111a3 
					 
					
						
						
							
							[CI] Skip Tree Attn Test in test_max_len.py to unblock CI ( #22664 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-08-11 10:36:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						807d21b80d 
					 
					
						
						
							
							[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI ( #22611 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-08-11 10:31:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c90fb03df5 
					 
					
						
						
							
							[CI/Build] Skip Mllama HF runner tests with Transformers v4.55.0 ( #22659 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-08-11 10:00:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						84cf78acee 
					 
					
						
						
							
							[Model] Pooling models default to using chunked prefill & prefix caching if supported. ( #20930 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-08-11 09:41:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						16fb668b61 
					 
					
						
						
							
							fix: NIXL connector transfers partial block to pass full multi-modal context ( #21074 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: GuanLuo <gluo@nvidia.com > 
						
						
					 
					
						2025-08-11 09:40:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f7dcce7a4a 
					 
					
						
						
							
							[Feature] Add VLLM_USE_DEEP_GEMM_E8M0 Env to Control E8M0 Scale ( #21968 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-11 09:39:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8e13d9fe6d 
					 
					
						
						
							
							[Misc] Further clean up some redundant config definitions ( #22649 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-08-11 09:22:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3fa5b25845 
					 
					
						
						
							
							Document aarch64 CPU support works ( #22646 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Eric Curtin <ecurtin@redhat.com > 
						
						
					 
					
						2025-08-11 07:22:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						14a5d903ab 
					 
					
						
						
							
							[Model] NemotronH Support  ( #22349 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com > 
						
						
					 
					
						2025-08-11 04:09:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						951b038298 
					 
					
						
						
							
							[Misc] Move jsontree to utils ( #22622 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-11 03:49:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ebf7605b0d 
					 
					
						
						
							
							[Misc] Move tensor schema tests ( #22612 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-11 00:15:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bc1d02ac85 
					 
					
						
						
							
							[Docs] Add comprehensive CLI reference for all large vllm subcommands ( #22601 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-11 00:13:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1e55dfa7e5 
					 
					
						
						
							
							[BUGFIX] KeyError 'layers.14.mlp.gate.g_idx' for Qwen3-MoE with GPTQ on ROCm ( #22017 )  
						
						 
						
						
						
						
					 
					
						2025-08-11 00:13:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						384a052971 
					 
					
						
						
							
							[Misc] benchmark_moe supports expert parallel ( #22251 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-11 00:13:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						39052dbca8 
					 
					
						
						
							
							Support token_type_ids in V1 with less code changes ( #21985 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com > 
						
						
					 
					
						2025-08-10 22:54:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9c97a1c349 
					 
					
						
						
							
							[ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. ( #22521 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com > 
						
						
					 
					
						2025-08-10 22:52:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f919d4cb8f 
					 
					
						
						
							
							[BugFix] Fix logits repetition penalty cuda check ( #22592 )  
						
						 
						
						
						
						
					 
					
						2025-08-10 22:52:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						afa5b7ca0b 
					 
					
						
						
							
							[Misc][gpt-oss] guard import when triton kernel when not up to date  ( #22584 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zhewenli <zhewenli@meta.com > 
						
						
					 
					
						2025-08-10 21:29:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1b99028069 
					 
					
						
						
							
							[Misc][gpt-oss] Add rules to label gpt-oss related PRs ( #22600 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lifan Shen <lifans@meta.com > 
						
						
					 
					
						2025-08-10 19:49:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5898b135ab 
					 
					
						
						
							
							[BugFix] Fix KVConnectorOutput TPU breakage ( #22598 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-10 19:33:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b799f4b9ea 
					 
					
						
						
							
							[CI/Build] Fix tensorizer test for load_format change ( #22583 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-08-10 19:30:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						06da44f0cb 
					 
					
						
						
							
							Migrate LlavaImageInputs to TensorSchema ( #21770 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-10 19:29:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a554991748 
					 
					
						
						
							
							Migrate LlavaNextVideoPixelInputs to TensorSchema ( #21843 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-10 19:29:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d1af8b7be9 
					 
					
						
						
							
							enable Docker-aware precompiled wheel setup ( #22106 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: dougbtv <dosmith@redhat.com > 
						
						
					 
					
						2025-08-10 16:29:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						68b254d673 
					 
					
						
						
							
							Fix TensorSchema validation test for symbolic dims ( #22366 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-08-10 17:16:44 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8c50d62f5a 
					 
					
						
						
							
							Remove redundant row_indices unsqueeze operation in MiniCPMO ( #22528 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com > 
						
						
					 
					
						2025-08-10 09:20:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b4e2916721 
					 
					
						
						
							
							Migrate LlavaNextImageInputs to TensorSchema ( #21774 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-08-10 09:05:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						65a7917be4 
					 
					
						
						
							
							Fix(benchmarks): allow multiple mm contents in OpenAI Chat Completion Benchmarks ( #22534 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: breno.skuk <breno.skuk@hcompany.ai > 
						
						
					 
					
						2025-08-10 09:03:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b76753f0b5 
					 
					
						
						
							
							[Bugfix][Kernel] Support partial rotary embedding for MRoPE triton kernel ( #22593 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-10 09:00:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b81fe83b2c 
					 
					
						
						
							
							[doc] add alibaba cloud as sponsor ( #22597 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-08-10 23:13:47 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0757551c96 
					 
					
						
						
							
							[doc] add beijing meetup links ( #22596 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-08-10 22:51:36 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8290d15d2c 
					 
					
						
						
							
							Move CacheConfig from config/__init__.py to config/cache.py ( #22586 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-10 07:36:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						049c245143 
					 
					
						
						
							
							[Misc] Replace flaky image urls in pixtral test ( #22574 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-08-10 06:18:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						00976db0c3 
					 
					
						
						
							
							[Docs] Fix warnings in docs build ( #22588 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-10 05:49:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d411df0296 
					 
					
						
						
							
							[Misc] Further refine type annotations in parallel state ( #22499 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-10 05:49:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						010e0e39ea 
					 
					
						
						
							
							[Doc] Fix API doc link in side navigation ( #22585 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-08-10 01:35:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						326976291b 
					 
					
						
						
							
							[Misc] code clean duplicate set_current_vllm_config in _set_vllm_config ( #22566 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-08-10 00:08:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7e8d685775 
					 
					
						
						
							
							[Minor] Fix pre-commit error on main ( #22579 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-08-10 00:08:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c49848396d 
					 
					
						
						
							
							Refactor sliding window configuration to Transformers best practice ( #21927 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-09 20:50:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2a84fb422f 
					 
					
						
						
							
							[TPU] kv cache update kernel doesn't need to be padded slices to multiple of num_slices_per_block ( #22394 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
						Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
						
						
					 
					
						2025-08-09 20:49:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						534c45b962 
					 
					
						
						
							
							Improve fast_topk function with type hints and documentation ( #22530 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com > 
						
						
					 
					
						2025-08-09 20:25:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3d7363e61c 
					 
					
						
						
							
							[Config] add "qwen" as a native eagle3 target supported model ( #22333 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: lechen <lecself@163.com>
						Signed-off-by: LeChen <lecself@163.com>
						
						
					 
					
						2025-08-09 20:21:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0c5254b82a 
					 
					
						
						
							
							[oss] Init gpt-oss bf16 support ( #22508 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-09 20:19:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						61f67d8acd 
					 
					
						
						
							
							[V1] [Hybrid] Enable Full CUDA Graph (decode-only) for Mamba layers ( #21401 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-09 20:16:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						42172ad18f 
					 
					
						
						
							
							[FEAT] [Performance] Add triton mrope to replace the torch code path ( #22375 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-08-09 11:50:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fbd8595c5c 
					 
					
						
						
							
							[Bugfix] Fix basic models tests hanging due to mm processor creation ( #22571 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-09 11:42:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5a16fa614c 
					 
					
						
						
							
							[Model] Gemma3n MM ( #20495 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ShriKode <shrikode@gmail.com>
						Signed-off-by: NickLucche <nlucches@redhat.com>
						Signed-off-by: Roger Wang <hey@rogerw.me>
						Co-authored-by: ShriKode <shrikode@gmail.com>
						Co-authored-by: Roger Wang <hey@rogerw.me>
						
						
					 
					
						2025-08-09 09:56:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2d18256e47 
					 
					
						
						
							
							Move ParallelConfig from config/__init__.py to config/parallel.py ( #22565 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-09 08:33:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						56186474f6 
					 
					
						
						
							
							[Docs] Reduce noise in docs and --help from the JSON tip ( #22567 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-09 08:31:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1bf5e1f25b 
					 
					
						
						
							
							[CI] [Hybrid] Speed up hybrid models test by removing large models  ( #22563 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-09 02:04:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a6022e6fbc 
					 
					
						
						
							
							GLM-4.5V with new class name at transformers ( #22520 )  
						
						 
						
						
						
						
						Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-09 00:50:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2be07a0db1 
					 
					
						
						
							
							Update docs for Minimax-Text support ( #22562 )  
						
						 
						
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-09 00:18:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0edc0cd52b 
					 
					
						
						
							
							[Bugfix] Fix CI moe kernel failure ( #22556 )  
						
						 
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-09 00:03:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7920e9b1c5 
					 
					
						
						
							
							[Bugfix] Fix failing GPT-OSS initialization test ( #22557 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-09 00:03:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b7c0942b65 
					 
					
						
						
							
							[ROCm][Misc] Rename the context_len to seq_len in ROCm custom paged attention kernel ( #22097 )  
						
						 
						
						
						
						
						Signed-off-by: charlifu <charlifu@amd.com > 
						
						
					 
					
						2025-08-08 23:15:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9a0c5ded5a 
					 
					
						
						
							
							[TPU] Add support for online w8a8 quantization ( #22425 )  
						
						 
						
						
						
						
						Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com > 
						
						
					 
					
						2025-08-08 23:12:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						10a02535d4 
					 
					
						
						
							
							Fix loading of quantized BigCode models ( #22463 )  
						
						 
						
						
						
						
						Signed-off-by: Eldar Kurtic <eldar@neuralmagic.com > 
						
						
					 
					
						2025-08-08 23:12:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						65552b476b 
					 
					
						
						
							
							[Misc] Use config definitions from Transformers library ( #21913 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-08 23:10:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7ad7adb67f 
					 
					
						
						
							
							v1: Pass KVConnectorOutput to scheduler-side ( #22157 )  
						
						 
						
						
						
						
						Signed-off-by: Or Ozeri <oro@il.ibm.com > 
						
						
					 
					
						2025-08-08 23:09:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6ade99eafa 
					 
					
						
						
							
							[V1] [Hybrid] Support Minimax-Text-01 in V1  ( #22151 )  
						
						 
						
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-08 23:08:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3157aebb63 
					 
					
						
						
							
							[Log] Add Warning for Deprecation of DeepGEMM old version ( #22194 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-08 23:07:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8a0ffd6285 
					 
					
						
						
							
							Remove mamba_ssm from vLLM requirements; install inside test container using --no-build-isolation ( #22541 )  
						
						 
						
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-08 23:05:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						23472ff51c 
					 
					
						
						
							
							[Doc] Add usage of implicit text-only mode  ( #22561 )  
						
						 
						
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me >
Co-authored-by: Flora Feng <4florafeng@gmail.com > 
						
						
					 
					
						2025-08-08 23:04:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						08b751ba74 
					 
					
						
						
							
							Implicit language-model-only mode via limit-mm-per-prompt ( #22299 )  
						
						 
						
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me >
Signed-off-by: Andy Xie <andy.xning@gmail.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com >
Signed-off-by: Shu Wang <shuw@nvidia.com >
Signed-off-by: Po-Han Huang <pohanh@nvidia.com >
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Signed-off-by: XIn Li <xinli@nvidia.com >
Signed-off-by: Junhao Li <junhao@ubicloud.com >
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com >
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com >
Signed-off-by: Linkun <github@lkchen.net >
Co-authored-by: Ning Xie <andy.xning@gmail.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
Co-authored-by: Andrew Sansom <andrew@protopia.ai >
Co-authored-by: Zhiyu <zhiyuc@nvidia.com >
Co-authored-by: Shu Wang <shuw@nvidia.com >
Co-authored-by: XIn Li <xinli@nvidia.com >
Co-authored-by: Junhao Li <streaver91@gmail.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Yuxuan Zhang <2448370773@qq.com >
Co-authored-by: ZiTian Zhao <zitian.zhao@tencentmusic.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Po-Han Huang (NVIDIA) <53919306+nvpohanh@users.noreply.github.com >
Co-authored-by: iAmir97 <71513472+iAmir97@users.noreply.github.com >
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Hong Hanh <hanh.usth@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: lkchen <github@lkchen.net > 
						
						
					 
					
						2025-08-08 22:21:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						429e4e2d42 
					 
					
						
						
							
							[Bugfix] Fix ModernBert cuda graph capturing in v1 ( #21901 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-08-08 22:17:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						35afe1b30b 
					 
					
						
						
							
							[BugFix] [P/D] Handle lookahead token count edge-case with Eagle Spec Decoding and P/D ( #22317 )  
						
						 
						
						
						
						
						Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com >
Signed-off-by: Pradyun92 <142861237+Pradyun92@users.noreply.github.com >
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com > 
						
						
					 
					
						2025-08-08 17:04:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						81c57f60a2 
					 
					
						
						
							
							[XPU] upgrade torch 2.8 on for XPU ( #22300 )  
						
						 
						
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-08-08 17:03:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						311d875614 
					 
					
						
						
							
							Drop flaky test_healthcheck_response_time ( #22539 )  
						
						 
						
						
						
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-08-08 16:56:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e3edc0a7a8 
					 
					
						
						
							
							Extract CompilationConfig from config.py ( #22524 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-08 16:34:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						baece8c3d2 
					 
					
						
						
							
							[Frontend] Add unix domain socket support ( #18097 )  
						
						 
						
						
						
						
						Signed-off-by: <yyweiss@gmail.com >
Signed-off-by: yyw <yyweiss@gmail.com > 
						
						
					 
					
						2025-08-08 16:23:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2fcf6b27b6 
					 
					
						
						
							
							[Docs] fix broken links in metrics.md ( #22315 )  
						
						 
						
						
						
						
						Signed-off-by: Guy Stone <guys@spotify.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-08 16:22:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						41b9655751 
					 
					
						
						
							
							Skip Qwen 1 in CI because remote code is no longer compatible with Transformers ( #22536 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-08 16:20:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bd875d2eb7 
					 
					
						
						
							
							[Bugfix] Update FA commit hash ( #22546 )  
						
						 
						
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-08 16:10:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f703b923f3 
					 
					
						
						
							
							[Misc] DeepGEMM : Avoid JIT generation in the hot-path ( #22215 )  
						
						 
						
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-08-08 16:09:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cd9b9de1fb 
					 
					
						
						
							
							[BugFix] Fix IMA FlashMLA full cuda-graph and DP + Update FlashMLA ( #21691 )  
						
						 
						
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com > 
						
						
					 
					
						2025-08-08 16:09:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fe6d8257a1 
					 
					
						
						
							
							[gpt-oss] Support tool call and implement MCP tool server ( #22427 )  
						
						 
						
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-08 15:06:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e290594072 
					 
					
						
						
							
							[Docs] Rename “Distributed inference and serving” to “Parallelism & Scaling” ( #22466 )  
						
						 
						
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-08-08 19:26:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f756a682d9 
					 
					
						
						
							
							[gpt-oss] guard import when triton kernel is not installed ( #22529 )  
						
						 
						
						
						
						
						Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-08 11:18:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f0964e29cb 
					 
					
						
						
							
							[Benchmark] Add benchmark tool for multi turn conversations ( #20267 )  
						
						 
						
						
						
						
					 
					
						2025-08-08 10:28:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e789cad6b8 
					 
					
						
						
							
							[gpt-oss] triton kernel mxfp4 ( #22421 )  
						
						 
						
						
						
						
						Signed-off-by: <zyy1102000@gmail.com >
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com > 
						
						
					 
					
						2025-08-08 08:24:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e5ebeeba53 
					 
					
						
						
							
							Remove exception for Python 3.8 typing from linter ( #22506 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-08 03:06:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7be7f3824a 
					 
					
						
						
							
							[Docs] Improve API docs (+small tweaks) ( #22459 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-08 03:02:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ccdae737a0 
					 
					
						
						
							
							[BugFix] Don't cancel asyncio tasks directly from destructors ( #22476 )  
						
						 
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-08 01:13:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						904063907c 
					 
					
						
						
							
							[Misc] fix openai version ( #22485 )  
						
						 
						
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-08-08 01:12:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						43c4f3d77c 
					 
					
						
						
							
							[Misc] Begin deprecation of get_tensor_model_*_group ( #22494 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-08 01:11:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1712543df6 
					 
					
						
						
							
							[CI/Build] Fix multimodal tests ( #22491 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-08 00:31:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						808a7b69df 
					 
					
						
						
							
							[bench] Fix benchmark/serve.py to ignore unavailable results ( #22382 )  
						
						 
						
						
						
						
						Signed-off-by: Linkun <github@lkchen.net > 
						
						
					 
					
						2025-08-07 23:15:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						099c046463 
					 
					
						
						
							
							[Doc] Sleep mode documentation ( #22310 )  
						
						 
						
						
						
						
						Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com >
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com >
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Hong Hanh <hanh.usth@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com > 
						
						
					 
					
						2025-08-08 12:25:18 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						af473f0a85 
					 
					
						
						
							
							[bugfix] Fix Llama3/4 issues caused by FlashInfer 0.2.10 ( #22426 )  
						
						 
						
						
						
						
						Signed-off-by: Po-Han Huang <pohanh@nvidia.com > 
						
						
					 
					
						2025-08-07 20:25:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						157f9c1368 
					 
					
						
						
							
							Fix pre-commit ( #22487 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-07 20:21:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6f287915d8 
					 
					
						
						
							
							Optimize MiniCPMO mask creation with vectorized implementation ( #22464 )  
						
						 
						
						
						
						
						Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com > 
						
						
					 
					
						2025-08-07 20:18:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c152e2a8a0 
					 
					
						
						
							
							not tie_word_embeddings for glm-4.5 and glm-4.5v ( #22460 )  
						
						 
						
						
						
						
						Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com > 
						
						
					 
					
						2025-08-07 19:37:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						17eaaef595 
					 
					
						
						
							
							[Bugfix] Fix RuntimeError: Index put requires the source and destination dtypes match ( #22065 )  
						
						 
						
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-08-07 19:20:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3303f134e0 
					 
					
						
						
							
							[Kernel] Add support for block FP8 on SM120 (NVIDIA 5090 and RTX PRO 6000) ( #22131 )  
						
						 
						
						
						
						
						Signed-off-by: Junhao Li <junhao@ubicloud.com > 
						
						
					 
					
						2025-08-07 19:18:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b2c8ce57c6 
					 
					
						
						
							
							Fix Flashinfer CUTLASS MOE Allgather ( #21963 )  
						
						 
						
						
						
						
						Signed-off-by: Shu Wang <shuw@nvidia.com > 
						
						
					 
					
						2025-08-07 19:18:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a3b9c17b56 
					 
					
						
						
							
							Support Tensorrt-LLM MoE fp4 for low-latency ( #21331 )  
						
						 
						
						
						
						
						Signed-off-by: Shu Wang <shuw@nvidia.com >
Signed-off-by: Po-Han Huang <pohanh@nvidia.com >
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Signed-off-by: XIn Li <xinli@nvidia.com >
Co-authored-by: XIn Li <xinli@nvidia.com > 
						
						
					 
					
						2025-08-07 19:18:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d57dc2364e 
					 
					
						
						
							
							Add ModelOpt Qwen3 nvfp4 support ( #20101 )  
						
						 
						
						
						
						
						Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com > 
						
						
					 
					
						2025-08-07 19:18:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e2c8f1edec 
					 
					
						
						
							
							[PERF] Use pybase64 to more quickly decode prompt embeddings ( #22469 )  
						
						 
						
						
						
						
						Signed-off-by: Andrew Sansom <andrew@protopia.ai > 
						
						
					 
					
						2025-08-07 19:15:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1ee5ead5f8 
					 
					
						
						
							
							[ROCm] [V1] [SpecDec] Enable Speculative Decoding on ROCm V1 Engine ( #21496 )  
						
						 
						
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-08-07 19:13:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						acf8aeb79e 
					 
					
						
						
							
							[Misc] normalize multiprocessing Queue usage ( #22371 )  
						
						 
						
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-08-08 01:57:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7e3a8dc906 
					 
					
						
						
							
							Remove from_dict from SpeculativeConfig ( #22451 )  
						
						 
						
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-07 10:13:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						139d155781 
					 
					
						
						
							
							[Frontend] Use engine argument to control MM cache size ( #22441 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-07 09:47:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8c9da6be22 
					 
					
						
						
							
							[Core] Simplify mm processing cache ( #22457 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-07 09:47:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						399d2a10e2 
					 
					
						
						
							
							Fix pre-commit error in main ( #22462 )  
						
						 
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-07 08:54:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4815b00f54 
					 
					
						
						
							
							[gpt-oss] Generate ResponseOutputItem from Harmony Message ( #22410 )  
						
						 
						
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-07 08:33:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4da8bf20d0 
					 
					
						
						
							
							[Tool] Fix auto tool call ( #22434 )  
						
						 
						
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-08-07 07:03:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7e0b121812 
					 
					
						
						
							
							[Bugfix] Add missing packed_modules_mapping to DeepseekV2ForCausalLM ( #22352 )  
						
						 
						
						
						
						
						Signed-off-by: Felix Marty <Felix.Marty@amd.com > 
						
						
					 
					
						2025-08-07 06:30:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						766bc8162c 
					 
					
						
						
							
							[Core] Store only the keys for multi-modal data in P0 ( #22198 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-07 01:45:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						289b18e670 
					 
					
						
						
							
							[Docs] Update features/disagg_prefill, add v1 examples and development ( #22165 )  
						
						 
						
						
						
						
						Signed-off-by: David Chen <530634352@qq.com > 
						
						
					 
					
						2025-08-07 00:59:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						35171b1172 
					 
					
						
						
							
							[Doc] update docs for nightly benchmarks ( #12022 )  
						
						 
						
						
						
						
						Signed-off-by: Andrew Chan <andrewkchan.akc@gmail.com > 
						
						
					 
					
						2025-08-07 00:29:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a2c6696bfe 
					 
					
						
						
							
							[Docs] Factor out troubleshooting to its own guide; add section for Ray Observability ( #21578 )  
						
						 
						
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-08-07 00:29:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5e8398805e 
					 
					
						
						
							
							[Doc] Fix link to prefix caching design ( #22384 )  
						
						 
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
						
						
					 
					
						2025-08-07 00:28:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						136825de75 
					 
					
						
						
							
							[Misc] Enhance code formatting in mxfp4.py  ( #22423 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-08-07 00:26:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c2dba2dba8 
					 
					
						
						
							
							Add H20-3e fused MoE kernel tuning configs for GLM-4.5 ( #22433 )  
						
						 
						
						Signed-off-by: shaojunqi <shaojunqi.sjq@alibaba-inc.com>
						Co-authored-by: shaojunqi <shaojunqi.sjq@alibaba-inc.com>
						
						
					 
					
						2025-08-07 00:24:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						434d2f3f7a 
					 
					
						
						
							
							[Docs] Add missing dependency for docs build ( #22435 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-08-07 00:22:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8e8e0b6af1 
					 
					
						
						
							
							feat: Add --enable-log-outputs flag for logging model generations ( #20707 )  
						
						 
						
						Signed-off-by: Adrian Garcia <adrian.garcia@inceptionai.ai>
						
						
					 
					
						2025-08-06 23:10:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						82216dc21f 
					 
					
						
						
							
							[Misc] Support routing logic simulation ( #21990 )  
						
						 
						
						Signed-off-by: Ming Yang <minos.future@gmail.com>
						Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-08-06 23:06:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						370661856b 
					 
					
						
						
							
							[Frontend] Update OpenAI error response to upstream format ( #22099 )  
						
						 
						
						Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
						
						
					 
					
						2025-08-06 23:06:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cbc8457b26 
					 
					
						
						
							
							[Model] Switch to Fused RMS norm in Qwen2.5_VL model. ( #22184 )  
						
						 
						
						Signed-off-by: kf <kuanfu.liu@embeddedllm.com>
						Signed-off-by: tjtanaavllm <tunjian.tan@amd.com>
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
						Co-authored-by: kf <kuanfu.liu@embeddedllm.com>
						
						
					 
					
						2025-08-06 23:05:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4d4297e8fe 
					 
					
						
						
							
							[Bench] Split serve.py:main into async/async versions ( #22405 )  
						
						 
						
						Signed-off-by: Linkun <github@lkchen.net>
						
						
					 
					
						2025-08-06 23:05:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2a4c825523 
					 
					
						
						
							
							[CI] Skip the pooling models that do not support transformers v4.55 ( #22411 )  
						
						 
						
						Signed-off-by: wang.yuqi <noooop@126.com>
						
						
					 
					
						2025-08-06 23:05:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4be02a3776 
					 
					
						
						
							
							[Bugfix] EPLB load statistics problem ( #22167 )  
						
						 
						
						Signed-off-by: ycyaw66 <497410282@qq.com>
						Signed-off-by: David Chen <530634352@qq.com>
						Co-authored-by: ycyaw66 <497410282@qq.com>
						
						
					 
					
						2025-08-07 04:07:54 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f6278b6243 
					 
					
						
						
							
							[gpt-oss] Convert user input to harmony format ( #22402 )  
						
						 
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-08-06 20:56:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ad6c655dde 
					 
					
						
						
							
							preload heavy modules when mp method is forkserver ( #22214 )  
						
						 
						
						Signed-off-by: Lionel Villard <villard@us.ibm.com>
						
						
					 
					
						2025-08-06 20:33:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						14bcf93a6a 
					 
					
						
						
							
							Optimize logger init performance by using module-level constants ( #22373 )  
						
						 
						
						Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
						
						
					 
					
						2025-08-06 20:32:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ecbea55ca2 
					 
					
						
						
							
							Update hf_xet pin to resolve hangs ( #22356 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-08-06 20:31:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						609b533cb6 
					 
					
						
						
							
							[Bugfix] Add proper comparison for package versions ( #22314 )  
						
						 
						
						Signed-off-by: Syed Muhammad Bin Asif <syedmba7@connect.hku.hk>
						
						
					 
					
						2025-08-06 20:31:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5e9455ae8f 
					 
					
						
						
							
							[Bugfix]: Fix the streaming output for function calls in the minimax ( #22015 )  
						
						 
						
						Signed-off-by: QscQ <qscqesze@gmail.com>
						Signed-off-by: qingjun <qingjun@minimaxi.com>
						
						
					 
					
						2025-08-06 20:30:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a00d8b236f 
					 
					
						
						
							
							Use float32 for test_completion.py ( #22385 )  
						
						 
						
						Signed-off-by: Michael Goin <mgoin64@gmail.com>
						
						
					 
					
						2025-08-07 11:07:47 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						04cf435d95 
					 
					
						
						
							
							[Bugfix] Fix wrong method name in Intern-S1 image processor ( #22417 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-08-06 20:05:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7377131a2c 
					 
					
						
						
							
							[Qwen3] Enable dual-chunk-attention support for Qwen3 models. ( #21924 )  
						
						 
						
						Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
						
						
					 
					
						2025-08-06 19:58:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6b47ef24de 
					 
					
						
						
							
							[XPU]Fix flash_attn_varlen_func interface on xpu ( #22350 )  
						
						 
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
						
						
					 
					
						2025-08-06 19:28:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1dc8a70b6d 
					 
					
						
						
							
							[Attention] Support multiple attention metadata builders per kv_cache_spec  + proper local attention no hybrid kv cache fix ( #21588 )  
						
						 
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
						
						
					 
					
						2025-08-06 18:40:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f825c6bd22 
					 
					
						
						
							
							Support encoder_only attention for FlexAttention ( #22273 )  
						
						 
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
						
						
					 
					
						2025-08-06 18:37:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						41b67f4263 
					 
					
						
						
							
							[model] Support MiniCPM-V 4.0 ( #22166 )  
						
						 
						
						Co-authored-by: imning3 <hbning@pku.edu.cn>
						
						
					 
					
						2025-08-06 18:35:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e8961e963a 
					 
					
						
						
							
							Update flashinfer-python==0.2.10 ( #22389 )  
						
						 
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-08-06 18:10:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9a3835aaa9 
					 
					
						
						
							
							Fix trtllm-gen attention env and add attention sink ( #22378 )  
						
						 
						
						Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
						Signed-off-by: Lain <fusiyuan2000@hotmail.com>
						Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						Co-authored-by: Michael Goin <mgoin64@gmail.com>
						Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
						
						
					 
					
						2025-08-06 18:07:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5c7cc33f4d 
					 
					
						
						
							
							[gpt-oss] fix model config with hf_config ( #22401 )  
						
						 
						
						Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
						
						
					 
					
						2025-08-06 18:04:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						19c9365aa4 
					 
					
						
						
							
							[gpt-oss] add demo tool server ( #22393 )  
						
						 
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						
						
					 
					
						2025-08-06 17:47:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eec890c1c1 
					 
					
						
						
							
							[Bug] Fix B200 DeepGEMM E8M0 Accuracy Issue ( #22399 )  
						
						 
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-08-06 17:03:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						46a13949d5 
					 
					
						
						
							
							[v1] - Mamba1 Attention Metadata ( #21249 )  
						
						 
						
						Signed-off-by: asafg <asafg@ai21.com>
						Co-authored-by: asafg <asafg@ai21.com>
						
						
					 
					
						2025-08-06 17:03:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						31f09c615f 
					 
					
						
						
							
							[gpt-oss] flashinfer mxfp4 ( #22339 )  
						
						 
						
						Signed-off-by: simon-mo <xmo@berkeley.edu>
						Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
						Co-authored-by: simon-mo <xmo@berkeley.edu>
						
						
					 
					
						2025-08-06 12:37:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						31f5dc5b2a 
					 
					
						
						
							
							[gpt-oss] Enhance error msg on attention sink init ( #22335 )  
						
						 
						
						Signed-off-by: simon-mo <xmo@berkeley.edu>
						Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
						Co-authored-by: simon-mo <xmo@berkeley.edu>
						
						
					 
					
						2025-08-06 11:41:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ec7cb19224 
					 
					
						
						
							
							[gpt-oss] Add loop for built-in tool call ( #22374 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
						Co-authored-by: simon-mo <xmo@berkeley.edu>
						Co-authored-by: Chen Zhang <zhangch99@outlook.com>
						Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
						Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
						Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
						
						
					 
					
						2025-08-06 10:32:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2435ea7ed5 
					 
					
						
						
							
							[Bugfix] Make condition in triton kernel constexpr ( #22370 )  
						
						 
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-08-06 10:00:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4a6b72c2ab 
					 
					
						
						
							
							[BugFix] Fix triton compile error in kernel_unified_attention_2/3d caused by attention sinks ( #22368 )  
						
						 
						
						Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
						
						
					 
					
						2025-08-06 09:47:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b4b9813b5e 
					 
					
						
						
							
							add the codes to check AMD Instinct GPU number ( #22367 )  
						
						 
						
						Signed-off-by: Zhang Jason <ning.zhang2@amd.com>
						
						
					 
					
						2025-08-06 08:58:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2cb6ef8996 
					 
					
						
						
							
							[BugFix] Fix FA2 RuntimeError when sinks is provided ( #22365 )  
						
						 
						
						Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
						
						
					 
					
						2025-08-06 08:03:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9edd1db02b 
					 
					
						
						
							
							[Minor] Fix type  ( #22347 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-08-06 02:22:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f263a4b53f 
					 
					
						
						
							
							[gpt-oss] Support chat completion api ( #22342 )  
						
						 
						
						
						
						
					 
					
						2025-08-06 01:57:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						54991c548a 
					 
					
						
						
							
							[gpt-oss] add model to supported models doc ( #22336 )  
						
						 
						
						Signed-off-by: Roger Wang <hey@rogerw.me>
						
						
					 
					
						2025-08-06 01:49:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						178d03fbd6 
					 
					
						
						
							
							[gpt-oss] Add Tool/ConversationContext classes and harmony_utils ( #22340 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
						Co-authored-by: simon-mo <xmo@berkeley.edu>
						Co-authored-by: Chen Zhang <zhangch99@outlook.com>
						Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
						Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
						Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
						
						
					 
					
						2025-08-06 01:08:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fa00c5d75b 
					 
					
						
						
							
							[Misc] Clean up duplicated hf overrides ( #22311 )  
						
						 
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-08-06 07:50:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						134a8ee8fd 
					 
					
						
						
							
							[gpt-oss] Add openai-harmony as default dependency ( #22332 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
						Co-authored-by: simon-mo <xmo@berkeley.edu>
						Co-authored-by: Chen Zhang <zhangch99@outlook.com>
						Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
						Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
						Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
						
						
					 
					
						2025-08-06 00:10:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						90ec006937 
					 
					
						
						
							
							[gpt-oss] flashinfer attention sink init ( #22330 )  
						
						 
						
						Signed-off-by: simon-mo <xmo@berkeley.edu>
						Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
						Co-authored-by: simon-mo <xmo@berkeley.edu>
						Co-authored-by: Chen Zhang <zhangch99@outlook.com>
						Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
						Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
						
						
					 
					
						2025-08-05 23:48:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a47e6ffe93 
					 
					
						
						
							
							[GptOss] Add GptOss reasoning parser to support structure output ( #22322 )  
						
						 
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com>
						Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
						Co-authored-by: simon-mo <xmo@berkeley.edu>
						Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
						Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
						Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
						
						
					 
					
						2025-08-05 23:39:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						98a3a81024 
					 
					
						
						
							
							[ROCm] Add attention sink to use_rocm_custom_paged_attention ( #22329 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
						Co-authored-by: simon-mo <xmo@berkeley.edu>
						Co-authored-by: Chen Zhang <zhangch99@outlook.com>
						Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
						Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
						Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
						
						
					 
					
						2025-08-05 23:30:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						de98252f49 
					 
					
						
						
							
							Add GPT-OSS model code and config [1/N] ( #22327 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-08-05 23:26:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						796bae07c5 
					 
					
						
						
							
							Update transformers to v4.55 ( #21931 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Signed-off-by: isotr0py <2037008807@qq.com>
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
						Co-authored-by: Isotr0py <2037008807@qq.com>
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-08-05 22:56:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6e20924350 
					 
					
						
						
							
							Add attention sink in attention backends ( #22320 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
						Co-authored-by: simon-mo <xmo@berkeley.edu>
						Co-authored-by: Chen Zhang <zhangch99@outlook.com>
						Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
						Co-authored-by: Minseok Lee <47620120+minseokl@users.noreply.github.com>
						Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
						
						
					 
					
						2025-08-05 22:37:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dd16bdc798 
					 
					
						
						
							
							Increase openai-python version ( #22316 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-08-05 21:43:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e3c876dca3 
					 
					
						
						
							
							Upgrade FA3 for attention sink ( #22313 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-08-05 21:36:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5d5d419ca6 
					 
					
						
						
							
							[Bugfix][CI/Build][ROCm] Make sure to use the headers from the build folder on ROCm ( #22264 )  
						
						 
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-08-05 20:39:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						302962e806 
					 
					
						
						
							
							[Bugfix] Skip dead and non-GPU nodes for Ray DP engine allocation ( #22275 )  
						
						 
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
						
						
					 
					
						2025-08-05 20:35:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7e6544c797 
					 
					
						
						
							
							[Perf] Parallelize fill_bitmask to accelerate high-throughput guided decoding ( #21862 )  
						
						 
						
						Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
						
						
					 
					
						2025-08-05 19:57:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8e6c7e873f 
					 
					
						
						
							
							[Bugfix] Fix MoE BNB version ( #22260 )  
						
						 
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-08-05 19:56:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6a51530437 
					 
					
						
						
							
							[Bugfix] Fix 3D input passed into cutlass_scaled_mm ( #22278 )  
						
						 
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-08-06 10:35:20 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						35509fc5be 
					 
					
						
						
							
							[Bugfix] Remove faulty test for oot attention backend ( #22286 )  
						
						 
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-08-06 00:05:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4b29d2784b 
					 
					
						
						
							
							[CI][TPU] Fix docker clean up ( #22271 )  
						
						 
						
						Signed-off-by: Siyuan Liu <lsiyuan@google.com>
						
						
					 
					
						2025-08-05 23:54:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						59a0b8554b 
					 
					
						
						
							
							[bugfix] fix blackwell deepep installation ( #22255 )  
						
						 
						
						
						
						
					 
					
						2025-08-06 01:26:09 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						469b3ffaaa 
					 
					
						
						
							
							[V1] port xformers backend to v1 ( #21342 )  
						
						 
						
						Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>
						
						
					 
					
						2025-08-05 10:04:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ae87ddd040 
					 
					
						
						
							
							[Refactor] Remove Unused Environment Variable VLLM_NO_DEPRECATION_WARNING ( #22199 )  
						
						 
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-08-05 09:40:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a7cb6101ca 
					 
					
						
						
							
							[CI/Build] Update flashinfer to 0.2.9 ( #22233 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-05 09:39:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c494f96fbc 
					 
					
						
						
							
							Use UV_LINK_MODE=copy in Dockerfile to avoid hardlink fail ( #22128 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-05 06:57:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0c275ad5ad 
					 
					
						
						
							
							[V0 Deprecation][TPU] Remove V1 flag check from tests ( #22248 )  
						
						 
						
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-08-05 06:53:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						74333ae2f6 
					 
					
						
						
							
							[Misc] correct static type check for GroupCoordinator ( #21946 )  
						
						 
						
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-08-05 03:17:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						83156c7b89 
					 
					
						
						
							
							[NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel ( #22095 )  
						
						 
						
						
						
						
						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com > 
						
						
					 
					
						2025-08-05 02:45:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4771df7b2b 
					 
					
						
						
							
							[Feature] Non-contiguous Support for FP8 Quantization ( #21961 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-05 02:36:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						05fae02175 
					 
					
						
						
							
							Migrate KimiVLImagePixelInputs to TensorSchema ( #21769 )  
						
						 
						
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com >
Co-authored-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-08-05 02:36:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d1bf1b9711 
					 
					
						
						
							
							[Docs][TPU] Highlight TPU Software version selection ( #22242 )  
						
						 
						
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-08-05 02:33:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						586f286789 
					 
					
						
						
							
							[Model] Pooling model activation supports per request control by PoolingParams ( #20538 )  
						
						 
						
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-08-05 00:37:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						811ac13d03 
					 
					
						
						
							
							[Core] Factor out common logic for MM budget calculation ( #22228 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-04 23:54:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e79a12fc3a 
					 
					
						
						
							
							[UX] Fail if an invalid attention backend is specified ( #22217 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <michael@neuralmagic.com > 
						
						
					 
					
						2025-08-04 23:54:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cdfd6871a5 
					 
					
						
						
							
							[Bugfix] Misaligned params in TreeAttentionImpl ( #22226 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-04 22:40:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4b3e4474d7 
					 
					
						
						
							
							Optimize configuration access with LRU cache in custom ops ( #22204 )  
						
						 
						
						
						
						
						Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com > 
						
						
					 
					
						2025-08-04 21:43:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bd3db7f469 
					 
					
						
						
							
							[Misc] log more detailed message for ensure_model_parallel_initialized ( #22144 )  
						
						 
						
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-08-04 19:36:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						29b97c0995 
					 
					
						
						
							
							[Doc] add backend to doc string of initialize_model_parallel ( #22142 )  
						
						 
						
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-08-04 19:36:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7b455cf1c0 
					 
					
						
						
							
							[Misc] Remove pass_config from CompilationConfig dump_json excluded ( #21911 )  
						
						 
						
						
						
						
						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com > 
						
						
					 
					
						2025-08-04 19:17:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8a6e108e76 
					 
					
						
						
							
							fix: kimi_k2 return empty tool call list ( #22149 )  
						
						 
						
						
						
						
						Signed-off-by: tlipoca9 <tlipoca9@gmail.com > 
						
						
					 
					
						2025-08-04 19:15:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d7b28f3415 
					 
					
						
						
							
							[Log] DeepGEMM Update Log for Unaligned Problem Size ( #22208 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-04 19:13:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6fa41e0c32 
					 
					
						
						
							
							self.gate dtype update for GLM-4.5 ( #22203 )  
						
						 
						
						
						
						
						Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com > 
						
						
					 
					
						2025-08-04 19:12:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						031ca762d7 
					 
					
						
						
							
							[ROCm][Bugfix] Compilation passes fix ( #22202 )  
						
						 
						
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-08-04 19:12:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6ad6b8e115 
					 
					
						
						
							
							[FEAT] Refactor ROPE into module ( #22192 )  
						
						 
						
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-08-04 19:12:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f4f4e7ef27 
					 
					
						
						
							
							[V0 deprecation][P/D] Deprecate v0 KVConnectorBase code (1/2) ( #21785 )  
						
						 
						
						
						
						
						Signed-off-by: Linkun Chen <github@lkchen.net > 
						
						
					 
					
						2025-08-04 19:11:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5ea71ff46f 
					 
					
						
						
							
							[V1] reduce block size for tree attention correctness test to fix 'ou… ( #22207 )  
						
						 
						
						
						
						
						Signed-off-by: Giancarlo Delfin <gdelfin@meta.com > 
						
						
					 
					
						2025-08-04 19:11:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7175817637 
					 
					
						
						
							
							Revert "[Bugfix] V1 Fix the cursor leakage issue during request scheduling." ( #22223 )  
						
						 
						
						
						
						
					 
					
						2025-08-04 18:37:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2dffac464c 
					 
					
						
						
							
							[Bugfix] V1 Fix the cursor leakage issue during request scheduling. ( #21173 )  
						
						 
						
						
						
						
						Signed-off-by: CLFutureX <775523362@qq.com > 
						
						
					 
					
						2025-08-04 18:34:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bdcb42e45d 
					 
					
						
						
							
							[NVIDIA] Auto detect modelopt quant and fix DSR1-FP4 weight loading ( #22073 )  
						
						 
						
						
						
						
					 
					
						2025-08-04 21:02:55 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c09efff976 
					 
					
						
						
							
							[Bugfix][V1][P/D]Fix the uneven polling issue in the toy proxy for P2pNcclConnector ( #21819 )  
						
						 
						
						
						
						
						Signed-off-by: Abatom <abzhonghua@gmail.com > 
						
						
					 
					
						2025-08-04 20:17:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						309c1bb822 
					 
					
						
						
							
							[Bug] Update auto_tune.sh to separate benchmarking and profiling. ( #21629 )  
						
						 
						
						
						
						
						Signed-off-by: Eric Hanley <ericehanley@google.com > 
						
						
					 
					
						2025-08-04 15:12:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9af654cc38 
					 
					
						
						
							
							[Responses API] Ignore store=True and process the request by default ( #22185 )  
						
						 
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-04 05:12:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a5fff3bd49 
					 
					
						
						
							
							Fix Arcee model weight loading: Add custom load_weights ( #21725 )  
						
						 
						
						
						
						
						Signed-off-by: alyosha-swamy <raghav@arcee.ai > 
						
						
					 
					
						2025-08-04 04:09:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1539ced93a 
					 
					
						
						
							
							[Doc] Update pooling model docs ( #22186 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-04 03:37:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						54de71d0df 
					 
					
						
						
							
							[Sampler] Support returning all logprobs or logits ( #21792 )  
						
						 
						
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-08-04 03:04:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fed5849d3f 
					 
					
						
						
							
							[Bugfix] Fix failing GGUF models test ( #22174 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-04 01:27:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c1b4eb048a 
					 
					
						
						
							
							[feat] move WEIGHT_SCALE_SUPPORTED into raise block to accelerate RLHF weight loading ( #21164 )  
						
						 
						
						
						
						
						Signed-off-by: huangweixiao <huangweixiao@msh.team > 
						
						
					 
					
						2025-08-04 15:43:06 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a7b8788d2c 
					 
					
						
						
							
							[Misc] Modify the organization of GLM series  ( #22171 )  
						
						 
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-03 23:51:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8ecb3e9e93 
					 
					
						
						
							
							[CI Bugfix] Fix wNa16 kernel not found for test_shared_storage_connector_hashes ( #22163 )  
						
						 
						
						
						
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-08-03 22:19:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e5949e5ae0 
					 
					
						
						
							
							Remove index_put from MM embeddings merging ( #22105 )  
						
						 
						
						
						
						
						Co-authored-by: Chenxi Yang <cxyang@meta.com > 
						
						
					 
					
						2025-08-03 22:15:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						49bcd893e7 
					 
					
						
						
							
							[refactor] improve ConstantList exception specificity ( #22156 )  
						
						 
						
						
						
						
						Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com > 
						
						
					 
					
						2025-08-03 22:14:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						aa7012eb6d 
					 
					
						
						
							
							Add tree attention backend for v1 (part 1) ( #20401 )  
						
						 
						
						
						
						
						Signed-off-by: Giancarlo Delfin <gdelfin@meta.com > 
						
						
					 
					
						2025-08-03 22:13:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c2e75b3c11 
					 
					
						
						
							
							remove duplicate code within cleanup_dist_env_and_memory ( #22147 )  
						
						 
						
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-08-03 20:03:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0d7db16a92 
					 
					
						
						
							
							[PD] add test for chat completions endpoint ( #21925 )  
						
						 
						
						
						
						
						Signed-off-by: Abirdcfly <fp544037857@gmail.com > 
						
						
					 
					
						2025-08-03 19:57:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						845420ac2c 
					 
					
						
						
							
							[RLHF] Fix torch.dtype not serializable in example ( #22158 )  
						
						 
						
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-08-04 02:43:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e27d25a0dc 
					 
					
						
						
							
							[fix] fix correct assertion syntax error in attention utils. ( #22154 )  
						
						 
						
						
						
						
						Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com > 
						
						
					 
					
						2025-08-03 19:24:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6f5478298d 
					 
					
						
						
							
							Use aiohttp connection pool for benchmarking ( #21981 )  
						
						 
						
						
						
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com > 
						
						
					 
					
						2025-08-03 19:23:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6a39ba85fe 
					 
					
						
						
							
							[Bugfix] Fix failing multimodal standard test ( #22153 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-08-03 19:04:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d3c18c9cb0 
					 
					
						
						
							
							fuse fp32 for GLM-4.5 e_score_correction_bias ( #22143 )  
						
						 
						
						
						
						
						Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com > 
						
						
					 
					
						2025-08-03 09:04:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						83f7bbb318 
					 
					
						
						
							
							Add chat doc in quick start ( #21213 )  
						
						 
						
						
						
						
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-08-03 07:47:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b5dfb94fa0 
					 
					
						
						
							
							[CI/Build][Bugfix] Fix Qwen2.5 tests in CPU CI via fallback silu_and_mul to torch native implementation ( #22145 )  
						
						 
						
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-08-03 05:34:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6d98843b31 
					 
					
						
						
							
							[Responses API] Disable response store by default ( #22137 )  
						
						 
						
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-03 04:04:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						aefeea0fde 
					 
					
						
						
							
							[V1] [P/D] Refactor KV Connector Path ( #21980 )  
						
						 
						
						
						
						
						Signed-off-by: David Ben-David <davidb@pliops.com >
Co-authored-by: David Ben-David <davidb@pliops.com > 
						
						
					 
					
						2025-08-03 04:03:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						24d1dffbeb 
					 
					
						
						
							
							[executor] feat: add supports_pp attr to executors ( #21786 )  
						
						 
						
						
						
						
						Signed-off-by: Haibin Lin <haibin.lin@bytedance.com > 
						
						
					 
					
						2025-08-03 18:04:45 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7de45db9a5 
					 
					
						
						
							
							[Misc] update doc comment for send ( #22026 )  
						
						 
						
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-08-03 00:55:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						789562c28c 
					 
					
						
						
							
							Support CUTLASS NVFP4 (w4a4) for Blackwell Geforce GPUs (SM120) ( #21309 )  
						
						 
						
						
						
						
						Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es > 
						
						
					 
					
						2025-08-03 00:54:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3f36c325fa 
					 
					
						
						
							
							[Benchmark] Support ready check timeout in vllm bench serve ( #21696 )  
						
						 
						
						
						
						
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Roger Wang <hey@rogerw.me > 
						
						
					 
					
						2025-08-03 00:52:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3dddbf1f25 
					 
					
						
						
							
							[Misc] Add tensor schema test coverage for multimodal models ( #21754 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-08-03 00:52:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						337eb23bcc 
					 
					
						
						
							
							[Fix] Fix llama4 modelopt weight loading error ( #22107 )  
						
						 
						
						
						
						
						Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-03 00:50:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2ff46b8826 
					 
					
						
						
							
							[Misc] Bump ray to 2.48.0 ( #22123 )  
						
						 
						
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com > 
						
						
					 
					
						2025-08-02 19:42:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						554df8a6a2 
					 
					
						
						
							
							Revert "[compile][startup] Disable C++ compilation of symbolic shapes" ( #22122 )  
						
						 
						
						
						
						
						Signed-off-by: Xiao Liu <xiszishu@gmail.com > 
						
						
					 
					
						2025-08-02 09:03:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						73e1b9b1d4 
					 
					
						
						
							
							[xpu]support moe models on XPU platform ( #21643 )  
						
						 
						
						
						
						
						Signed-off-by: yan <yan.ma@intel.com >
Signed-off-by: Yan Ma <yan.ma@intel.com > 
						
						
					 
					
						2025-08-02 07:49:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4abfd8796f 
					 
					
						
						
							
							[V1] [Hybrid] Validate compatibility of attention backend batch reordering at init time ( #21557 )  
						
						 
						
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-08-02 05:29:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f5d0f4784f 
					 
					
						
						
							
							[Frontend] Improve error message for too many mm items ( #22114 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-02 02:20:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b690e34824 
					 
					
						
						
							
							[Model] Mamba2 preallocate SSM output tensor to avoid d2d copy overhead ( #21075 )  
						
						 
						
						
						
						
						Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com >
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com > 
						
						
					 
					
						2025-08-02 01:59:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						25373b6c6c 
					 
					
						
						
							
							for glm-4.1V update ( #22000 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-08-02 01:46:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						58eee5f2e0 
					 
					
						
						
							
							[PERF] Use faster way of decode in tokenizer: avoid useless list-to-list conversion ( #20000 )  
						
						 
						
						
						
						
						Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai > 
						
						
					 
					
						2025-08-02 01:43:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						067c34a155 
					 
					
						
						
							
							docs: remove deprecated disable-log-requests flag ( #22113 )  
						
						 
						
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me > 
						
						
					 
					
						2025-08-02 00:19:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c64861d63c 
					 
					
						
						
							
							[Bugfix] Mamba2 remove bugged initial state condition in chunk scan ( #22034 )  
						
						 
						
						
						
						
						Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com > 
						
						
					 
					
						2025-08-01 23:55:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8564dc9448 
					 
					
						
						
							
							Fix test_kv_sharing_fast_prefill flakiness ( #22038 )  
						
						 
						
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com > 
						
						
					 
					
						2025-08-01 23:55:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4ac8437352 
					 
					
						
						
							
							[Misc] Getting and passing ray runtime_env to workers ( #22040 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com > 
						
						
					 
					
						2025-08-01 23:54:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d3a6f2120b 
					 
					
						
						
							
							[FEAT][ROCm] Enable running Flash Attention as ViT attn backend for Qwen-VL models on ROCm platform. ( #22069 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: tjtanaavllm <tunjian.tan@amd.com >
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com > 
						
						
					 
					
						2025-08-01 23:53:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0edaf752d7 
					 
					
						
						
							
							[Attention][DBO] Add support for "splitting" the CommonAttentionMetadata ( #21153 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sage Moore <sage@neuralmagic.com > 
						
						
					 
					
						2025-08-01 19:47:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6e8d8c4afb 
					 
					
						
						
							
							[Test] Add Unit Test for Batched DeepGEMM ( #21559 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-02 10:45:46 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8d524ce79f 
					 
					
						
						
							
							[BugFix] Improve internal DP load balancing ( #21617 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-01 19:45:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9f9c38c392 
					 
					
						
						
							
							[Speculators][Speculative Decoding] Add Qwen Eagle3 Support ( #21835 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com > 
						
						
					 
					
						2025-08-01 19:43:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a65f46be5e 
					 
					
						
						
							
							[Misc] DeepGemmExperts : Avoid JIT generation in the hot-path ( #21955 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-08-01 19:42:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						57393715e8 
					 
					
						
						
							
							[Misc] VLLM_TARGET_DEVICE.lower() ( #22101 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-08-01 19:41:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ee2eb6ecd8 
					 
					
						
						
							
							[Model] Qwen2.5 VL SiLU-and-Mul ( #22066 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: kf <kuanfu.liu@embeddedllm.com >
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: kf <kuanfu.liu@embeddedllm.com > 
						
						
					 
					
						2025-08-01 19:34:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						23322431c8 
					 
					
						
						
							
							[V1][CUDA] Full cudagraph support for FlashInfer ( #21367 )  
						
						 
						
						
						
						
					 
					
						2025-08-01 21:49:34 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3654847db5 
					 
					
						
						
							
							feat: Add Support GPTQ Quantization MOE on ROCM vllm serve ( #21733 )  
						
						 
						
						
						
						
					 
					
						2025-08-01 21:12:19 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eefbf4a68b 
					 
					
						
						
							
							[Perf] Optimize reshape_and_cache_flash CUDA Kernel ( #22036 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-01 19:18:51 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						88faa466d7 
					 
					
						
						
							
							[CI] Initial tests for SM100 Blackwell runner ( #21877 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-01 16:18:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						881e1af43a 
					 
					
						
						
							
							[BugFix] Harden distributed DP startup ( #21538 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-01 21:40:45 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d84b97a3e3 
					 
					
						
						
							
							Add lora test for tp>1 case for TPU. ( #21970 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com > 
						
						
					 
					
						2025-08-01 18:56:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d331759488 
					 
					
						
						
							
							Introduce RayPPCommunicator for ray-based PP ( #21660 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com > 
						
						
					 
					
						2025-08-01 11:50:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9659bc7f27 
					 
					
						
						
							
							[compile][startup] Disable C++ compilation of symbolic shapes ( #20836 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Animesh Jain <anijain@umich.edu > 
						
						
					 
					
						2025-08-01 10:38:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3277e8f9e1 
					 
					
						
						
							
							Fix pre-commit failure for SECURTIY.md ( #22102 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-08-01 10:36:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8d705996df 
					 
					
						
						
							
							[Misc] Minor enhancement of benchmark_moe ( #22068 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-02 01:35:30 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						38c8bce8b6 
					 
					
						
						
							
							Enable headless models for pooling in the Transformers backend ( #21767 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-01 10:31:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ac45c44d98 
					 
					
						
						
							
							[Bugfix] [Performance] DeepEPHighThroughput + DeepSeek : Quant before Dispatch ( #21837 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-08-01 10:14:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d6664664b4 
					 
					
						
						
							
							security policy: take 1 ( #21119 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-08-01 10:09:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b879ecd6e2 
					 
					
						
						
							
							[Bugfix] fix when skip tokenizer init ( #21922 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-08-01 10:09:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3f8e952179 
					 
					
						
						
							
							[Bugfix] Fix glm4.1v video inference issue ( #22067 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-08-01 09:33:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						326a1b001d 
					 
					
						
						
							
							Improve documentation of ModelConfig.try_get_generation_config to prevent future confusion ( #21526 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-01 09:32:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2d7b09b998 
					 
					
						
						
							
							Deprecate --disable-log-requests and replace with --enable-log-requests ( #21739 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-01 17:16:37 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						97608dc276 
					 
					
						
						
							
							[Docs] use uv in CPU installation docs ( #22089 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: David Xia <david@davidxia.com > 
						
						
					 
					
						2025-08-01 07:55:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3146519add 
					 
					
						
						
							
							[BugFix] Don't change title of top-level process ( #22032 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-08-01 07:37:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8026a335a1 
					 
					
						
						
							
							[BugFix] Update AttnFusionPass cache key ( #21947 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Richard Zou <zou3519@gmail.com > 
						
						
					 
					
						2025-08-01 07:11:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a59cd9d9f7 
					 
					
						
						
							
							[Refactor] Fix Compile Warning #1444-D ( #21462 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-01 06:10:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5c54d9759d 
					 
					
						
						
							
							[Bugfix][PD] set max_completion_tokens=1 if req has this value ( #21841 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Abirdcfly <fp544037857@gmail.com > 
						
						
					 
					
						2025-08-01 06:08:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0a6d305e0f 
					 
					
						
						
							
							feat(multimodal): Add customizable background color for RGBA to RGB conversion ( #22052 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jinheng Li <ahengljh@gmail.com >
Co-authored-by: Jinheng Li <ahengljh@gmail.com > 
						
						
					 
					
						2025-08-01 06:07:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f81c1bb055 
					 
					
						
						
							
							[Bugfix] Check NVIDIA artifactory is accessible before using flashinfer cubin kernels ( #21893 )  
						
						 
						
						
						
						
					 
					
						2025-08-01 08:28:45 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fb0e0d46fc 
					 
					
						
						
							
							Fix get_kwargs for case where type hint is list[Union[str, type]] ( #22016 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-01 05:26:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						26b5f7bd2a 
					 
					
						
						
							
							[BUG] [ROCm] Fix import bug on ROCm ( #22083 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-08-01 05:25:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dfbc1f8880 
					 
					
						
						
							
							[Speculative Decoding] Add speculators config support ( #21345 )  
						
						 
						
						
						
						
					 
					
						2025-08-01 08:25:18 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						87c94bc879 
					 
					
						
						
							
							Revert "Update sampling_metadata.py ( #21937 )" ( #22088 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-08-01 05:24:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						28b18cc741 
					 
					
						
						
							
							[Quantization] Enable BNB support for InternS1 ( #21953 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-08-01 11:09:54 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4931486988 
					 
					
						
						
							
							[Doc] Added warning of speculating with draft model ( #22047 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Dilute-l <dilu2333@163.com >
Co-authored-by: Dilute-l <dilu2333@163.com > 
						
						
					 
					
						2025-08-01 02:11:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0f81b310db 
					 
					
						
						
							
							[Misc] Remove upper bound in openai package version ( #22060 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-08-01 02:11:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e6680f9e25 
					 
					
						
						
							
							[Bugfix] Add log prefix in non-dp mode engine core ( #21889 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wuhang <wuhang6@huawei.com > 
						
						
					 
					
						2025-08-01 09:04:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						27a145e893 
					 
					
						
						
							
							[Doc] Add example for Step3-VL ( #22061 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me > 
						
						
					 
					
						2025-08-01 08:35:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						da31f6ad3d 
					 
					
						
						
							
							Revert precompile wheel changes ( #22055 )  
						
						 
						
						
						
						
					 
					
						2025-08-01 08:26:24 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						98df153abf 
					 
					
						
						
							
							[Frontend] Align tool_choice="required" behavior with OpenAI when tools is empty ( #21052 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sungyoon Jeong <sungyoon.jeong@furiosa.ai > 
						
						
					 
					
						2025-08-01 07:54:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e0f63e4a35 
					 
					
						
						
							
							[Core] Avoid repeated len(block_token_ids) check in hash_request_tokens ( #21781 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: linzebing <linzebing1995@gmail.com > 
						
						
					 
					
						2025-08-01 00:23:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b4e081cb15 
					 
					
						
						
							
							[Bugfix] Disable multi-modal preprocessor cache for DP ( #21896 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-08-01 08:03:56 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						79731a79f0 
					 
					
						
						
							
							[Doc] Fix a syntax error of example code in structured_outputs.md ( #22045 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wangzi <3220100013@zju.edu.cn >
Co-authored-by: wangzi <3220100013@zju.edu.cn > 
						
						
					 
					
						2025-08-01 00:01:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						53d7c39271 
					 
					
						
						
							
							Update sampling_metadata.py ( #21937 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Aviad Rossmann <aviadr@neureality.ai > 
						
						
					 
					
						2025-07-31 23:23:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						61dcc280fa 
					 
					
						
						
							
							[Doc] Add Voxtral to Supported Models page ( #22059 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-31 23:10:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0f46a780d4 
					 
					
						
						
							
							[Model] [Quantization] Support quantization for Gemma3n ( #21974 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kyle Sayers <kylesayrs@gmail.com > 
						
						
					 
					
						2025-07-31 22:45:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e1a7fe4af5 
					 
					
						
						
							
							[BugFix] fix: aot passes kvcache dtype information ( #19750 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Mickael Seznec <mickael@mistral.ai > 
						
						
					 
					
						2025-08-01 05:45:02 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						82de9b9d46 
					 
					
						
						
							
							[Misc] Automatically resolve HF processor init kwargs ( #22005 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-31 22:44:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ad57f23f6a 
					 
					
						
						
							
							[Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache ( #20873 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: charent <19562666+charent@users.noreply.github.com > 
						
						
					 
					
						2025-07-31 19:48:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3700642013 
					 
					
						
						
							
							[Refactor] Remove Duplicate per_block_cast_to_fp8, Remove Dependencies of DeepGEMM ( #21787 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-08-01 01:13:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0bd409cf01 
					 
					
						
						
							
							Move flashinfer-python to optional extra vllm[flashinfer] ( #21959 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-31 18:02:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e360316ab9 
					 
					
						
						
							
							Add DeepGEMM to Dockerfile in vllm-base image ( #21533 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-31 18:01:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c3e0e9337e 
					 
					
						
						
							
							[Feature] Add Flashinfer MoE Support for Compressed Tensor NVFP4 ( #21639 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-31 15:26:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6e672daf62 
					 
					
						
						
							
							Add FlashInfer allreduce RMSNorm Quant fusion ( #21069 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ilmarkov <imarkov@redhat.com >
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: ilmarkov <imarkov@redhat.com > 
						
						
					 
					
						2025-07-31 13:58:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2dff2e21d9 
					 
					
						
						
							
							[Bugfix] Fix MTP weight loading ( #21941 )  
						
						 
						
						
						
						
					 
					
						2025-07-31 16:33:53 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						71470bc4af 
					 
					
						
						
							
							[Misc] Add unit tests for chunked local attention ( #21692 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com > 
						
						
					 
					
						2025-07-31 11:39:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9e0726e5bf 
					 
					
						
						
							
							[Meta] Official Eagle mm support, first enablement on llama4 ( #20788 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: morgendave <morgendave@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.me > 
						
						
					 
					
						2025-07-31 10:35:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						53c21e492e 
					 
					
						
						
							
							Update torch_xla pin to 20250730 ( #21956 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com > 
						
						
					 
					
						2025-07-31 17:26:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0780bb5783 
					 
					
						
						
							
							Removing amdproduction Tests ( #22027 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com > 
						
						
					 
					
						2025-07-31 09:53:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						58bb902186 
					 
					
						
						
							
							fix(setup): improve precompiled wheel setup for Docker builds ( #22025 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: dougbtv <dosmith@redhat.com > 
						
						
					 
					
						2025-07-31 09:52:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7349d5268b 
					 
					
						
						
							
							[ez] Remove a trailing space from compilation/decorators.py ( #22028 )  
						
						 
						
						
						
						
					 
					
						2025-07-31 09:46:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9484641616 
					 
					
						
						
							
							[Model] Add step3 vl ( #21998 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: oliveryuan <yuansong@step.ai >
Co-authored-by: oliveryuan <yuansong@step.ai > 
						
						
					 
					
						2025-07-31 23:19:06 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						207b750e19 
					 
					
						
						
							
							[NVIDIA] Add SM100 Flashinfer MoE per tensor scale fp8 backend ( #21458 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-31 06:00:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5daffe7cf6 
					 
					
						
						
							
							[BugFix] Fix case where collective_rpc returns None ( #22006 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-31 12:51:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2836dd73f1 
					 
					
						
						
							
							[Model][CI] Let more pooling models support v1 ( #21747 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-07-31 01:51:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d2aab336ad 
					 
					
						
						
							
							[CI/Build] get rid of unused VLLM_FA_CMAKE_GPU_ARCHES ( #21599 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com > 
						
						
					 
					
						2025-07-31 15:00:08 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9532a6d563 
					 
					
						
						
							
							[Deprecation] Remove deprecated args and methods ( #21907 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-30 23:46:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3e36fcbee6 
					 
					
						
						
							
							[Bugfix]: fix metadata file copy in test_sharded_state_loader ( #21830 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-07-31 06:22:11 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						055bd3978e 
					 
					
						
						
							
							[CI Bugfix] Fix CI OOM for test_shared_storage_connector_hashes ( #21973 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-31 11:45:29 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0f7919fca0 
					 
					
						
						
							
							[Misc] Expand SUPPORTED_HIDDEN_SIZES for DeepEP low-latency kernels ( #21818 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-30 20:41:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						61445453df 
					 
					
						
						
							
							[UX] Rename CUTLASS_MLA_VLLM_V1 to CUTLASS_MLA ( #21966 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-30 20:40:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ec02e536df 
					 
					
						
						
							
							[Bugfix] Relax lang pin for voxtral ( #21833 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-07-30 20:38:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9cb497bfa3 
					 
					
						
						
							
							[Example] Add async_llm_streaming.py example for AsyncLLM streaming in python ( #21763 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-30 18:39:46 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ca9e2be3ed 
					 
					
						
						
							
							[Core] Move EngineCoreRequest to Request conversion out of EngineCore ( #21627 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: linzebing <linzebing1995@gmail.com > 
						
						
					 
					
						2025-07-30 15:00:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						601f856d56 
					 
					
						
						
							
							[Bugfix] Fix None value handling in trace span creation for cancelled requests ( #20272 )  
						
						 
						
						
						
						
					 
					
						2025-07-30 14:44:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						287f527f54 
					 
					
						
						
							
							[Feature] Add async tensor parallelism for scaled mm ( #20155 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: cascade812 <cascade812@outlook.com > 
						
						
					 
					
						2025-07-30 17:23:41 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f12d9256b3 
					 
					
						
						
							
							[Misc] Use dracut on CentOS and skip clone if repo exists for EP kernel installation ( #21635 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ming Yang <minos.future@gmail.com > 
						
						
					 
					
						2025-07-30 13:15:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b9b753e7a7 
					 
					
						
						
							
							For VLLM_USE_PRECOMPILED, only compiled .so files should be extracted ( #21964 )  
						
						 
						
						
						
						
					 
					
						2025-07-30 13:04:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						56bd537dde 
					 
					
						
						
							
							[Misc] Support more collective_rpc return types ( #21845 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-30 10:20:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8f0d516715 
					 
					
						
						
							
							[TPU] Support Pathways in vLLM ( #21417 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wenxindongwork <wenxindong@google.com > 
						
						
					 
					
						2025-07-30 10:02:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f4135232b9 
					 
					
						
						
							
							feat(distributed): add get_required_kvcache_layout class method to kv connector api ( #20433 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wxsm <wxsms@foxmail.com > 
						
						
					 
					
						2025-07-30 16:41:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4904e53c32 
					 
					
						
						
							
							[Bugfix] SharedStorage Connector for V1 PD multimodal ( #21611 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: fake0fan <645327136@qq.com >
Signed-off-by: herotai214 <herotai214@gmail.com >
Co-authored-by: herotai214 <herotai214@gmail.com > 
						
						
					 
					
						2025-07-30 09:18:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						004203e953 
					 
					
						
						
							
							[CI/Build] Fix registry tests ( #21934 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-30 09:10:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5c765aec65 
					 
					
						
						
							
							[Bugfix] Fix TypeError in scheduler when comparing mixed request_id types ( #21816 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chiliu <chiliu@paypal.com >
Co-authored-by: chiliu <chiliu@paypal.com > 
						
						
					 
					
						2025-07-30 08:54:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ad510309ee 
					 
					
						
						
							
							Override attention metadata for fast prefill in some KV sharing setups ( #21590 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com > 
						
						
					 
					
						2025-07-30 08:54:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						366f6b3a4d 
					 
					
						
						
							
							[Bugfix] Fix multi-api server not working for text models ( #21933 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-30 08:42:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6e599eebe8 
					 
					
						
						
							
							[Bugfix] Fix OOM tests in initialization test ( #21921 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-30 07:35:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						88edf5994c 
					 
					
						
						
							
							[Docs] Reduce the size of the built docs ( #21920 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-30 07:35:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ff08e51940 
					 
					
						
						
							
							[NVIDIA] Fix Llama4 Scout FP4 functionality issues ( #21499 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Po-Han Huang <pohanh@nvidia.com > 
						
						
					 
					
						2025-07-30 07:33:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8f4a1c9a04 
					 
					
						
						
							
							[Misc] Improve code readability of KVCacheManager ( #21673 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: tanruixiang <tanruixiang0104@gmail.com >
Signed-off-by: Ruixiang Tan <819464715@qq.com >
Signed-off-by: GitHub <noreply@github.com > 
						
						
					 
					
						2025-07-30 07:20:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						36ede45989 
					 
					
						
						
							
							Reduce time wasted in GitHub Actions using concurrency ( #21919 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-30 07:18:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0e40b26073 
					 
					
						
						
							
							[CI/Build] Only run markdownlint in CI ( #21892 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-30 07:17:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0271c2ff2f 
					 
					
						
						
							
							[Test] Add Benchmark and Unit Test for per_token_group_quant ( #21860 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-30 07:15:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e91d3c9cda 
					 
					
						
						
							
							[misc] skip p2p check by default ( #21904 )  
						
						 
						
						
						
						
					 
					
						2025-07-30 22:05:04 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bf668b5bf5 
					 
					
						
						
							
							[Feature] Support multiple api keys in server ( #18548 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yan Pashkovsky <yanp.bugz@gmail.com > 
						
						
					 
					
						2025-07-30 07:03:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						da3e0bd6e5 
					 
					
						
						
							
							[Bugfix] we should use metavar is not choices ( #21902 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-07-30 06:51:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fcfd1eb9c5 
					 
					
						
						
							
							[Doc] Remove vLLM prefix and add citation for PagedAttention ( #21910 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-30 06:36:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d979dd6beb 
					 
					
						
						
							
							[Feature][EPLB] Add eplb support for Qwen3 ( #20815 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: aladerran <aladerran@gmail.com > 
						
						
					 
					
						2025-07-30 06:27:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b876860c62 
					 
					
						
						
							
							[Hardware][CPU] Build fix for ARM without BF16 ( #21848 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Eric Curtin <ecurtin@redhat.com > 
						
						
					 
					
						2025-07-30 06:22:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						13986365a9 
					 
					
						
						
							
							Add @patrickvonplaten as maintainer of mistral's related files. ( #21928 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com > 
						
						
					 
					
						2025-07-30 20:42:51 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5c8fe389d6 
					 
					
						
						
							
							[Docs] Fix the example code of streaming chat completions in reasoning ( #21825 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wangzi <3220100013@zju.edu.cn >
Co-authored-by: wangzi <3220100013@zju.edu.cn >
Co-authored-by: Zi Wang <66560864+BruceW-07@users.noreply.github.com > 
						
						
					 
					
						2025-07-30 12:11:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5bbaf492a6 
					 
					
						
						
							
							[Doc] Update partial support ( #21916 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-30 01:32:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						533db0935d 
					 
					
						
						
							
							[benchmark] add max-concurrency in result table ( #21095 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Peter Pan <Peter.Pan@daocloud.io > 
						
						
					 
					
						2025-07-30 01:15:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fc91da5499 
					 
					
						
						
							
							[Model] Remove DSV2 unused code ( #21903 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-30 00:55:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						547795232d 
					 
					
						
						
							
							[Tests] Fixing bug inside MultiModalProfiler. ( #21842 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com > 
						
						
					 
					
						2025-07-30 00:44:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						30ef30ed5a 
					 
					
						
						
							
							[CI] rollback lint-and-deploy pipeline using amd machine ( #21912 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com > 
						
						
					 
					
						2025-07-30 00:37:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						02f82fe438 
					 
					
						
						
							
							[Doc] Update Intern-S1 info ( #21908 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-29 23:58:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2ca5f82c2a 
					 
					
						
						
							
							[Misc] Remove redundant config definitions ( #21891 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-29 23:54:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6f8d261882 
					 
					
						
						
							
							Update vLLM Benchmark Suite for Xeon based on 0.9.2 release ( #21486 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tsai, Louie <louie.tsai@intel.com > 
						
						
					 
					
						2025-07-30 05:57:03 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4cd7fe6cea 
					 
					
						
						
							
							[Docs] Expand introduction to Ray in Multi-node deployment section ( #21584 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-29 22:07:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						16f3250527 
					 
					
						
						
							
							[CI/Build] Fix pre-commit failure in docs ( #21897 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-29 21:53:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e3bc17ceea 
					 
					
						
						
							
							Add @sighingnow as maintainer of qwen's related files. ( #21895 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com > 
						
						
					 
					
						2025-07-29 21:30:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						05cbbe20c5 
					 
					
						
						
							
							[XPU] use ZE_AFFINITY_MASK for device select on xpu ( #21815 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-07-30 03:56:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						65f311ce59 
					 
					
						
						
							
							[Frontend] Add LLM.reward specific to reward models ( #21720 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-07-29 20:56:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1b0a155534 
					 
					
						
						
							
							[Perf] Using __nv_fp8_e4m3 instead of c10::e4m3 for per_token_group_quant ( #21867 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-29 21:50:46 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						44bc46da60 
					 
					
						
						
							
							[Bugfix] Actually disable processing cache when API server is scaled out ( #21839 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-29 20:36:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b7b23da4d2 
					 
					
						
						
							
							[Bugfix] Fix comment typo of get_num_common_prefix_blocks() ( #21827 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: MingzhenHan <hanmingzhen2002@outlook.com > 
						
						
					 
					
						2025-07-29 20:35:33 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fdde18229e 
					 
					
						
						
							
							[Bugfix] Fix shape mismatch assertion error when loading Gemma3n model with BitsAndBytes quantization ( #21808 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: sydarb <areebsyed237@gmail.com > 
						
						
					 
					
						2025-07-30 11:35:21 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b917da442b 
					 
					
						
						
							
							Expose PyTorch profiler configuration to environment variables ( #21803 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Csrayz <33659823+Csrayz@users.noreply.github.com > 
						
						
					 
					
						2025-07-29 19:46:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fb58e3a651 
					 
					
						
						
							
							[Docs] Update docker.md with HF_TOKEN, new model, and podman fix ( #21856 )  
						
						 
						
						
						
						
					 
					
						2025-07-29 19:45:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						76080cff79 
					 
					
						
						
							
							[DOC] Fix path of v1 related figures ( #21868 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-07-29 19:45:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ba5c5e5404 
					 
					
						
						
							
							[Docs] Switch to better markdown linting pre-commit hook ( #21851 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-29 19:45:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						555e7225bc 
					 
					
						
						
							
							[v1][attention] Support Hybrid Allocator + FlashInfer ( #21412 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-07-30 01:45:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0e36abf993 
					 
					
						
						
							
							[Bugfix] Correct max tokens for non-contiguous embeds ( #21798 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com >
Co-authored-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com > 
						
						
					 
					
						2025-07-30 01:16:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						452b2a3180 
					 
					
						
						
							
							[ci] mark blackwell test optional for now ( #21878 )  
						
						 
						
						
						
						
					 
					
						2025-07-29 18:03:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0d0cc9e150 
					 
					
						
						
							
							[ci] add b200 test placeholder ( #21866 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: simon-mo <simon.mo@hey.com > 
						
						
					 
					
						2025-07-29 17:11:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9266d98048 
					 
					
						
						
							
							[BugFix] Fix interleaved sliding window not set for Gemma3n ( #21863 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com > 
						
						
					 
					
						2025-07-29 16:34:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						176bbce1db 
					 
					
						
						
							
							Revert "[AMD][CI/Build] Fix the AMD issue caused by inappropriate of symbol exposure ( #21647 )" ( #21850 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-07-29 21:56:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a1873db23d 
					 
					
						
						
							
							docker: docker-aware precompiled wheel support ( #21127 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: dougbtv <dosmith@redhat.com > 
						
						
					 
					
						2025-07-29 14:45:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a33ea28b1b 
					 
					
						
						
							
							Add flashinfer_python to CUDA wheel requirements ( #21389 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-29 12:51:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7b49cb1c6b 
					 
					
						
						
							
							[Doc] update Contributing page's testing section ( #18272 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: David Xia <david@davidxia.com > 
						
						
					 
					
						2025-07-29 10:32:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f03e9cf2bb 
					 
					
						
						
							
							[Doc] Add FusedMoE Modular Kernel Documentation ( #21623 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-07-29 10:32:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						37f86d9048 
					 
					
						
						
							
							[Docs] use uv in GPU installation docs ( #20277 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: David Xia <david@davidxia.com > 
						
						
					 
					
						2025-07-29 10:32:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						58b11b24a6 
					 
					
						
						
							
							[Bugfix] Fix workspace buffer None issue for Flashinfer TRTLLM Backend ( #21525 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com > 
						
						
					 
					
						2025-07-29 10:34:00 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ad341c5194 
					 
					
						
						
							
							[Bugfix]fix mixed bits and visual language model quantization in AutoRound ( #21802 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Wenhua Cheng <wenhua.cheng@intel.com > 
						
						
					 
					
						2025-07-29 07:26:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						759b87ef3e 
					 
					
						
						
							
							[TPU] Add an optimization doc on TPU ( #21155 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-29 07:23:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f693b067a2 
					 
					
						
						
							
							[Docs] Merge design docs for a V1 only future ( #21832 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-29 07:22:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						04e38500ee 
					 
					
						
						
							
							[Bugfix] VLLM_V1 supports passing other compilation levels ( #19340 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Richard Zou <zou3519@gmail.com > 
						
						
					 
					
						2025-07-29 09:35:58 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ab714131e4 
					 
					
						
						
							
							[Doc] Update compatibility matrix for pooling and multimodal models ( #21831 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-29 06:29:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						755fa8b657 
					 
					
						
						
							
							[KVCache] Make KVCacheSpec hashable ( #21791 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-07-29 19:58:29 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2470419119 
					 
					
						
						
							
							[Docs] Fix the outdated URL for installing from vLLM binaries ( #21523 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kay Yan <kay.yan@daocloud.io >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-29 04:56:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						61a6905ab0 
					 
					
						
						
							
							[Model] Refactor JambaForCausalLM ( #21394 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-29 18:25:07 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						37efc63b64 
					 
					
						
						
							
							[V0 deprecation] Guided decoding ( #21347 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Reza Barazesh <rezabarazesh@meta.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-29 03:15:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a4528f0cac 
					 
					
						
						
							
							[Model]: Fused MoE for nomic-embed-text-v2-moe ( #18321 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: isotr0py <2037008807@qq.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-07-29 03:13:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a2480251ec 
					 
					
						
						
							
							[Doc] Link to RFC for pooling optimizations ( #21806 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-28 23:53:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7234fe2685 
					 
					
						
						
							
							[Misc] Rework process titles ( #21780 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-29 05:14:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f1e2c095ec 
					 
					
						
						
							
							Migrate InternVLImageInputs and InternVLVideoInputs to TensorSchema ( #21684 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-07-28 22:09:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						12a223ef9b 
					 
					
						
						
							
							[AMD][CI/Build][Bugfix] Guarding CUDA specific functions by ifndef ROCM ( #21766 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-07-29 03:35:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e18f085103 
					 
					
						
						
							
							skip fusedmoe layer for start_load_kv ( #21378 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: calvin chen <wen.chen@dynamia.ai > 
						
						
					 
					
						2025-07-28 18:59:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						afa2607596 
					 
					
						
						
							
							[CI] Parallelize Kernels MoE Test ( #21764 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-28 18:56:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						48b763d6b5 
					 
					
						
						
							
							[Refactor] Merge Compressed Tensor FP8 CompressedTensorsW8A8Fp8MoEMethod and CompressedTensorsW8A8Fp8MoECutlassMethod ( #21775 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-28 19:47:21 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						947e982ede 
					 
					
						
						
							
							[Docs] Minimize spacing for supported_hardware.md table ( #21779 )  
						
						 
						
						
						
						
					 
					
						2025-07-28 18:46:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c6c9122d50 
					 
					
						
						
							
							[Kernel] SM90 CUTLASS FP8 GEMM: add support for swap AB + kernel tuning ( #20396 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com >
Co-authored-by: Duncan Moss <djm.moss@gmail.com > 
						
						
					 
					
						2025-07-28 23:13:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8aa1485fcf 
					 
					
						
						
							
							[Perf] Disable chunked local attention by default with llama4 ( #21761 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-07-28 18:49:04 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						89ac266b26 
					 
					
						
						
							
							[Feat]: Add support for Dynamic Quant 4 bit CPU kleidiai kernels ( #17112 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-28 20:55:15 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c6f36cfa26 
					 
					
						
						
							
							[Bugfix] DeepGEMM is not enabled on B200 due to _lazy_init() ( #21472 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Clayton Coleman <smarterclayton@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-28 20:51:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b18b417fbf 
					 
					
						
						
							
							Revert "[V1] Exception Handling when Loading KV Cache from Remote Store" ( #21778 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: KuntaiDu <kuntai@uchicago.edu > 
						
						
					 
					
						2025-07-28 20:15:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9ba1c88a93 
					 
					
						
						
							
							[AMD][CI/Build] Fix the AMD issue caused by inappropriate of symbol exposure ( #21647 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com > 
						
						
					 
					
						2025-07-28 20:11:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e0e58f9729 
					 
					
						
						
							
							[Bug] Enforce contiguous input for dynamic_scaled_fp8_quant and static_scaled_fp8_quant ( #21773 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-28 19:55:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b361f14e39 
					 
					
						
						
							
							[AMD][BugFix] Fix omission of wvSplitK kernel for small batch sizes (1-4) due to torch.compile ( #21350 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Randall Smith <Randall.Smith@amd.com > 
						
						
					 
					
						2025-07-28 15:38:20 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						01c753ed98 
					 
					
						
						
							
							update flashinfer to v0.2.9rc2 ( #21701 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Weiliang Liu <weiliangl@nvidia.com > 
						
						
					 
					
						2025-07-28 19:31:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						94b71ae106 
					 
					
						
						
							
							Use metavar to list the choices for a CLI arg when custom values are also accepted ( #21760 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-28 19:31:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7d44c691b0 
					 
					
						
						
							
							[P/D] Log warnings related to prefill KV expiry ( #21753 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-28 18:40:53 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e17a4d3bf9 
					 
					
						
						
							
							[Bugfix] Fix granite speech shape validation ( #21762 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-28 14:19:21 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ec261b0291 
					 
					
						
						
							
							[XPU] IPEX-optimized Punica Wrapper on XPU ( #21703 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chzhang <chaojun.zhang@intel.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-28 16:43:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						04fe61aa3d 
					 
					
						
						
							
							[CI/Build] Fix plugin tests ( #21758 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-28 15:08:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						25708d317a 
					 
					
						
						
							
							[Bugfix] Mistral crashes on tool with no description ( #21167 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: HugoMichard <hugo@harfanglab.fr > 
						
						
					 
					
						2025-07-28 08:03:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0e18a5d058 
					 
					
						
						
							
							[Misc] Reduce logs for model resolution ( #21765 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-28 07:59:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						34a20c49b3 
					 
					
						
						
							
							[Logs] Change flashinfer sampler logs to once ( #21759 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-28 06:59:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						31084b3b1f 
					 
					
						
						
							
							[Bugfix][CI/Build] Update peft version in test requirement ( #21729 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-07-28 06:17:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bccc43c033 
					 
					
						
						
							
							[Bugfix]check health for engine core process exiting unexpectedly ( #21728 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wuhang <wuhang6@huawei.com > 
						
						
					 
					
						2025-07-28 06:17:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1395dd9c28 
					 
					
						
						
							
							[Docs] Add revision date to rendered docs ( #21752 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-28 06:12:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9ace2eaf35 
					 
					
						
						
							
							[Bugfix] Improve JSON extraction in LlamaToolParser ( #19024 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: keru <keyang.ru@oracle.com >
Co-authored-by: keru <keyang.ru@oracle.com > 
						
						
					 
					
						2025-07-28 12:36:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						656c24f1b5 
					 
					
						
						
							
							[Ernie 4.5] Name Change for Base 0.3B Model ( #21735 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: vasqu <antonprogamer@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-28 12:22:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						63fe3a700f 
					 
					
						
						
							
							[PD] let p2p nccl toy proxy handle /chat/completions ( #21734 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-07-28 11:45:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0ae970ed15 
					 
					
						
						
							
							[Bugfix] Fix glm4.1v video_grid_thw tensor shape scheme ( #21744 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-07-28 04:26:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						65e8466c37 
					 
					
						
						
							
							[Bugfix] Fix environment variable setting in CPU Dockerfile ( #21730 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-28 11:02:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1b769dccf3 
					 
					
						
						
							
							[Bugfix] Fix Ernie4_5_MoeForCausalLM shared experts ( #21717 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-28 11:02:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2cc571199b 
					 
					
						
						
							
							[feature] add log non default args in LLM ( #21680 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-07-28 02:21:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a4ed731546 
					 
					
						
						
							
							[Model] Prioritize Transformers fallback over suffix matching ( #21719 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-28 02:15:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d128d0d554 
					 
					
						
						
							
							Migrate KeyeImageInputs and KeyeVideoInputs to TensorSchema ( #21686 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-07-28 01:16:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a6c050286a 
					 
					
						
						
							
							[v1][mamba] Added mamba_type into MambaSpec ( #21715 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: asafg <asafg@ai21.com >
Co-authored-by: asafg <asafg@ai21.com > 
						
						
					 
					
						2025-07-28 08:15:55 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						139a7f07bd 
					 
					
						
						
							
							[BugFix] Fix ChunkedLocalAttention when the hybrid kv-cache is disabled ( #21707 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-07-28 07:18:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						150d9e6337 
					 
					
						
						
							
							[Bugfix] fix max-file-size type from str to int ( #21675 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-07-28 00:06:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						139a97ec56 
					 
					
						
						
							
							[Bugfix] Fix shape checking for Fuyu ( #21709 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-28 00:05:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						18cc33dd60 
					 
					
						
						
							
							[bugfix] fix profile impact benchmark results ( #21507 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-07-27 22:44:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7656cf4cf3 
					 
					
						
						
							
							[Bugfix] [issue-21565] Fix the incompatibility issue with stream and named function calling when Thinking is disabled ( #21573 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wangzi <3220100013@zju.edu.cn >
Co-authored-by: wangzi <3220100013@zju.edu.cn > 
						
						
					 
					
						2025-07-27 22:43:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3ea57a56d9 
					 
					
						
						
							
							Migrate Idefics3ImagePixelInputs and Idefics3ImageEmbeddingInputs to … ( #21683 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-07-27 22:37:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						75856bc2cb 
					 
					
						
						
							
							Migrate GraniteSpeechAudioInputs to TensorSchema ( #21682 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-27 22:37:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						304dcdf575 
					 
					
						
						
							
							Migrate GLMVImagePixelInputs to TensorSchema ( #21679 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-07-27 22:36:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						88e46c7c8d 
					 
					
						
						
							
							Migrate Glm4vImageInputs, Glm4vVideoInputs to TensorSchema ( #21678 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-27 22:36:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d8937de4c8 
					 
					
						
						
							
							Migrate Gemma3ImagePixelInputs to TensorSchema ( #21676 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benji Beck <benjibeck@meta.com > 
						
						
					 
					
						2025-07-27 22:36:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e626d286f5 
					 
					
						
						
							
							[FEAT] [ROCm] [AITER]: Add AITER HIP block quant kernel ( #21242 )  
						
						 
						
						
						
						
					 
					
						2025-07-28 05:07:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c7ffe93d9c 
					 
					
						
						
							
							[Model] Support TP/PP/mamba2 kernel for PLaMo2 ( #19674 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Shinichi Hemmi <shemmi@preferred.jp >
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com >
Co-authored-by: Calvin Metzger <metzger@preferred.jp >
Co-authored-by: Sixue Wang <cecilwang@preferred.jp > 
						
						
					 
					
						2025-07-28 05:00:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						15a72ac478 
					 
					
						
						
							
							[V1] Exception Handling when Loading KV Cache from Remote Store ( #21534 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: liuyumoye <adeline_ly2023@outlook.com >
Co-authored-by: liuyumoye <adeline_ly2023@outlook.com > 
						
						
					 
					
						2025-07-27 20:34:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						04ff4be310 
					 
					
						
						
							
							[Misc] Add fused_moe configs for Qwen3-Coder-480B-A35B-Instruct-FP8 ( #21700 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-27 20:12:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						93269bb43e 
					 
					
						
						
							
							Fix GLM tool parser ( #21668 )  
						
						 
						
						Co-authored-by: Chenhui Zhang <zhang.chenhui@outlook.com>
						
						
					 
					
						2025-07-28 10:46:38 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						82acf2184d 
					 
					
						
						
							
							Fix typo for limit-mm-per-prompt in docs ( #21697 )  
						
						 
						
						Signed-off-by: Joachim Studnia <joachim@mistral.ai>
						
						
					 
					
						2025-07-27 19:45:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						86ae693f20 
					 
					
						
						
							
							[Deprecation][2/N] Replace --task with --runner and --convert ( #21470 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-27 19:42:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8f605ee309 
					 
					
						
						
							
							[Attention] Make CutlassMLA the default backend for SM100 (blackwell) ( #21626 )  
						
						 
						
						Signed-off-by: Alexander Matveev <amatveev@redhat.com>
						Signed-off-by: mgoin <mgoin64@gmail.com>
						Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-27 20:13:00 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a9b2a1d704 
					 
					
						
						
							
							[Misc] Refactor vllm config str ( #21666 )  
						
						 
						
						
						
						
					 
					
						2025-07-27 09:51:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						57c22e57f9 
					 
					
						
						
							
							Fix CUDA permute/unpermute for use with DeepGemm Moe ( #17934 )  
						
						 
						
						Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
						
						
					 
					
						2025-07-27 07:08:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bda9d0535f 
					 
					
						
						
							
							[Refactor] Refactor MOE NVFP4 Code Base: ModelOpt + Compressed Tensor ( #21631 )  
						
						 
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-27 05:25:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3d847a3125 
					 
					
						
						
							
							[VLM] Add video support for Intern-S1 ( #21671 )  
						
						 
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-07-27 11:49:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5f8c9a425e 
					 
					
						
						
							
							Migrate Florence2ImagePixelInputs to TensorSchema ( #21663 )  
						
						 
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-07-27 02:43:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1cbf951ba2 
					 
					
						
						
							
							[Misc] add default value for file pattern arg ( #21659 )  
						
						 
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-07-27 05:14:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a8936e5193 
					 
					
						
						
							
							Refactor: Remove numpy dependency from LoggingStatLogger ( #20529 )  
						
						 
						
						Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
						
						
					 
					
						2025-07-27 04:06:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						01a395e9e7 
					 
					
						
						
							
							[CI/Build][Doc] Clean up more docs that point to old bench scripts ( #21667 )  
						
						 
						
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
						
						
					 
					
						2025-07-27 04:02:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						971948b846 
					 
					
						
						
							
							Handle non-serializable objects in vllm bench ( #21665 )  
						
						 
						
						
						
						
					 
					
						2025-07-27 03:35:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eed2f463b2 
					 
					
						
						
							
							[VLM] Support HF format Phi-4-MM model ( #17121 )  
						
						 
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-26 20:07:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						20950b29fb 
					 
					
						
						
							
							Migrate ChameleonImagePixelInputs to TensorSchema ( #21657 )  
						
						 
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						
						
					 
					
						2025-07-26 19:34:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3339cba3ff 
					 
					
						
						
							
							Migrate FuyuImagePatchInputs to TensorSchema ( #21662 )  
						
						 
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						
						
					 
					
						2025-07-26 19:34:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0b8caf9095 
					 
					
						
						
							
							Migrate DeepseekVL2ImageInputs to TensorSchema ( #21658 )  
						
						 
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						
						
					 
					
						2025-07-26 19:34:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ccf27cc4d4 
					 
					
						
						
							
							Migrate Blip2ImagePixelInputs and Blip2ImageEmbeddingInputs to TensorSchema ( #21656 )  
						
						 
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						
						
					 
					
						2025-07-27 10:33:52 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c657369841 
					 
					
						
						
							
							support torch.compile for bailing moe ( #21664 )  
						
						 
						
						
						
						
					 
					
						2025-07-26 23:54:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6c66f28fa5 
					 
					
						
						
							
							Remove xformers requirement for Mistral-format Pixtral and Mistral3 ( #21154 )  
						
						 
						
						Signed-off-by: Wenchen Lo <charles761013@gmail.com>
						
						
					 
					
						2025-07-26 17:20:29 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						de509ae8eb 
					 
					
						
						
							
							[NVIDIA] Explicitly disable shuffled weights for flashinfer blockscale moe fp8 kernels ( #21411 )  
						
						 
						
						Signed-off-by: kaixih <kaixih@nvidia.com>
						
						
					 
					
						2025-07-26 07:10:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e7c4f9ee86 
					 
					
						
						
							
							[CI/Build][Doc] Move existing benchmark scripts in CI/document/example to vllm bench CLI ( #21355 )  
						
						 
						
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
						
						
					 
					
						2025-07-26 07:10:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9094d11c5d 
					 
					
						
						
							
							[Bugfix][Apple Silicon] fix missing symbols when build from source on Mac with Apple Silicon ( #21380 )  
						
						 
						
						Signed-off-by: Yeju Zhou <yejuzhou@outlook.com>
						
						
					 
					
						2025-07-26 07:09:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						56e544f24b 
					 
					
						
						
							
							[Refactor] Remove moe_align_block_size_triton ( #21335 )  
						
						 
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-26 07:08:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						97d6c30cc9 
					 
					
						
						
							
							[BugFix] Fix shared storage connector load kv only load attention layer ( #21428 )  
						
						 
						
						Signed-off-by: David Chen <530634352@qq.com>
						
						
					 
					
						2025-07-26 07:07:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a40a8506df 
					 
					
						
						
							
							[Misc] Improve memory profiling debug message ( #21429 )  
						
						 
						
						Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
						
						
					 
					
						2025-07-26 07:07:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c215f5c877 
					 
					
						
						
							
							[Bug] Fix has_flashinfer_moe Import Error when it is not installed ( #21634 )  
						
						 
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-26 07:06:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1cd6eaba54 
					 
					
						
						
							
							Support encoder-only models without KV-Cache ( #21270 )  
						
						 
						
						Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
						Co-authored-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-07-26 21:09:52 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f27fdfc3ed 
					 
					
						
						
							
							[Bugfix] Investigate Qwen2-VL failing test ( #21527 )  
						
						 
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-07-26 06:09:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						de10ff0b7c 
					 
					
						
						
							
							Migrate AyaVisionImagePixelInputs to TensorSchema for shape validation ( #21622 )  
						
						 
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						
						
					 
					
						2025-07-26 06:08:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9d197280fa 
					 
					
						
						
							
							Migrate AriaImagePixelInputs to TensorSchema for shape validation ( #21620 )  
						
						 
						
						Signed-off-by: Benji Beck <benjibeck@meta.com>
						
						
					 
					
						2025-07-26 06:08:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e98def439c 
					 
					
						
						
							
							[Take 2] Correctly kill vLLM processes after benchmarks ( #21646 )  
						
						 
						
						Signed-off-by: Huy Do <huydhn@gmail.com>
						
						
					 
					
						2025-07-26 06:06:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						05c1126f29 
					 
					
						
						
							
							[Misc] remove unused try-except in pooling config check ( #21618 )  
						
						 
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-26 12:20:03 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						875af38e01 
					 
					
						
						
							
							Support Intern-S1 ( #21628 )  
						
						 
						
						Signed-off-by: Roger Wang <hey@rogerw.me>
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						Co-authored-by: Your Name <you@example.com>
						Co-authored-by: Roger Wang <hey@rogerw.me>
						Co-authored-by: Isotr0py <2037008807@qq.com>
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-07-26 19:14:04 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7728dd77bb 
					 
					
						
						
							
							[TPU][Test] Divide TPU v1 Test into 2 parts. ( #21431 )  
						
						 
						
						
						
						
					 
					
						2025-07-26 06:20:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f6e6b33fb 
					 
					
						
						
							
							[Bugfix] Fix isinstance check for tensor types in _load_prompt_embeds to use dtype comparison ( #21612 )  
						
						 
						
						Signed-off-by: Alexandre Juan <a.juan@netheos.net>
						
						
					 
					
						2025-07-25 20:11:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a55c95096b 
					 
					
						
						
							
							Correctly kill vLLM processes after finishing serving benchmarks ( #21641 )  
						
						 
						
						Signed-off-by: Huy Do <huydhn@gmail.com>
						
						
					 
					
						2025-07-25 19:06:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						97349fe2bc 
					 
					
						
						
							
							[Docs] add offline serving multi-modal video input expamle Qwen2.5-VL ( #21530 )  
						
						 
						
						Signed-off-by: David Chen <530634352@qq.com>
						
						
					 
					
						2025-07-25 18:37:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						62965de5fe 
					 
					
						
						
							
							[Model] Ultravox: Support Llama 4 and Gemma 3 backends ( #17818 )  
						
						 
						
						Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
						Signed-off-by: Patrick Li <patrick8289@gmail.com>
						Co-authored-by: Patrick Li <patrick8289@gmail.com>
						
						
					 
					
						2025-07-25 18:12:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7ae75fa6d0 
					 
					
						
						
							
							[Feature] Add support for MoE models in the calibration-free RTN-based quantization ( #20766 )  
						
						 
						
						Signed-off-by: Alex Kogan <alex.kogan@oracle.com>
						
						
					 
					
						2025-07-25 18:09:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f1b286b2fb 
					 
					
						
						
							
							[TPU] Update ptxla nightly version to 20250724 ( #21555 )  
						
						 
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com>
						
						
					 
					
						2025-07-25 17:09:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c7742d6113 
					 
					
						
						
							
							[Bugfix] Always set RAY_ADDRESS for Ray actor before spawn ( #21540 )  
						
						 
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
						
						
					 
					
						2025-07-25 17:08:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cea96a0156 
					 
					
						
						
							
							[Bugfix] Fix sync_and_slice_intermediate_tensors ( #21537 )  
						
						 
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
						
						
					 
					
						2025-07-25 17:07:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2eddd437ba 
					 
					
						
						
							
							Add interleaved RoPE test for Llama4 (Maverick) ( #21478 )  
						
						 
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
						
						
					 
					
						2025-07-25 17:07:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						75d29cf4e1 
					 
					
						
						
							
							[Perf] Cuda Kernel for Int8 Per Token Group Quant ( #21476 )  
						
						 
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-25 17:07:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						41d3082c41 
					 
					
						
						
							
							Add Unsloth to RLHF.md ( #21636 )  
						
						 
						
						
						
						
					 
					
						2025-07-25 17:06:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7cfea0df39 
					 
					
						
						
							
							[TPU][Test] Rollback PR-21550. ( #21619 )  
						
						 
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com>
						
						
					 
					
						2025-07-25 13:22:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5ac3168ee3 
					 
					
						
						
							
							[Docs] add auto-round quantization readme  ( #21600 )  
						
						 
						
						Signed-off-by: Wenhua Cheng <wenhua.cheng@intel.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-25 08:52:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						396ee94180 
					 
					
						
						
							
							[CI] Unifying Dockerfiles for ARM and X86 Builds ( #21343 )  
						
						 
						
						Signed-off-by: Kebe <mail@kebe7jun.com>
						
						
					 
					
						2025-07-25 07:33:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e189b50f53 
					 
					
						
						
							
							Add support for Prithvi in Online serving mode ( #21518 )  
						
						 
						
						Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-07-25 07:01:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						136d750f5f 
					 
					
						
						
							
							[Kernel] Improve machete memory bound perf ( #21556 )  
						
						 
						
						Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
						
						
					 
					
						2025-07-25 06:53:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b3caeb82e7 
					 
					
						
						
							
							[ROCm][AITER] Enable fp8 kv cache on rocm aiter backend. ( #20295 )  
						
						 
						
						Signed-off-by: fsx950223 <fsx950223@outlook.com>
						Signed-off-by: amd-ruitang3 <Rui.Tang2@amd.com>
						Co-authored-by: amd-ruitang3 <Rui.Tang2@amd.com>
						
						
					 
					
						2025-07-25 06:50:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eab2f3980c 
					 
					
						
						
							
							[Model] Replace Mamba2 RMSNorm Gated with Fused Triton Kernel ( #20839 )  
						
						 
						
						Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
						Signed-off-by: Yu Chin Fabian Lim <fabian.lim@gmail.com>
						Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com>
						Co-authored-by: Yu Chin Fabian Lim <fabian.lim@gmail.com>
						
						
					 
					
						2025-07-25 06:49:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9fe98d4250 
					 
					
						
						
							
							[Frontend] Add request_id to the Request object so they can be controlled better via external load balancers ( #21009 )  
						
						 
						
						Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
						
						
					 
					
						2025-07-25 06:49:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						29c6fbe58c 
					 
					
						
						
							
							[MODEL] New model support for naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B ( #20931 )  
						
						 
						
						Signed-off-by: bigshanedogg <bigshane319@gmail.com>
						
						
					 
					
						2025-07-25 06:05:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c72f049cb4 
					 
					
						
						
							
							[Model] Fix Ernie4.5MoE e_score_correction_bias parameter ( #21586 )  
						
						 
						
						Signed-off-by: zhouchong <zhouchong03@baidu.com>
						Co-authored-by: zhouchong <zhouchong03@baidu.com>
						
						
					 
					
						2025-07-25 06:02:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f3a683b7c9 
					 
					
						
						
							
							[Bugfix][Logprobs] Fix logprobs op to support more backend ( #21591 )  
						
						 
						
						Signed-off-by: MengqingCao <cmq0113@163.com>
						
						
					 
					
						2025-07-25 05:53:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						46d81d6951 
					 
					
						
						
							
							[V1] Get supported tasks from model runner instead of model config ( #21585 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-25 05:36:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5c3f2628d5 
					 
					
						
						
							
							[Quantization] Enable BNB support for more MoE models ( #21370 )  
						
						 
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-07-25 03:57:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7311f74468 
					 
					
						
						
							
							[Bugfix] GGUF: fix AttributeError: 'PosixPath' object has no attribute 'startswith' ( #21579 )  
						
						 
						
						Signed-off-by: Kebe <mail@kebe7jun.com>
						
						
					 
					
						2025-07-25 03:42:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8ed01e32f7 
					 
					
						
						
							
							Add H20-3e fused MoE kernel tuning configs for Qwen3-Coder-480B-A35B-Instruct ( #21598 )  
						
						 
						
						Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
						
						
					 
					
						2025-07-25 02:36:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e38e96a3c0 
					 
					
						
						
							
							[Tests] Harden DP tests ( #21508 )  
						
						 
						
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-07-25 02:27:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						40d86ee412 
					 
					
						
						
							
							[TPU][Bugfix] fix OOM issue in CI test ( #21550 )  
						
						 
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com>
						
						
					 
					
						2025-07-24 23:01:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						85d051f026 
					 
					
						
						
							
							[Misc] Removed undefined cmake variables MOE_PERMUTE_ARCHS ( #21262 )  
						
						 
						
						Signed-off-by: Yang Chen <yangche@fb.com>
						
						
					 
					
						2025-07-24 22:54:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5140f54b89 
					 
					
						
						
							
							[CI/Build] fix cpu_extension for apple silicon ( #21195 )  
						
						 
						
						Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
						
						
					 
					
						2025-07-24 22:53:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						947edd099e 
					 
					
						
						
							
							[Misc][Tools] make max-model-len a parameter in auto_tune script ( #21321 )  
						
						 
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-07-24 22:46:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
fde60ee775
[Model] Fix a check for None but the return value was empty list in Gemma3 MM vision_embeddings (#21479)
Signed-off-by: Hongmin Fan <fanhongmin@google.com>
2025-07-25 13:46:06 +08:00

b38bc652ac
[Model] Support tensor parallel for timm ViT in Deepseek_vl2 (#21494)
Signed-off-by: wzqd <1057337859@qq.com>
2025-07-24 22:45:16 -07:00

adaf2c6d4f
[Bugfix] fix modelscope snapshot_download serialization (#21536)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-07-24 22:44:38 -07:00

42343f1f89
[CI] Update CODEOWNERS for CPU and Intel GPU (#21582)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-07-24 21:58:03 -07:00

965bc71b04
Integrate TensorSchema with shape validation for Phi3VImagePixelInputs (#21232)
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-07-24 21:43:52 -07:00

807a328bb6
[Docs] Add requirements/common.txt to run unit tests (#21572)
Signed-off-by: Zhou Fang <fang.github@gmail.com>
2025-07-24 20:51:15 -07:00

e0be2c4d09
[TPU][Test] Temporarily suspend this MoE model in test_basic.py. (#21560)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-24 20:44:50 -07:00

9c8b2c2a8a
[DP] Support api-server-count > 0 in hybrid DP LB mode (#21510)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-24 20:18:16 -07:00

2212cd6cfb
[Bugfix] DeepGemm utils : Fix hardcoded type-cast (#21517)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-24 20:17:29 -07:00

ce3a9b1378
[Kernel] adding fused_moe configs for upcoming granite4 (#21332)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-24 20:16:59 -07:00

2ce90e5b01
Fix GLM-4 PP Missing Layer When using with PP. (#21531)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-07-24 20:07:38 -07:00

633f6e804b
[Bug] Fix DeepGemm Init Error (#21554)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-24 20:07:22 -07:00

b57296bb9a
[Docs] Fix site_url for RunLLM (#21564)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-24 20:05:58 -07:00

34ddcf9ff4
[Frontend] run-batch supports V1 (#21541)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-24 20:05:55 -07:00

fe56180c7f
[MoE] More balanced expert sharding (#21497)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-07-24 15:56:08 -07:00

07d80d7b0e
[TPU][TEST] HF_HUB_DISABLE_XET=1 the test 3. (#21539)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-24 15:33:04 -07:00

2dd72d23d9
update flashinfer to v0.2.9rc1 (#21485)
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
2025-07-24 14:06:11 -07:00

a6c7fb8cff
[Docs] Add Expert Parallelism Initial Documentation (#21373)
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-24 12:36:06 -07:00

a7272c23d0
[Docs][minor] Fix broken gh-file link in distributed serving docs (#21543)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-24 10:36:56 -07:00

6066284914
[P/D] Support CPU Transfer in NixlConnector (#18293)
Signed-off-by: Juncheng Gu <juncgu@gmail.com>
Signed-off-by: Richard Liu <ricliu@google.com>
Co-authored-by: Richard Liu <39319471+richardsliu@users.noreply.github.com>
Co-authored-by: Richard Liu <ricliu@google.com>
2025-07-24 17:58:42 +01:00

1e9ea8e69d
[P/D] Move FakeNixlWrapper to test dir (#21328)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-24 08:53:45 -07:00

d9f9a3fd96
[XPU] Conditionally import CUDA-specific passes to avoid import errors on xpu platform (#21036)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
2025-07-24 23:23:36 +08:00

1b25f1fe75
Update flashinfer CUTLASS MoE Kernel (#21408)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
2025-07-24 08:13:31 -07:00

e8cb0d0495
[Bug] Fix Compressed Tensor NVFP4 cutlass_fp4_group_mm illegal memory access (#21465)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-24 08:13:24 -07:00

684174115d
[Docs] Rewrite Distributed Inference and Serving guide (#20593)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-24 08:13:05 -07:00

cdb79ee63d
[Docs] Update Tensorizer usage documentation (#21190)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
Signed-off-by: William Goldby <willgoldby@gmail.com>
Co-authored-by: William Goldby <willgoldby@gmail.com>
2025-07-24 06:56:18 -07:00

5a19a6c670
[Fix] Update mamba_ssm to 2.2.5 (#21421)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-07-24 03:25:41 -07:00

2ded067fd2
[Bugfix] Fix CUDA arch flags for MoE permute (#21426)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-07-24 03:23:59 -07:00

13abd0eaf9
[Model] Officially support Emu3 with Transformers backend (#21319)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-24 03:22:12 -07:00

61b8cea3b4
[Attention] Optimize FlashInfer MetadataBuilder Build call (#21137)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-24 03:21:46 -07:00

526078a96c
bump flashinfer to v0.2.8 (#21385)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
2025-07-24 03:20:38 -07:00

6da0078523
[Feat] Allow custom naming of vLLM processes (#21445)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-24 03:15:23 -07:00

73e3949d07
[Misc] Improve comment for DPEngineCoreActor._set_cuda_visible_devices() (#21501)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-24 03:13:40 -07:00

6eca337ce0
Replace --expand-tools-even-if-tool-choice-none with --exclude-tools-when-tool-choice-none for v0.10.0 (#20544)
Signed-off-by: okada <kokuzen@gmail.com>
Signed-off-by: okada shintarou <okada@preferred.jp>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-24 02:56:36 -07:00

85bda9e7d0
remove GLM-4.5 quantization wrong Code (#21435)
2025-07-24 01:52:43 -07:00

610852a423
[Core] Support model loader plugins (#21067)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-24 01:49:44 -07:00

f0f4de8f26
[Misc] Fix duplicate FusedMoEConfig debug messages (#21455)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-24 01:27:30 -07:00

fc5f756db4
[v1][Core] Clean up usages of SpecializedManager (#21407)
Signed-off-by: Zhou Fang <fang.github@gmail.com>
2025-07-24 00:40:11 -07:00

e74bfc70e4
[TPU][Bugfix] fix moe layer (#21340)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-07-24 00:38:39 -07:00

90eeea8f85
[Bugfix][ROCm] Fix for warp_size uses on host (#21205)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-24 00:37:19 -07:00

dde295a934
Deduplicate Transformers backend code using inheritance (#21461)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-24 00:16:23 -07:00

6d8d0a24c0
Add think chunk (#21333)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-07-23 21:51:32 -07:00

11ef7a611e
[BugFix] Set CUDA_VISIBLE_DEVICES before spawning the subprocesses (#21211)
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-23 21:44:04 -07:00

dc2f159f8a
Dump input metadata on crash for async scheduling (#21258)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-23 21:10:30 -07:00

d5b981f8b1
[DP] Internal Load Balancing Per Node [one-pod-per-node] (#21238)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-07-23 20:57:32 -07:00

eec6942014
[BugFix] Fix KVConnector TP worker aggregation (#21473)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-23 20:56:49 -07:00

fd48d99ffd
[BugFix]: Batch generation from prompt_embeds fails for long prompts (#21390)
Signed-off-by: KazusatoOko <kazusto.oko@sakana.ai>
Co-authored-by: KazusatoOko <kazusto.oko@sakana.ai>
2025-07-23 20:43:17 -07:00

f8c15c4efb
[Bugfix] Fix example disagg_example_p2p_nccl_xpyd.sh zombie process (#21437)
Signed-off-by: David Chen <530634352@qq.com>
2025-07-23 20:42:11 -07:00

aa08a954f9
[Bugfix] Fix casing warning (#21468)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-07-23 20:41:23 -07:00

13e4ee1dc3
[XPU][UT] increase intel xpu CI test scope (#21492)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
2025-07-23 20:24:04 -07:00

772ce5af97
[Misc] Add dummy maverick test to CI (#21324)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-23 20:22:42 -07:00

63d92abb7c
[Frontend] Set MAX_AUDIO_CLIP_FILESIZE_MB via env var instead of hardcoding (#21374)
Signed-off-by: Deven Labovitch <deven@videa.ai>
2025-07-23 20:22:19 -07:00

11599b0e1f
feat(gguf_loader): accept HF repo paths & URLs for GGUF (#20793)
Signed-off-by: Hardik <hardikgupta1999@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-23 20:21:02 -07:00

f3137cdd81
[Core] Freeze gc during cuda graph capture to speed up init (#21146)
Signed-off-by: Codex <codex@openai.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-23 17:20:14 -07:00

82ec66f514
[V0 Deprecation] Remove Prompt Adapters (#20588)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-23 16:36:48 -07:00

78c13e30e1
[V1] Fix local chunked attention always disabled (#21419)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-23 15:59:30 -07:00

5c9b807b34
[Core] Add reload_weights RPC method (#20096)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-23 14:24:52 -07:00

14bf19e39f
[TPU][TEST] Fix the downloading issue in TPU v1 test 11. (#21418)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-23 11:29:36 -07:00

4ac7713e32
Add test case for compiling multiple graphs (#21044)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-23 11:00:47 -07:00

8560a5b258
[Core][Model] PrithviMAE Enablement on vLLM v1 engine (#20577)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
2025-07-23 11:00:23 -07:00

316b1bf706
[Tests] Add tests for headless internal DP LB (#21450)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-23 07:49:25 -07:00

7c734ee09b
[Bugfix][Qwen][DCA] fixes bug in dual-chunk-flash-attn backend for qwen 1m models. (#21364)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
2025-07-23 06:34:37 -07:00

f59ec35b7f
[V1] Check all pooling tasks during profiling (#21299)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-23 05:53:26 -07:00

2671334d45
[Model] add Hunyuan V1 Dense Model support. (#21368)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
2025-07-23 03:54:08 -07:00

2cc5016a19
[Docs] Clean up v1/metrics.md (#21449)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-07-23 03:37:25 -07:00

						6929f8b437 
					 
					
						
						
							
							[Misc] fixed nvfp4_moe test failures due to invalid kwargs ( #21246 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yang Chen <yangche@fb.com>
						
						
					 
					
						2025-07-23 01:41:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						32ec9e2f2a 
					 
					
						
						
							
							Mamba V2 Test not Asserting Failures. ( #21379 )
						
						 
						
						... 
						
						
						
						Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
						
						
					 
					
						2025-07-23 01:40:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						accac82928 
					 
					
						
						
							
							[Sampler] Introduce logprobs mode for logging ( #21398 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-07-23 01:39:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						23637dcdef 
					 
					
						
						
							
							[Docs] Fix bullets and grammars in tool_calling.md ( #21440 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-07-23 01:23:20 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6364af92f8 
					 
					
						
						
							
							Fixed typo in profiling logs ( #21441 )  
						
						 
						
						
						
						
					 
					
						2025-07-23 01:18:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7aaa2bd5a8 
					 
					
						
						
							
							[Bugfix] ensure tool_choice is popped when tool_choice:null is passed in json payload ( #19679 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
						
						
					 
					
						2025-07-23 00:30:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f5c14de6a 
					 
					
						
						
							
							add clear messages for deprecated models ( #21424 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-07-23 00:03:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f002e9a870 
					 
					
						
						
							
							[Cleanup] Only log MoE DP setup warning if DP is enabled ( #21315 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-23 00:02:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a1f3610fc6 
					 
					
						
						
							
							[Core] Add basic unit test for maybe_evict_cached_block ( #21400 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
						
						
					 
					
						2025-07-23 00:02:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4ecedd1806 
					 
					
						
						
							
							[Bugfix] Fix nightly transformers CI failure ( #21427 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-23 00:01:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						107111a859 
					 
					
						
						
							
							Changing "amdproduction" allocation. ( #21409 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
						
						
					 
					
						2025-07-22 20:48:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2dec7c1a5d 
					 
					
						
						
							
							[Bugfix][CUDA] fixes CUDA FP8 kv cache dtype supported ( #21420 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
						
						
					 
					
						2025-07-22 20:34:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						08d2bd78da 
					 
					
						
						
							
							[BUGFIX] deepseek-v2-lite failed due to fused_qkv_a_proj name update ( #21414 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
						
						
					 
					
						2025-07-22 20:33:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4f76a05f4f 
					 
					
						
						
							
							[BugFix] Update python to python3 calls for image; fix prefix & input calculations. ( #21391 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Eric Hanley <ericehanley@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-07-22 20:33:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f154bb9ff0 
					 
					
						
						
							
							Simplify weight loading in Transformers backend ( #21382 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-22 20:29:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3ec7170ff1 
					 
					
						
						
							
							[Bugfix][ROCm][Build] Fix build regression on ROCm ( #21393 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-07-22 20:27:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c401c64b4c 
					 
					
						
						
							
							[CI/Build] Fix model executor tests ( #21387 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-22 20:25:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b77c7d327f 
					 
					
						
						
							
							[BugFix] Fix ray import error mem cleanup bug ( #21381 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
						
						
					 
					
						2025-07-22 16:19:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						35bc8bd5fb 
					 
					
						
						
							
							[Misc] Copy HF_TOKEN env var to Ray workers ( #21406 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
						
						
					 
					
						2025-07-22 16:18:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4594fc3b28 
					 
					
						
						
							
							[Model] Add Qwen3CoderToolParser ( #21396 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: simon-mo <xmo@berkeley.edu>
						
						
					 
					
						2025-07-22 15:05:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ae268b6326 
					 
					
						
						
							
							Fix Flashinfer Allreduce+Norm enable disable calculation based on fi_allreduce_fusion_max_token_num ( #21325 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: XIn Li <xinli@nvidia.com>
						
						
					 
					
						2025-07-22 12:42:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						35366ae57c 
					 
					
						
						
							
							[CI/Build] Fix test failure due to updated model repo ( #21375 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-22 08:39:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2226d5bd85 
					 
					
						
						
							
							[Bugfix] Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers ( #21353 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ariG23498 <aritra.born2fly@gmail.com>
						
						
					 
					
						2025-07-22 08:27:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						44554a0068 
					 
					
						
						
							
							Add tokenization_kwargs to encode for embedding model truncation ( #21033 )  
						
						 
						
						
						
						
					 
					
						2025-07-22 08:24:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						226b452a20 
					 
					
						
						
							
							Revert "[Refactor] Fix Compile Warning #1444-D ( #21208 )" ( #21384 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-22 08:22:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f38ee34a0a 
					 
					
						
						
							
							[feat] Enable mm caching for transformers backend ( #21358 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: raushan <raushan@huggingface.co>
						
						
					 
					
						2025-07-22 08:18:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b194557a6c 
					 
					
						
						
							
							Adds parallel model weight loading for runai_streamer ( #21330 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: bbartels <benjamin@bartels.dev>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-07-22 08:15:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						774d0c014b 
					 
					
						
						
							
							[Perf] Cuda Kernel for Per Token Group Quant ( #21083 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-22 07:27:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2c8db17cfd 
					 
					
						
						
							
							[feat]: add SM100 support for cutlass FP8 groupGEMM ( #20447 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-22 07:27:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4fb56914c5 
					 
					
						
						
							
							[perf] Add fused MLA QKV + strided layernorm ( #21116 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-22 07:07:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0df4d9b06b 
					 
					
						
						
							
							[Misc] unify variable for LLM instance v2 ( #21356 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-07-22 06:32:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ed25054577 
					 
					
						
						
							
							[Core] Introduce popleft_n and append_n in FreeKVCacheBlockQueue to further optimize block_pool ( #21222 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
						
						
					 
					
						2025-07-22 06:17:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						10904e6d75 
					 
					
						
						
							
							[benchmark] Port benchmark request sent optimization to benchmark_serving ( #21209 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
						
						
					 
					
						2025-07-22 05:28:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a32237665d 
					 
					
						
						
							
							[Core] Optimize update checks in LogitsProcessor ( #21245 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
						
						
					 
					
						2025-07-22 05:27:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bc8a8ce5ec 
					 
					
						
						
							
							[Misc] Remove deprecated args in v0.10 ( #21349 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com>
						
						
					 
					
						2025-07-22 05:26:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						32142b3c62 
					 
					
						
						
							
							[Bugfix] Fix eviction cached blocked logic ( #21357 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: simon-mo <simon.mo@hey.com>
						
						
					 
					
						2025-07-22 01:18:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						82b8027be6 
					 
					
						
						
							
							Add arcee model ( #21296 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: alyosha-swamy <raghav@arcee.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-07-22 00:57:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3779eb8c81 
					 
					
						
						
							
							[Feature][eplb] add verify ep or tp or dp ( #21102 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
						
						
					 
					
						2025-07-21 23:41:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9e23ad9655 
					 
					
						
						
							
							Update fp4 quantize API ( #21327 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Shu Wang <shuw@nvidia.com>
						
						
					 
					
						2025-07-21 23:40:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e69a92a1ce 
					 
					
						
						
							
							[Bug] DeepGemm: Fix Cuda Init Error ( #21312 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-21 23:36:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8425f785ad 
					 
					
						
						
							
							[Misc] DeepEPHighThroughtput - Enable Inductor pass ( #21311 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						
						
					 
					
						2025-07-21 23:35:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c17231e827 
					 
					
						
						
							
							Fix kv_cache_dtype handling for out-of-tree HPU plugin ( #21302 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
						
						
					 
					
						2025-07-21 23:35:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6e5b5ca580 
					 
					
						
						
							
							[Refactor] Fix Compile Warning #1444-D ( #21208 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-21 23:33:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						488d8a986a 
					 
					
						
						
							
							[V1] [Hybrid] Add new test to verify that hybrid views into KVCacheTensor are compatible ( #21300 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						
						
					 
					
						2025-07-21 23:31:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						af376ca19d 
					 
					
						
						
							
							[Core] Minimize number of dict lookup in _maybe_evict_cached_block ( #21281 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
						
						
					 
					
						2025-07-21 22:37:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e7b2042681 
					 
					
						
						
							
							Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE ( #20762 ) ( #21334 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ming Yang <minos.future@gmail.com>
						
						
					 
					
						2025-07-21 21:49:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						90f1e55421 
					 
					
						
						
							
							[Intel GPU] Ray Compiled Graph avoid NCCL for Intel GPU ( #21338 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ratnampa <ratnam.parikh@intel.com>
						
						
					 
					
						2025-07-21 21:48:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5e70dcd6e6 
					 
					
						
						
							
							[Doc] Fix CPU doc format ( #21316 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com>
						
						
					 
					
						2025-07-21 21:47:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						25d585ab7b 
					 
					
						
						
							
							[XPU] Enable external_launcher to serve as an executor via torchrun ( #21021 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chzhang <chaojun.zhang@intel.com>
						
						
					 
					
						2025-07-21 21:47:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8d0a01a5f2 
					 
					
						
						
							
							[v1][sampler] Inplace logprobs comparison to get the token rank ( #21283 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-07-21 13:47:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0ec82edda5 
					 
					
						
						
							
							[perf] Speed up align sum kernels ( #21079 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Himanshu Jaju <hj@mistral.ai>
						
						
					 
					
						2025-07-21 11:19:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						005ae9be6c 
					 
					
						
						
							
							Fix bad lm-eval fork ( #21318 )  
						
						 
						
						
						
						
					 
					
						2025-07-21 10:47:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						29d1ffc5b4 
					 
					
						
						
							
							[DP] Fix Prometheus Logging ( #21257 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
						
						
					 
					
						2025-07-21 09:11:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						304dce7ec0 
					 
					
						
						
							
							[Attention] Clean up iRoPE in V1 ( #21188 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-21 09:10:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6ece16c4fe 
					 
					
						
						
							
							[Misc] Add dummy maverick test ( #21199 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-07-21 09:08:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a0e827e07c 
					 
					
						
						
							
							[BugFix] make utils.current_stream thread-safety ( #21252 ) ( #21253 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: simpx <simpxx@gmail.com>
						
						
					 
					
						2025-07-21 09:07:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a15a50fc17 
					 
					
						
						
							
							[CPU] Enable shared-memory based pipeline parallel for CPU backend ( #21289 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com>
						
						
					 
					
						2025-07-21 09:07:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6dda13c86b 
					 
					
						
						
							
							[Misc] Add sliding window to flashinfer test ( #21282 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-07-21 08:37:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6b46c4b653 
					 
					
						
						
							
							Add Nvidia ModelOpt config adaptation ( #19815 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
						
						
					 
					
						2025-07-21 10:02:58 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d97841078b 
					 
					
						
						
							
							[Misc] unify variable for LLM instance ( #20996 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com>
						
						
					 
					
						2025-07-21 12:18:33 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e6b90a2805 
					 
					
						
						
							
							[Docs] Make tables more space efficient in supported_models.md ( #21291 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-21 02:25:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						be54a951a3 
					 
					
						
						
							
							[Docs] Fix hardcoded links in docs ( #21287 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-21 02:23:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						042af0c8d3 
					 
					
						
						
							
							[Model][1/N] Support multiple poolers at model level ( #21227 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-21 02:22:21 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						378d33c392 
					 
					
						
						
							
							[Bugfix] Fix missing placeholder in logger debug ( #21280 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-20 22:50:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						940af1f03a 
					 
					
						
						
							
							Add the instruction to run e2e validation manually before release ( #21023 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Huy Do <huydhn@gmail.com>
						
						
					 
					
						2025-07-20 22:29:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						92615d7fe8 
					 
					
						
						
							
							[Docs] Add RFC Meeting to Issue Template ( #21279 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: simon-mo <simon.mo@hey.com>
						
						
					 
					
						2025-07-20 21:58:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8188196a1c 
					 
					
						
						
							
							[CI] Cleanup modelscope version constraint in Dockerfile ( #21243 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kay Yan <kay.yan@daocloud.io>
						
						
					 
					
						2025-07-20 20:13:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7ba34b1241 
					 
					
						
						
							
							[bugfix] fix syntax warning caused by backslash ( #21251 )  
						
						 
						
						
						
						
					 
					
						2025-07-20 17:12:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9499e26e2a 
					 
					
						
						
							
							[Model] Support VLMs with transformers backend ( #20543 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: raushan <raushan@huggingface.co >
						Signed-off-by: Isotr0py <2037008807@qq.com >
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
						Co-authored-by: Isotr0py <2037008807@qq.com >
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
						
						
					 
					
						2025-07-20 13:25:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						51ba839555 
					 
					
						
						
							
							[Model] use AutoWeightsLoader for bart ( #18299 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: calvin chen <120380290@qq.com > 
						
						
					 
					
						2025-07-20 08:15:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d1fb65bde3 
					 
					
						
						
							
							Enable v1 metrics tests ( #20953 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com > 
						
						
					 
					
						2025-07-20 03:22:02 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3a1d8940ae 
					 
					
						
						
							
							[TPU] support fp8 kv cache quantization ( #19292 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-07-20 03:01:00 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2b504eb770 
					 
					
						
						
							
							[Docs] [V1] Update docs to remove enforce_eager limitation for hybrid models. ( #21233 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-07-19 16:09:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						10eb24cc91 
					 
					
						
						
							
							GLM-4 Update ( #20736 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
						Signed-off-by: Lu Fang <fanglu@fb.com >
						Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
						Co-authored-by: Lu Fang <fanglu@fb.com >
						
						
					 
					
						2025-07-19 22:40:31 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e8cbb58f3 
					 
					
						
						
							
							[BugFix] Fix full cuda graph slot_mapping ( #21228 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com > 
						
						
					 
					
						2025-07-19 14:13:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						752c6ade2e 
					 
					
						
						
							
							[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small ( #21217 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-19 13:53:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						881e3cbe3b 
					 
					
						
						
							
							[V1] [Hybrid] Enable piecewise CUDA Graph for mamba layers  ( #21194 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-07-19 19:27:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9f414a12ad 
					 
					
						
						
							
							[BugFix] Make PD work with Ray ( #21072 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com > 
						
						
					 
					
						2025-07-19 08:46:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6a971ed692 
					 
					
						
						
							
							[Docs] Update the link to the 'Prometheus/Grafana' example ( #21225 )  
						
						 
						
						
						
						
					 
					
						2025-07-19 06:58:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						da6579bf41 
					 
					
						
						
							
							[CI/CD][bugfix]fix: error argument to loads has incompatible type ( #21223 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com >
						Signed-off-by: Sungjae Lee <sung-jae.lee@navercorp.com >
						
						
					 
					
						2025-07-19 05:16:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c81259d33a 
					 
					
						
						
							
							Fix/remove some broken model executor tests ( #21224 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Rabi Mishra <ramishra@redhat.com > 
						
						
					 
					
						2025-07-19 12:15:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e3a0e43d7f 
					 
					
						
						
							
							[bugfix] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code ( #21032 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-19 05:13:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b3d82108e7 
					 
					
						
						
							
							[Bugfix][Frontend] Fix openai CLI arg middleware ( #21220 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-07-19 02:40:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6d0734c562 
					 
					
						
						
							
							[NVIDIA] Add SM100 Flashinfer MoE blockscale fp8 backend for low latency ( #20645 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: kaixih <kaixih@nvidia.com >
						Signed-off-by: mgoin <mgoin64@gmail.com >
						Co-authored-by: mgoin <mgoin64@gmail.com >
						
						
					 
					
						2025-07-19 02:33:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7d94577138 
					 
					
						
						
							
							Add torch golden impl for moe_align_block_size kernel test ( #20653 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Shixian Cui <shixian@amazon.com >
						Co-authored-by: Shixian Cui <shixian@amazon.com >
						
						
					 
					
						2025-07-19 02:32:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						59f935300c 
					 
					
						
						
							
							[BugFix] Fix potential cuda-graph IMA ( #21196 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-07-19 02:18:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						18e519ec86 
					 
					
						
						
							
							[Bugfix] Fix ndarray video color from VideoAsset ( #21064 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-19 02:17:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1eaff27815 
					 
					
						
						
							
							[V0 deprecation] Remove long context LoRA ( #21169 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-19 02:15:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cf8cc32674 
					 
					
						
						
							
							Fix a couple of Voxtral tests ( #21218 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Huy Do <huydhn@gmail.com > 
						
						
					 
					
						2025-07-19 09:13:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3a2cb2649d 
					 
					
						
						
							
							[Misc][Tools][Benchmark] Add readme file for auto_tune script ( #20779 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chenyaaang <chenyangli@google.com > 
						
						
					 
					
						2025-07-19 09:06:59 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3e04107d97 
					 
					
						
						
							
							[Model] EXAONE 4.0 model support ( #21060 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Deepfocused <rlawhdrhs27@gmail.com >
						Signed-off-by: woongsik <rlawhdrhs27@gmail.com >
						
						
					 
					
						2025-07-19 14:25:44 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						37bd8d6e4c 
					 
					
						
						
							
							[Bug] DeepGemm: Fix TypeError: per_block_cast_to_fp8() missing 1 required positional argument: 'use_ue8m0' for SM100 ( #21187 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-18 23:25:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						468e2400fe 
					 
					
						
						
							
							[BugFix][CPU] Fix TorchSDPABackendImpl doesn't have use_irope  ( #21200 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-07-18 23:18:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dcc6cfb991 
					 
					
						
						
							
							[Kernel][Performance] Tweak MoE Batched silu_mul_fp8_quant_deep_gemm kernel ( #21193 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
						Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
						
						
					 
					
						2025-07-18 23:09:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dd572c0ab3 
					 
					
						
						
							
							[V0 Deprecation] Remove V0 Spec Decode workers ( #21152 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-18 21:47:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9ffe905a41 
					 
					
						
						
							
							[Bugfix][Model] Fix LoRA for Mistral-Small-3.1-24B-Instruct-2503 ( #21183 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
						Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
						
						
					 
					
						2025-07-18 21:15:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9a9fda1423 
					 
					
						
						
							
							[Core] Support Local Chunked Attention for Hybrid KV Cache ( #19351 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucia Fang <fanglu@fb.com >
						Signed-off-by: Lu Fang <fanglu@meta.com >
						Signed-off-by: Lu Fang <fanglu@fb.com >
						Co-authored-by: Lu Fang <fanglu@meta.com >
						
						
					 
					
						2025-07-18 20:48:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						466e878f2a 
					 
					
						
						
							
							[Quantization] Enable BNB support for more MoE models ( #21100 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-18 17:52:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						217937221b 
					 
					
						
						
							
							Elastic Expert Parallel Initial Support ( #20775 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com > 
						
						
					 
					
						2025-07-18 17:46:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5782581acf 
					 
					
						
						
							
							[Bugfix] Voxtral on Blackwell GPUs (RTX 50 series) ( #21077 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: hax0r31337 <liulihaocaiqwq@gmail.com > 
						
						
					 
					
						2025-07-18 18:40:18 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0f199f197b 
					 
					
						
						
							
							[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue ( #21005 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <jialino@meta.com > 
						
						
					 
					
						2025-07-18 12:34:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b2eb2b5ad7 
					 
					
						
						
							
							[Kernel] Apply torch.Tag.needs_fixed_stride_order only for torch==2.6.0 ( #19346 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rzou <zou3519@gmail.com > 
						
						
					 
					
						2025-07-18 14:10:21 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						21274ab476 
					 
					
						
						
							
							[CI] Update CODEOWNERS for vllm/compilation ( #21185 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Richard Zou <zou3519@gmail.com > 
						
						
					 
					
						2025-07-18 06:51:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ed8cbfedf8 
					 
					
						
						
							
							Let GraniteMoeAttention use YaRN ( #21174 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-07-18 05:52:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						45badd05d0 
					 
					
						
						
							
							[Core] Set pooling params based on task and model ( #21128 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-18 05:41:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4adc66f64d 
					 
					
						
						
							
							[Bugfix] Allocate less memory in non-batched CUTLASS MoE ( #21121 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ElizaWszola <ewszola@redhat.com > 
						
						
					 
					
						2025-07-18 18:55:52 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						55ad648715 
					 
					
						
						
							
							[Doc] Fix typo in model name ( #21178 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-18 03:55:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5895afd780 
					 
					
						
						
							
							[Bugfix] The special_tokens in tokenizer should also be controlled by do_lower_case in encoder_config. ( #20750 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-07-18 09:10:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ca4eb82bcb 
					 
					
						
						
							
							[Model] Re-add the implicit conversion feature for as_seq_cls_model ( #21103 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-07-18 07:15:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ba2dfbb0c2 
					 
					
						
						
							
							[Misc] Make MM embedding merge interface explicit in model runner ( #21147 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Roger Wang <hey@rogerw.me >
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-07-18 07:13:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1bf65138f6 
					 
					
						
						
							
							[benchmark] Sending request strictly follows the random intervals ( #21108 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com > 
						
						
					 
					
						2025-07-18 06:22:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						54cf1cae62 
					 
					
						
						
							
							[Misc] Do not print async output warning for v1 ( #21151 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-17 21:57:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5780121c95 
					 
					
						
						
							
							[Perf] Add swap_ab to SM90 FP8 non-block CUTLASS moe grouped gemm ( #20911 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Shixian Cui <shixian@amazon.com >
						Co-authored-by: Shixian Cui <shixian@amazon.com >
						
						
					 
					
						2025-07-18 04:34:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c7d8724e78 
					 
					
						
						
							
							[Core] FlashInfer CUTLASS fused MoE backend (NVFP4) ( #20037 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: shuw <shuw@nvidia.com >
						Signed-off-by: mgoin <mgoin64@gmail.com >
						Co-authored-by: mgoin <mgoin64@gmail.com >
						
						
					 
					
						2025-07-17 21:32:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b38baabcf9 
					 
					
						
						
							
							[Doc] Add inplace weights loading example ( #19640 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com > 
						
						
					 
					
						2025-07-17 21:12:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						89cab4d01f 
					 
					
						
						
							
							[Attention] Make local attention backend agnostic ( #21093 )  
						
						 
						
						
						
						
					 
					
						2025-07-18 00:10:42 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b9a21e9173 
					 
					
						
						
							
							[Docs] Update supported models documentation with missing models ( #20844 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lu Fang <fanglu@fb.com > 
						
						
					 
					
						2025-07-17 20:12:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c4e3b12524 
					 
					
						
						
							
							[Docs] Add minimal demo of Ray Data API usage ( #21080 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-17 20:09:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8dfb45ca33 
					 
					
						
						
							
							[Bugfix] Fix the tensor non-contiguous issue for Flashinfer TRT-LLM backend attention kernel ( #21133 )  
						
						 
						
						
						
						
					 
					
						2025-07-18 00:35:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8a8fc94639 
					 
					
						
						
							
							[Log] Debugging Log with more Information ( #20770 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-18 00:19:46 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4de7146351 
					 
					
						
						
							
							[V0 deprecation] Remove V0 HPU backend ( #21131 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-17 16:37:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ac9fb732a5 
					 
					
						
						
							
							On environments where numa cannot be detected we get 0 ( #21115 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Eric Curtin <ecurtin@redhat.com > 
						
						
					 
					
						2025-07-17 18:52:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a3a6c695f4 
					 
					
						
						
							
							[Misc] Qwen MoE model supports LoRA ( #20932 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-17 18:32:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						90bd2ab6e3 
					 
					
						
						
							
							[Model] Update pooling model interface ( #21058 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-17 16:05:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9fb2d22032 
					 
					
						
						
							
							[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE ( #20762 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ElizaWszola <ewszola@redhat.com > 
						
						
					 
					
						2025-07-17 09:56:44 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2d6a38209b 
					 
					
						
						
							
							[Docs] Move code block out of admonition now that it's short ( #21118 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-17 06:12:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						89e3c4e9b4 
					 
					
						
						
							
							[Misc] Avoid unnecessary import ( #21106 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com > 
						
						
					 
					
						2025-07-17 12:57:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fe8a2c544a 
					 
					
						
						
							
							[Docs] Improve docstring formatting for FusedMoEParallelConfig.make ( #21117 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-17 04:13:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4ef00b5cac 
					 
					
						
						
							
							[VLM] Add Nemotron-Nano-VL-8B-V1 support ( #20349 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kyle Huang <kylhuang@nvidia.com >
						Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
						
						
					 
					
						2025-07-17 03:07:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5a7fb3ab9e 
					 
					
						
						
							
							[Model] Add ToolParser and MoE Config for Hunyuan A13B  ( #20820 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Asher Zhang <asherszhang@tencent.com > 
						
						
					 
					
						2025-07-17 09:10:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						11dfdf21bf 
					 
					
						
						
							
							[Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels  ( #20903 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
						Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
						
						
					 
					
						2025-07-17 08:10:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fdc5b43d20 
					 
					
						
						
							
							[Bugfix]: Fix final_res_batch list index out of range error ( #21055 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-07-17 00:29:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c5b8b5953a 
					 
					
						
						
							
							[Misc] Fix PhiMoE expert mapping ( #21085 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-17 05:47:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4fcef49ec4 
					 
					
						
						
							
							[V1] [KVConnector] Fix MultiprocExecutor worker output aggregation ( #21048 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: David Ben-David <davidb@pliops.com >
Co-authored-by: David Ben-David <davidb@pliops.com > 
						
						
					 
					
						2025-07-17 13:29:45 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8a4e5c5f3c 
					 
					
						
						
							
							[V1][P/D]Enhance Performance and code readability for P2pNcclConnector ( #20906 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Abatom <abzhonghua@gmail.com > 
						
						
					 
					
						2025-07-16 22:13:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						76b494444f 
					 
					
						
						
							
							[Attention] Refactor attention metadata builder interface ( #20466 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-07-17 04:44:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						28a6d5423d 
					 
					
						
						
							
							[Bugfix] Fix Machete zero point issue for GPTQ models on SM90 ( #21066 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-16 19:54:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						58760e12b1 
					 
					
						
						
							
							[TPU] Start using python 3.12 ( #21000 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com > 
						
						
					 
					
						2025-07-16 19:37:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a50d918225 
					 
					
						
						
							
							[Docker] Allow FlashInfer to be built in the ARM CUDA Dockerfile ( #21013 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-16 19:37:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c9ba8104ed 
					 
					
						
						
							
							[Bugfix] weight loading use correct tp_group with patch_tensor_parallel_group ( #21024 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: KevinXiong-C <kevin_xiong1997@outlook.com > 
						
						
					 
					
						2025-07-16 19:36:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4e7dfbe7b4 
					 
					
						
						
							
							Update PyTorch to torch==2.7.1 for CUDA ( #21011 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-17 02:30:44 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						72ad273582 
					 
					
						
						
							
							Remove torch_xla.tpu.version() from pallas.py. ( #21065 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com > 
						
						
					 
					
						2025-07-17 00:25:26 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						01513a334a 
					 
					
						
						
							
							Support FP8 Quantization and Inference Run on Intel Gaudi (HPU) using INC (Intel Neural Compressor) ( #12010 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nir David <ndavid@habana.ai >
Signed-off-by: Uri Livne <ulivne@habana.ai >
Co-authored-by: Uri Livne <ulivne@habana.ai > 
						
						
					 
					
						2025-07-16 15:33:41 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ac2bf41e53 
					 
					
						
						
							
							[Model] Remove model sampler ( #21059 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-16 19:03:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a931b4cdcf 
					 
					
						
						
							
							Remove Qwen Omni workaround that's no longer necessary ( #21057 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-16 16:25:23 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a0f8a79646 
					 
					
						
						
							
							[fix] fix qwen image_embeds input ( #21049 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai > 
						
						
					 
					
						2025-07-16 15:17:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						18bdcf4113 
					 
					
						
						
							
							feat - add a new endpoint get_tokenizer_info to provide tokenizer/chat-template information ( #20575 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: m-misiura <mmisiura@redhat.com > 
						
						
					 
					
						2025-07-16 21:52:14 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1c3198b6c4 
					 
					
						
						
							
							[Model] Consolidate pooler implementations ( #20927 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-16 13:39:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						260127ea54 
					 
					
						
						
							
							[Docs] Add intro and fix 1-2-3 list in frameworks/open-webui.md ( #19199 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-07-16 06:11:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d0dc4cfca4 
					 
					
						
						
							
							Fix inadvertently silenced PP tests for mp, add DeepSeek V2/V3 model family to PP tests ( #20831 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com > 
						
						
					 
					
						2025-07-16 00:14:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d31a647124 
					 
					
						
						
							
							[BugFix] Fix import error on non-blackwell machines ( #21020 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-07-15 22:27:29 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						85431bd9ad 
					 
					
						
						
							
							[TPU] fix kv_cache_update kernel block size choosing logic ( #21007 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-07-16 04:39:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c11013db8b 
					 
					
						
						
							
							[Meta] Llama4 EAGLE Support ( #20591 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: qizixi <qizixi@meta.com >
Co-authored-by: qizixi <qizixi@meta.com > 
						
						
					 
					
						2025-07-15 21:14:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1eb2b9c102 
					 
					
						
						
							
							[CI] update typos config for CI pre-commit and fix some spells ( #20919 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Peter Pan <Peter.Pan@daocloud.io > 
						
						
					 
					
						2025-07-15 21:12:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6ebf313790 
					 
					
						
						
							
							Avoid direct comparison of floating point numbers ( #21002 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com > 
						
						
					 
					
						2025-07-15 21:12:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cfbcb9ed87 
					 
					
						
						
							
							[Voxtral] Add more tests ( #21010 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-07-15 21:11:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						76ddeff293 
					 
					
						
						
							
							[Doc] Remove duplicate docstring ( #21012 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-15 20:09:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f46098335b 
					 
					
						
						
							
							[Bugfix] Fix Mistral3 support on SM100/SM120 ( #20998 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-15 20:08:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e9534c7202 
					 
					
						
						
							
							[CI][HPU] update for v0 deprecate by switching to VLLM_TARGET_DEVICE=empty ( #21006 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chendi.Xue <chendi.xue@intel.com > 
						
						
					 
					
						2025-07-15 20:07:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7976446015 
					 
					
						
						
							
							Add Dockerfile argument for VLLM_USE_PRECOMPILED environment ( #20943 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: dougbtv <dosmith@redhat.com > 
						
						
					 
					
						2025-07-15 19:53:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fcb9f879c1 
					 
					
						
						
							
							[Bugfix] Correct per_act_token in CompressedTensorsW8A8Fp8MoECutlassM… ( #20937 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ming Yang <minos.future@gmail.com > 
						
						
					 
					
						2025-07-15 19:53:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3ed94f9d0a 
					 
					
						
						
							
							[Docs] Enhance Anyscale documentation, add quickstart links for vLLM ( #21018 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-15 19:46:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fa839565f2 
					 
					
						
						
							
							[Misc] Refactor: Improve argument handling for conda command ( #20481 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-15 19:43:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						75a99b98bf 
					 
					
						
						
							
							[Chore] Remove outdated transformers check ( #20989 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca > 
						
						
					 
					
						2025-07-15 19:42:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b5c3b68359 
					 
					
						
						
							
							[Misc] bump xgrammar version to v0.1.21 ( #20992 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-07-15 19:42:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6cbc4d4bea 
					 
					
						
						
							
							[Model] Add ModelConfig class for GraniteMoeHybrid to override default max_seq_len_to_capture ( #20923 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-07-15 19:19:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						153c6f1e61 
					 
					
						
						
							
							[Frontend] Remove print left in FrontendArgs.add_cli_args ( #21004 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-15 19:18:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						34cda778a0 
					 
					
						
						
							
							[Frontend] OpenAI Responses API supports input image ( #20975 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-07-15 18:59:36 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						30800b01c2 
					 
					
						
						
							
							[Nvidia] Integrate SM100 cudnn prefill API to MLA prefill ( #20411 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Elfie Guo <elfieg@nvidia.com >
Co-authored-by: Elfie Guo <eflieg@nvidia.com > 
						
						
					 
					
						2025-07-15 17:56:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						10be209493 
					 
					
						
						
							
							[Bug Fix] get_distributed_init_method should get the ip from get_ip i… ( #20889 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chen Li <lcpingping@gmail.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com > 
						
						
					 
					
						2025-07-15 21:23:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						19c863068b 
					 
					
						
						
							
							[Frontend] Support cache_salt in /v1/completions and /v1/responses ( #20981 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com > 
						
						
					 
					
						2025-07-15 21:01:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f29fd8a7f8 
					 
					
						
						
							
							[BugFix] fix 3 issues: (1) using metadata for causal-conv1d, (2) indexing overflow in v1 vLLM, and (3) init_states in v0 ( #20838 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com >
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com > 
						
						
					 
					
						2025-07-15 16:08:26 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ed10f3cea1 
					 
					
						
						
							
							[ROCm] warpSize is being made non constexpr in ROCm 7.0 ( #20330 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-07-15 14:01:44 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b637e9dcb8 
					 
					
						
						
							
							Add full serve CLI reference back to docs ( #20978 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-15 17:42:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1e36c8687e 
					 
					
						
						
							
							[Deprecation] Remove nullable_kvs ( #20969 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-15 17:21:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5bac61362b 
					 
					
						
						
							
							Configure Gemini ( #20971 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-15 09:37:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						313ae8c16a 
					 
					
						
						
							
							[Deprecation] Remove everything scheduled for removal in v0.10.0 ( #20979 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-15 15:57:53 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c847e34b39 
					 
					
						
						
							
							[CI/Build] Fix wrong path in Transformers Nightly Models Test ( #20994 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-15 08:53:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e7e3e6d263 
					 
					
						
						
							
							Voxtral ( #20970 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com > 
						
						
					 
					
						2025-07-15 07:35:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4ffd963fa0 
					 
					
						
						
							
							[v1][core] Support for attention free models ( #20811 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Christian Pinto <christian.pinto@ibm.com > 
						
						
					 
					
						2025-07-15 14:20:01 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						56fe4bedd6 
					 
					
						
						
							
							[Deprecation] Remove TokenizerPoolConfig ( #20968 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-15 14:00:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d91278181d 
					 
					
						
						
							
							[doc] Add more details for Ray-based DP ( #20948 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com > 
						
						
					 
					
						2025-07-15 05:37:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						20149d84d9 
					 
					
						
						
							
							[MISC] Add init files for python package ( #20908 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wangli <wangli858794774@gmail.com > 
						
						
					 
					
						2025-07-15 12:16:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3534c39a20 
					 
					
						
						
							
							[V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli  ( #20840 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com > 
						
						
					 
					
						2025-07-15 04:04:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c586b55667 
					 
					
						
						
							
							[TPU] Optimize kv cache update kernel ( #20415 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yifei Teng <tengyifei88@gmail.com > 
						
						
					 
					
						2025-07-15 03:56:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						33d560001e 
					 
					
						
						
							
							[Docs] Improve documentation for ray cluster launcher helper script ( #20602 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-15 03:55:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f148c44c6a 
					 
					
						
						
							
							[frontend] Refactor CLI Args for a better modular integration ( #20206 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com > 
						
						
					 
					
						2025-07-15 02:23:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						235bfd5dfe 
					 
					
						
						
							
							[Docs] Improve documentation for RLHF example ( #20598 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-15 01:54:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						68d28e37b0 
					 
					
						
						
							
							[frontend] Add --help=page option for paginated help output ( #20961 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-15 00:42:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						37a7d5d74a 
					 
					
						
						
							
							[Misc] Refactor AllReduceFusionPass. Remove parameter ( #20918 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ilmarkov <imarkov@redhat.com >
Co-authored-by: ilmarkov <imarkov@redhat.com > 
						
						
					 
					
						2025-07-15 06:57:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d4d309409f 
					 
					
						
						
							
							Implement Async Scheduling ( #19970 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-14 23:01:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						85bd6599e4 
					 
					
						
						
							
							[Model] Add AutoWeightsLoader support for BERT, RoBERTa ( #20534 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jennifer He <islandhe@gmail.com >
Signed-off-by: <islandhe@gmail.com >
Signed-off-by: Jen H <islandhe@gmail.com > 
						
						
					 
					
						2025-07-15 13:34:24 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						91b3d190ae 
					 
					
						
						
							
							[cold start] replace VLLM_COMPILE_DEPYF with debug_dump_dir ( #20940 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Boyuan Feng <boyuan@meta.com > 
						
						
					 
					
						2025-07-15 13:02:17 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fc017915f5 
					 
					
						
						
							
							[Doc] Clearer mistral3 and pixtral model support description ( #20926 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-14 21:56:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9ad0a4588b 
					 
					
						
						
							
							[Bugfix] Switch bailout logic for kv-cache-dtype with SM100 Flashinfer ( #20934 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Pavani Majety <pmajety@nvidia.com > 
						
						
					 
					
						2025-07-15 03:27:50 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						016b8d1b7f 
					 
					
						
						
							
							Enabled BnB NF4 inference on Gaudi ( #20172 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ruheena Suhani Shaik <rsshaik@habana.ai > 
						
						
					 
					
						2025-07-14 20:26:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						80305c1b24 
					 
					
						
						
							
							[CI] Fix flaky test_streaming_response test ( #20913 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-07-14 20:15:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						37e2ecace2 
					 
					
						
						
							
							feat: add image zoom to improve image viewing experience ( #20763 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-14 20:14:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						054c8657e3 
					 
					
						
						
							
							[Docs] Add Kuberay to deployment integrations ( #20592 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com > 
						
						
					 
					
						2025-07-14 20:13:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d4170fad39 
					 
					
						
						
							
							Use w8a8 quantized matmul Pallas kernel ( #19170 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com > 
						
						
					 
					
						2025-07-15 03:06:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						946aadb4a0 
					 
					
						
						
							
							[CI/Build] Split Entrypoints Test into LLM and API Server ( #20945 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-15 02:44:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bcdfb2a330 
					 
					
						
						
							
							[Bugfix] Fix incorrect dispatch for CutlassBlockScaledGroupedGemm and DeepGEMM ( #20933 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-15 01:42:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ba8c300018 
					 
					
						
						
							
							[BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache ( #20942 )  
						
						 
						
						
						
						
						Signed-off-by: Richard Zou <zou3519@gmail.com>
						
						
					 
					
						2025-07-15 01:26:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8cdc371217 
					 
					
						
						
							
							SM100 Cutlass MLA decode with unrestricted num_heads (< 128) for DeepSeek TP ( #20769 )  
						
						 
						
						
						
						
						Signed-off-by: Alexander Matveev <amatveev@redhat.com>
						
						
					 
					
						2025-07-15 01:06:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						61e20828da 
					 
					
						
						
							
							Fall back if flashinfer comm module not found ( #20936 )  
						
						 
						
						
						
						
						Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
						
						
					 
					
						2025-07-14 23:11:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						55e1c66da5 
					 
					
						
						
							
							[Docs] remove outdated performance benchmark ( #20935 )  
						
						 
						
						
						
						
						Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
						
						
					 
					
						2025-07-14 22:14:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						86f3ac21ce 
					 
					
						
						
							
							Fix overflow indexing in causal_conv1d kernel ( #20938 )  
						
						 
						
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						
						
					 
					
						2025-07-14 21:43:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						149f2435a5 
					 
					
						
						
							
							[Misc] Relax translations tests ( #20856 )  
						
						 
						
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-07-14 20:08:36 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c0569dbc82 
					 
					
						
						
							
							[Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts ( #20725 )  
						
						 
						
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						
						
					 
					
						2025-07-14 19:47:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8bb43b9c9e 
					 
					
						
						
							
							Add benchmark dataset for mlperf llama tasks ( #20338 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-14 19:10:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						559756214b 
					 
					
						
						
							
							Change default model to Qwen3-0.6B ( #20335 )  
						
						 
						
						
						
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
						
						
					 
					
						2025-07-14 16:54:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6d0cf239c6 
					 
					
						
						
							
							[CI/Build] Add Transformers nightly tests in CI ( #20924 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-14 16:33:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3fc964433a 
					 
					
						
						
							
							[Misc] Clean up Aimv2 config registration in Ovis config ( #20921 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-14 15:36:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0caf61c08a 
					 
					
						
						
							
							[CI] Update codeowner for compilation code ( #20929 )  
						
						 
						
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-07-14 08:33:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						667624659b 
					 
					
						
						
							
							[CI] cc folks on changes to vllm/compilation ( #20925 )  
						
						 
						
						
						
						
						Signed-off-by: Richard Zou <zou3519@gmail.com>
						
						
					 
					
						2025-07-14 07:52:17 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						38efa28278 
					 
					
						
						
							
							[Model] Add Ling implementation ( #20680 )  
						
						 
						
						
						
						
						Signed-off-by: vito.yy <vito.yy@antgroup.com>
						
						
					 
					
						2025-07-14 22:10:32 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e8cc53af5e 
					 
					
						
						
							
							[Misc] Log the reason for falling back to FlexAttention ( #20699 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-14 04:16:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a4851cfe68 
					 
					
						
						
							
							[Bugfix]: Fix messy code when using logprobs ( #20910 )  
						
						 
						
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-07-14 11:06:45 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9887e8ec50 
					 
					
						
						
							
							[Misc] Remove unused function ( #20909 )  
						
						 
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-14 10:48:55 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f326ab9c88 
					 
					
						
						
							
							[Bugfix] Bump up mistral_common to support v13 tokenizer ( #20905 )  
						
						 
						
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-07-14 10:45:03 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dcf2a5e208 
					 
					
						
						
							
							[CI/Build] Fix OOM issue in Jina-VL test ( #20907 )  
						
						 
						
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-14 10:32:35 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1e9438e0b0 
					 
					
						
						
							
							[MISC] Move bind_kv_cache to worker module ( #20900 )  
						
						 
						
						
						
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
						
						
					 
					
						2025-07-14 09:40:00 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						697ef765ee 
					 
					
						
						
							
							[Refactor][V1] Move outlines utils for V1 imports ( #20878 )  
						
						 
						
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
						
						
					 
					
						2025-07-14 00:58:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a99b9f7dee 
					 
					
						
						
							
							[Quantization] add BNB for MixtralForCausalLM ( #20893 )  
						
						 
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-07-14 07:34:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c488b928a7 
					 
					
						
						
							
							[ROCm] [Bugfix] [Critical]: Fix mamba compilation bug ( #20883 )  
						
						 
						
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
						Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
						
						
					 
					
						2025-07-14 15:23:28 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2c7fa47161 
					 
					
						
						
							
							Fix: Add missing EOFError handling in CLI complete command ( #20896 )  
						
						 
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-14 07:09:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						88fc8a97e3 
					 
					
						
						
							
							Removing redundant python version check ( #20888 )  
						
						 
						
						
						
						
						Signed-off-by: Dannyso05 <dansong1177@gmail.com>
						
						
					 
					
						2025-07-14 06:15:05 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						66f6fbd393 
					 
					
						
						
							
							[Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) ( #20511 )  
						
						 
						
						
						
						
						Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
						
						
					 
					
						2025-07-14 02:45:31 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8632e831ba 
					 
					
						
						
							
							[Core] Add update_config RPC method ( #20095 )  
						
						 
						
						
						
						
						Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
						
						
					 
					
						2025-07-14 00:49:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4bbfc36b16 
					 
					
						
						
							
							[V1] Hybrid allocator without prefix caching ( #20661 )  
						
						 
						
						
						
						
						Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
						
						
					 
					
						2025-07-13 16:55:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						80d38b8ac8 
					 
					
						
						
							
							[V1] [ROCm] [AITER] Upgrade AITER to commit 916bf3c and bugfix APIs ( #20880 )  
						
						 
						
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
						
						
					 
					
						2025-07-13 15:19:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						211b6a6113 
					 
					
						
						
							
							[Bugfix] fix define of RerankDocument ( #20877 )  
						
						 
						
						
						
						
						Signed-off-by: liuchenlong <liuchenlong@xiaohongshu.com>
						Co-authored-by: liuchenlong <liuchenlong@xiaohongshu.com>
						
						
					 
					
						2025-07-13 14:32:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						247102f07f 
					 
					
						
						
							
							[Bugfix] Fix: add patch_rope_scaling after hf override ( #20857 )  
						
						 
						
						
						
						
						Signed-off-by: Wang Siyuan <wsy0227@sjtu.edu.cn>
						Signed-off-by: Wang Siyuan <sywang0227@gmail.com>
						
						
					 
					
						2025-07-13 00:13:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bd4c1e6fdb 
					 
					
						
						
							
							Support for LlamaForSequenceClassification ( #20807 )  
						
						 
						
						
						
						
						Signed-off-by: thechaos16 <thechaos16@gmail.com>
						
						
					 
					
						2025-07-13 00:09:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						99b4f080d8 
					 
					
						
						
							
							Renable google/gemma-3-1b-it accuracy test. ( #20866 )  
						
						 
						
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com>
						
						
					 
					
						2025-07-12 21:48:56 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						020f58abcd 
					 
					
						
						
							
							[Core] Support multiple tasks per model ( #20771 )  
						
						 
						
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-12 19:40:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c1acd6d7d4 
					 
					
						
						
							
							[Refactor] Change the way of import triton ( #20774 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-12 19:39:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3b3b778d4a 
					 
					
						
						
							
							[Bugfix] Fix a couple PPLX+CUTLASS MoE bugs ( #20825 )  
						
						 
						
						
						
						
						Signed-off-by: ElizaWszola <ewszola@redhat.com>
						
						
					 
					
						2025-07-12 19:39:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						42d440c22b 
					 
					
						
						
							
							[Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant ( #20841 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-12 19:38:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f45a332886 
					 
					
						
						
							
							[Sched] Enhance the logic to remove stopped requests from queues ( #20739 )  
						
						 
						
						
						
						
					 
					
						2025-07-12 15:33:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6e2c176e1f 
					 
					
						
						
							
							[Bugfix] Restrict Machete to only run on Hopper ( #20830 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-12 17:34:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a86754a12b 
					 
					
						
						
							
							[docs] convert supported configs to table ( #20858 )  
						
						 
						
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-12 06:54:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c2a2f19aba 
					 
					
						
						
							
							[Bugfix] Fix Tensor Parallelism Padding Consistency in Granite Models ( #20843 )  
						
						 
						
						
						
						
						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
						
						
					 
					
						2025-07-12 06:11:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2c11a738b3 
					 
					
						
						
							
							[Model] New model support for microsoft/Phi-4-mini-flash-reasoning ( #20702 )  
						
						 
						
						
						
						
						Signed-off-by: Congcong Chen <congcongchen@microsoft.com>
						
						
					 
					
						2025-07-12 06:02:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b639327ad9 
					 
					
						
						
							
							Revert "Use NVCC --compress-mode to reduce binary size by 30%  #20694 " ( #20853 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-11 23:07:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4afe687a82 
					 
					
						
						
							
							Enable ModelOpt Llama4 fp8 checkpoint deployment ( #20419 )  
						
						 
						
						
						
						
						Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
						
						
					 
					
						2025-07-11 23:07:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5de8d9f111 
					 
					
						
						
							
							Remove extra tensor on CPU ( #20693 )  
						
						 
						
						
						
						
						Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
						
						
					 
					
						2025-07-12 14:06:34 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c1c8ca57ff 
					 
					
						
						
							
							[cold start time] add envs.VLLM_COMPILE_DEPYF to guard decompile ( #20790 )  
						
						 
						
						
						
						
						Signed-off-by: Boyuan Feng <boyuan@meta.com>
						
						
					 
					
						2025-07-11 23:06:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a3a5a47e48 
					 
					
						
						
							
							[Bugfix] Fix torch.compile x LoRA for PyTorch 2.8  ( #20823 )  
						
						 
						
						
						
						
						Signed-off-by: rzou <zou3519@gmail.com>
						
						
					 
					
						2025-07-11 23:06:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fb25e95688 
					 
					
						
						
							
							[Docs] Update basic.md ( #20846 )  
						
						 
						
						
						
						
					 
					
						2025-07-11 23:05:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0d4891cd03 
					 
					
						
						
							
							[Bug] Fix DeepGemm for EP low latency case ( #20833 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-11 23:05:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f56d2996ca 
					 
					
						
						
							
							[Misc] Respect no_use_tqdm_on_load flag while capturing CUDA graph ( #20834 )  
						
						 
						
						
						
						
						Signed-off-by: Linkun <github@lkchen.net>
						
						
					 
					
						2025-07-11 23:04:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						147afb448b 
					 
					
						
						
							
							[Bugfix] Replace unavailable video url in multimodal test ( #20854 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-12 05:25:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3c7d942da8 
					 
					
						
						
							
							[Frontend] Abstract prompt and SpeechToTextConfig for transcriptions models ( #20637 )  
						
						 
						
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-07-11 21:33:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						890323dc1b 
					 
					
						
						
							
							[Bugfix] : Fix typo - logger.warn_once -> logger.warning_once ( #20852 )  
						
						 
						
						
						
						
					 
					
						2025-07-11 20:56:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						01cae37713 
					 
					
						
						
							
							[CI/Build] Ensure compatability with Transformers v4.53 ( #20541 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
						
						
					 
					
						2025-07-11 20:53:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						11c0198615 
					 
					
						
						
							
							[Bugfix] Fix tensor parallel issue in Qwen3 reranker weight loading ( #20682 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Co-authored-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-11 20:52:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b1235c3e10 
					 
					
						
						
							
							[Bugfix] Lazy import fused_experts in BitsAndBytesMoEMethod to avoid break not-cuda-alike devices  ( #20822 )  
						
						 
						
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com>
						
						
					 
					
						2025-07-11 20:52:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						44d02f54db 
					 
					
						
						
							
							[Misc] Restrict deep_gemm's log output ( #20827 )  
						
						 
						
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-07-11 20:50:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a8593237c0 
					 
					
						
						
							
							Add pynccl all-gatherv and reducescatterv ( #20154 )  
						
						 
						
						
						
						
						Signed-off-by: Trevor Morris <tmorris@nvidia.com>
						Signed-off-by: mgoin <mgoin64@gmail.com>
						Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-11 18:59:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fc0f41d10a 
					 
					
						
						
							
							Integration SM100 FlashInfer fused allreduce RMSNorm ( #20691 )  
						
						 
						
						
						
						
						Signed-off-by: ilmarkov <imarkov@redhat.com>
						Co-authored-by: ilmarkov <imarkov@redhat.com>
						
						
					 
					
						2025-07-11 18:58:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7b828e30d5 
					 
					
						
						
							
							[CI Bug] Fix Async Engine, Inputs, Utils, Worker Test: 'State' object has no attribute 'enable_server_load_tracking' ( #20845 )  
						
						 
						
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-07-11 18:57:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5f0af36af5 
					 
					
						
						
							
							Update kimi-k2 tool calling docs, enable unit tests ( #20821 )  
						
						 
						
						
						
						
						Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn>
						Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn>
						Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
						
						
					 
					
						2025-07-11 20:16:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0d21b2664c 
					 
					
						
						
							
							[Bugfix] Fix OOM in language generation test ( #20814 )  
						
						 
						
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
						
						
					 
					
						2025-07-11 11:21:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9907fc4494 
					 
					
						
						
							
							[Docs] Data Parallel deployment documentation ( #20768 )  
						
						 
						
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-07-11 09:42:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d47661f0cd 
					 
					
						
						
							
							[Kernel] Basic tuned configs for NVFP4 CUTLASS dense GEMM ( #20646 )  
						
						 
						
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-07-11 10:05:33 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						53fa457391 
					 
					
						
						
							
							[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility ( #20449 )  
						
						 
						
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
						
						
					 
					
						2025-07-11 07:51:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6fb162447b 
					 
					
						
						
							
							[doc] fix ordered list issue ( #20819 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-11 06:49:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						66177189c5 
					 
					
						
						
							
							[Bugfix] Add missing field to TritonLanguagePlaceholder ( #20812 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-11 05:25:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b4f0b5f9aa 
					 
					
						
						
							
							Temporarily suspend google/gemma-3-1b-it. ( #20722 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com > 
						
						
					 
					
						2025-07-11 11:21:26 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cbd14ed561 
					 
					
						
						
							
							[Bugfix] Refactor /invocations to be task-agnostic ( #20764 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-11 03:20:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7bd4c37ae7 
					 
					
						
						
							
							[Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100).  ( #19825 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Pavani Majety <pmajety@nvidia.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: shuw <shuw@nvidia.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-11 09:23:23 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8020e98c9f 
					 
					
						
						
							
							[Quantization][1/N] MoE support BNB-Inflight Quantization ( #20061 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-11 08:01:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						762be26a8e 
					 
					
						
						
							
							[Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging ( #20777 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Luka Govedic <lgovedic@redhat.com >
Signed-off-by: luka <lgovedic@redhat.com > 
						
						
					 
					
						2025-07-11 00:15:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6a9e6b2abf 
					 
					
						
						
							
							[doc] fold long code block ( #20795 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-10 23:16:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5d09152ff1 
					 
					
						
						
							
							[V1] Enable Mamba2 layers other than MambaMixer2 in the v1 engine ( #20660 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com > 
						
						
					 
					
						2025-07-11 05:53:31 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						31d5c1797f 
					 
					
						
						
							
							[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf ( #19830 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Luka Govedic <lgovedic@redhat.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-11 04:56:28 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						35514b682a 
					 
					
						
						
							
							[XPU] XCCL support enabled in torch 2.8.0.dev nightly builds ( #20705 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: ratnampa <ratnam.parikh@intel.com > 
						
						
					 
					
						2025-07-10 20:39:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e2de455c34 
					 
					
						
						
							
							[Feature] Integrate SM100 DeepGEMM support ( #20087 )  
						
						 
						
						
						
						
					 
					
						2025-07-10 20:18:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5b032352cc 
					 
					
						
						
							
							[Attention] MLA - Flashinfer Ragged Prefill ( #20034 )  
						
						 
						
						
						
						
					 
					
						2025-07-10 20:17:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						922f316441 
					 
					
						
						
							
							[Model] Support HF format of minimax ( #20211 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-11 02:55:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5923ab9524 
					 
					
						
						
							
							[fix]: disable cutlass block scaled group gemm for EP ( #20781 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Duncan Moss <djm.moss@gmail.com > 
						
						
					 
					
						2025-07-11 02:39:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0cf893cae1 
					 
					
						
						
							
							Add kimi-k2 tool parser ( #20789 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn >
Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn >
Co-authored-by: wangzhengtao <wangzhengtao@msh.team > 
						
						
					 
					
						2025-07-11 10:36:23 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cf75cd2098 
					 
					
						
						
							
							[CI Bugfix] Specify same TORCH_CUDA_ARCH_LIST for flashinfer aot and install ( #20772 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-11 01:16:01 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b854321ffe 
					 
					
						
						
							
							[Docs] Lazy import gguf ( #20785 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: simon-mo <simon.mo@hey.com > 
						
						
					 
					
						2025-07-10 16:06:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5b6fe23d05 
					 
					
						
						
							
							[Bugfix][Benchmark] Make sure the output length > 0 when testing prefill workload. ( #20786 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-07-10 14:52:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f0c98cae27 
					 
					
						
						
							
							[Misc] MoE ModularKernel : Introduce TopKWeightAndReduce  ( #20648 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-07-10 14:40:38 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						574ad60db9 
					 
					
						
						
							
							[KVConnector] Always call connector clear_metadata() at end of step ( #20756 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: David Ben-David <sdavidbd@gmail.com > 
						
						
					 
					
						2025-07-10 22:37:27 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fdadb6f43a 
					 
					
						
						
							
							[Bugfix] Fused MoE Modular Kernel chunking loop ( #20392 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-07-10 20:31:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						41060c6e08 
					 
					
						
						
							
							[Core] Add Support for Default Modality Specific LoRAs [generate / chat completions] ( #19126 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com > 
						
						
					 
					
						2025-07-10 21:09:37 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3de2ed767f 
					 
					
						
						
							
							[Bugfix] Remove assertion of expert_map being None ( #20714 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ming Yang <yming@meta.com >
Signed-off-by: Ming Yang <minos.future@gmail.com > 
						
						
					 
					
						2025-07-10 19:55:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						299252ea82 
					 
					
						
						
							
							[CI] Fix pre commit issue ( #20782 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-10 12:48:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d6902ce79f 
					 
					
						
						
							
							[V0][V1][Core] Add outlines integration for V1, and update V0 integration. ( #15975 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com > 
						
						
					 
					
						2025-07-10 15:30:26 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5e53c89a74 
					 
					
						
						
							
							[Bugfix] [CI] Fix Tensorizer LoRA test ( #20760 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sanger Steel <sangersteel@gmail.com > 
						
						
					 
					
						2025-07-10 19:07:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c66e38ea4c 
					 
					
						
						
							
							[Test] Remove docker build from test. ( #20542 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com > 
						
						
					 
					
						2025-07-10 11:21:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						251595368f 
					 
					
						
						
							
							Fix DeepSeek-R1-0528 chat template ( #20717 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com >
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com > 
						
						
					 
					
						2025-07-10 17:47:36 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4bed167768 
					 
					
						
						
							
							[Model][VLM] Support JinaVL Reranker ( #20260 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: shineran96 <shinewang96@gmail.com > 
						
						
					 
					
						2025-07-10 10:43:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b140416abf 
					 
					
						
						
							
							[Model] Add reason parser for Hunyuan A13B Model. ( #20625 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Asher Zhang <asherszhang@tencent.com > 
						
						
					 
					
						2025-07-10 16:33:26 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5b8366b61a 
					 
					
						
						
							
							[ROCm][Regression] Remove tensor creation that harms performance on ROCm ( #20741 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-07-10 09:22:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c7753a9809 
					 
					
						
						
							
							[Hardware][CPU] Vllm int8 quantization enablement for ARM CPU ( #14129 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: nishith-fujitsu <nishith.jaiswal@fujitsu.com > 
						
						
					 
					
						2025-07-10 15:59:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4b9a9435bb 
					 
					
						
						
							
							Update Dockerfile FlashInfer to v0.2.8rc1 ( #20718 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-10 08:09:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3482fd7e4e 
					 
					
						
						
							
							[Doc] Add engine args back in to the docs ( #20674 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-10 08:02:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						77f77a951e 
					 
					
						
						
							
							[Misc] Clean up mark to fork process in BNB tests ( #20692 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-10 13:59:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1a4f35e2ea 
					 
					
						
						
							
							Normalize lm-eval command between baseline and correctness test ( #18560 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-10 13:27:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						be1e128dfb 
					 
					
						
						
							
							[CI Bugfix] Skip failing Tensorizer+LoRA test ( #20724 )  
						
						 
						
						
						
						
					 
					
						2025-07-10 21:15:03 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						65393ee064 
					 
					
						
						
							
							[doc] fix ordered list ( #20749 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-10 03:13:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dc221ad72d 
					 
					
						
						
							
							[Bugfix][Build][Non-CUDA] Only referencing CMAKE_CUDA_COMPILER_VERSION on CUDA where it is defined ( #20738 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com > 
						
						
					 
					
						2025-07-10 02:58:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7571a4a7e5 
					 
					
						
						
							
							[CI/Build] Fix Basic Models Test ( #20728 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-10 09:57:19 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f67d986dd1 
					 
					
						
						
							
							[Misc] loose new-model tagger conditions ( #20747 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-10 02:54:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cc876d0f29 
					 
					
						
						
							
							[KVConnector] Aggregate finished requests on the scheduler ( #19555 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Or Ozeri <oro@il.ibm.com > 
						
						
					 
					
						2025-07-10 09:22:18 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fdfd409f8f 
					 
					
						
						
							
							[TPU][Core]Make load weight exceed hbm error more instructive for customers ( #20644 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chenyaaang <chenyangli@google.com > 
						
						
					 
					
						2025-07-10 07:01:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ffbcc9e757 
					 
					
						
						
							
							[BugFix] Fix VllmConfig() construction on all platforms ( #20695 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-10 07:00:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						59389c927b 
					 
					
						
						
							
							[BugFix][CPU] Fix CPU worker dependency on cumem_allocator ( #20696 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-10 14:24:20 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8f2720def9 
					 
					
						
						
							
							[Frontend] Support Tool Calling with both tool_choice='required' and $defs. ( #20629 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com > 
						
						
					 
					
						2025-07-10 13:56:35 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ad6c2e1a0b 
					 
					
						
						
							
							Correct PPMissingLayer handling in Deepseek-V2-Lite PP deployment ( #20665 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com > 
						
						
					 
					
						2025-07-09 20:34:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						49e8c7ea25 
					 
					
						
						
							
							Use NVCC --compress-mode to reduce binary size by 30% ( #20694 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-09 18:26:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						805d62ca88 
					 
					
						
						
							
							[Misc] DP : Add ExpertTokensMetadata ( #20332 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun <vsundarr@redhat.com >
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun <vsundarr@redhat.com > 
						
						
					 
					
						2025-07-10 00:33:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b7d9e9416f 
					 
					
						
						
							
							[CI/Build] Fix FlashInfer double build in Dockerfile ( #20651 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-09 17:41:56 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7c12a765aa 
					 
					
						
						
							
							[Misc] Simplify the prefix caching logic on draft tokens ( #20701 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-09 14:48:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cd587c93ef 
					 
					
						
						
							
							[BugFix]: Properly set engine_id when using multi connector ( #19487 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: leiyiming <leiyiming@kingsoft.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-09 20:32:44 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						332d4cb17b 
					 
					
						
						
							
							[Feature][Quantization] MXFP4 support for MOE models ( #17888 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Felix Marty <felmarty@amd.com >
Signed-off-by: Bowen Bao <bowenbao@amd.com >
Signed-off-by: Felix Marty <Felix.Marty@amd.com >
Co-authored-by: Bowen Bao <bowenbao@amd.com > 
						
						
					 
					
						2025-07-09 13:19:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bf03ff3575 
					 
					
						
						
							
							[Kernel] Add Conch backend for mixed-precision linear layer ( #19818 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jacob Manning <jmanning+oss@stackav.com > 
						
						
					 
					
						2025-07-09 13:17:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						47043eb678 
					 
					
						
						
							
							[Kernel] Triton implementation of causal-conv1d for Mamba-based models ( #18218 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com >
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-07-09 12:53:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						31b96d1c64 
					 
					
						
						
							
							Support Llama 4 for cutlass_moe_fp4 ( #20453 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-09 15:53:38 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e59ba9e142 
					 
					
						
						
							
							[CI/Build] Enlarge tolerance for a CPU multi-modal test ( #20684 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-09 17:48:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						403b481573 
					 
					
						
						
							
							Remove heading form installation inc.md file ( #20697 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-09 10:42:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						138709f8d1 
					 
					
						
						
							
							[Doc] Update CPU doc ( #20676 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-09 10:28:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0bbac1c1b4 
					 
					
						
						
							
							[Bench] Add NVFP4 GEMM benchmark script ( #20578 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-09 13:23:48 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a3e4e85ece 
					 
					
						
						
							
							[XPU][CI] enhance xpu test support ( #20652 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com >
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai > 
						
						
					 
					
						2025-07-09 16:53:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eb58f5953d 
					 
					
						
						
							
							[TPU][Bugfix] fix test_pallas ( #20666 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-07-09 09:32:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4ac9c33f78 
					 
					
						
						
							
							[Bugfix] Fix handling of Tensorizer arguments for LoadConfig ( #20643 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Sanger Steel <sangersteel@gmail.com > 
						
						
					 
					
						2025-07-09 15:36:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						efe73d0575 
					 
					
						
						
							
							[doc] update doc format ( #20673 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-09 08:08:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						853487bc1b 
					 
					
						
						
							
							[Docs] Improve docs for RLHF co-location example ( #20599 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com > 
						
						
					 
					
						2025-07-09 08:06:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9ff2af6d2b 
					 
					
						
						
							
							[Benchmark] Parameterization of streaming loading of multimodal datasets ( #20528 )  
						
						 
						
						Signed-off-by: wangli <wangli858794774@gmail.com>
						
						
					 
					
						2025-07-09 13:35:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						70ca5484f5 
					 
					
						
						
							
							[Doc] Update notes ( #20668 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-09 03:46:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5358cce5ff 
					 
					
						
						
							
							[V1] [Doc] Update V1 docs for Mamba models ( #20499 )  
						
						 
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
						
						
					 
					
						2025-07-09 01:02:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2155e95ef1 
					 
					
						
						
							
							[Bugfix] Fix the issue where reasoning_content is None when Thinking is enabled and tool_choice is set to 'required'. ( #20662 )
						
						 
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-07-09 07:39:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f95570a52d 
					 
					
						
						
							
							[Docs] fix minimax tool_calling docs error ( #20667 )  
						
						 
						
						Signed-off-by: qingjun <qingjun@minimaxi.com>
						
						
					 
					
						2025-07-09 00:37:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b6e7e3d58f 
					 
					
						
						
							
							[Intel GPU] support ray as distributed executor backend for XPU. ( #20659 )  
						
						 
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
						
						
					 
					
						2025-07-09 00:36:58 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e760fcef22 
					 
					
						
						
							
							[XPU] Use spawn with XPU multiprocessing ( #20649 )  
						
						 
						
						Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
						
						
					 
					
						2025-07-09 00:34:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6bbf1795b7 
					 
					
						
						
							
							[Misc] Fix the size of batched_dummy_mm_inputs in profile_run ( #20434 )  
						
						 
						
						Signed-off-by: bk-201 <joy25810@foxmail.com>
						
						
					 
					
						2025-07-08 20:15:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9e0ef888f0 
					 
					
						
						
							
							Fix bullets in incremental_build.md ( #20642 )  
						
						 
						
						
						
						
					 
					
						2025-07-09 11:03:41 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						97abeb1daa 
					 
					
						
						
							
							[feat] enable SM100 CUTLASS block scaled group gemm for smaller batch sizes ( #20640 )  
						
						 
						
						Signed-off-by: Duncan Moss <djm.moss@gmail.com>
						
						
					 
					
						2025-07-09 11:03:35 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						34dad19e7b 
					 
					
						
						
							
							[Bugfix] set default set cuda_graph_sizes to min(self.max_num_seqs * 2, 512) ( #20628 )  
						
						 
						
						Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
						
						
					 
					
						2025-07-09 11:02:51 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6db31e7a27 
					 
					
						
						
							
							[Hardware][PPC64LE] Enable V1 for ppc64le and ARM ( #20554 )  
						
						 
						
						Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Nikhil Gupta <nikhil.gupta2@arm.com>
						
						
					 
					
						2025-07-08 20:00:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						977180c912 
					 
					
						
						
							
							[Docs] Improve documentation for multi-node service helper script ( #20600 )  
						
						 
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
						
						
					 
					
						2025-07-08 19:44:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c40784c794 
					 
					
						
						
							
							[BugFix][Intel GPU] Use refactored API for dist_backend in V1 worker ( #20596 )  
						
						 
						
						Signed-off-by: ratnampa <ratnam.parikh@intel.com>
						
						
					 
					
						2025-07-08 19:44:23 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						baed180aa0 
					 
					
						
						
							
							[tech debt] Revisit lora request model checker ( #20636 )  
						
						 
						
						Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
						
						
					 
					
						2025-07-09 09:42:41 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0b407479ef 
					 
					
						
						
							
							[misc]refactor Platform.set_device method ( #20262 )  
						
						 
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
						
						
					 
					
						2025-07-09 01:39:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5eaf570050 
					 
					
						
						
							
							Replace multiply_add with homogeneous_multiply_add to Address Clang Template Parameter Issue ( #20142 )  
						
						 
						
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-07-09 00:30:18 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d8ee5a2ca4 
					 
					
						
						
							
							[TPU][Bugfix] disable phi-3 test ( #20632 )  
						
						 
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com>
						
						
					 
					
						2025-07-08 23:14:26 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b9fca83256 
					 
					
						
						
							
							[Bugfix] Fix GLM-4.1-V video prompt update ( #20635 )  
						
						 
						
						Signed-off-by: Isotr0py <2037008807@qq.com>
						
						
					 
					
						2025-07-08 23:13:58 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						32dffc2772 
					 
					
						
						
							
							[Core] Rename get_max_tokens_per_item for backward compatibility ( #20630 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-08 23:11:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c438183e99 
					 
					
						
						
							
							[Bugfix] Fix topk_ids indices_type for CUTLASS w8a8 FP8 MoE ( #20166 )  
						
						 
						
						Signed-off-by: Ming Yang <yming@meta.com>
						
						
					 
					
						2025-07-08 23:10:57 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						baba0389f7 
					 
					
						
						
							
							[CI] Increase the threshold of the MTEB RERANK tests ( #20615 )  
						
						 
						
						Signed-off-by: wang.yuqi <noooop@126.com>
						
						
					 
					
						2025-07-08 08:10:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c6c22f16d3 
					 
					
						
						
							
							Revert invalid spellchecker fix on deepseek_vl2 ( #20618 )  
						
						 
						
						
						
						
					 
					
						2025-07-08 15:07:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dd382e0fe3 
					 
					
						
						
							
							[Model] Implement missing get_language_model for Keye-VL ( #20631 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-07-08 07:47:46 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						849590a2a7 
					 
					
						
						
							
							Update torch/xla pin to 20250703 ( #20589 )  
						
						 
						
						Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
						
						
					 
					
						2025-07-08 07:44:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a4c23314c0 
					 
					
						
						
							
							[xpu]feat: support multi-lora on xpu ( #20616 )  
						
						 
						
						Signed-off-by: yan <yan.ma@intel.com>
						
						
					 
					
						2025-07-08 22:07:10 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b942c094e3 
					 
					
						
						
							
							Stop using title frontmatter and fix doc that can only be reached by search ( #20623 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-08 03:27:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b4bab81660 
					 
					
						
						
							
							Remove unnecessary explicit title anchors and use relative links instead ( #20620 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-08 02:49:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b91cb3fa5c 
					 
					
						
						
							
							[Docs] Improve documentation for Deepseek R1 on Ray Serve LLM ( #20601 )  
						
						 
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
						
						
					 
					
						2025-07-08 02:09:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						71d1d75b7a 
					 
					
						
						
							
							[PD][Nixl] Remote consumer READ timeout for clearing request blocks  ( #20139 )  
						
						 
						
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-07-08 08:56:40 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						72d14d0eed 
					 
					
						
						
							
							[Frontend] [Core] Integrate Tensorizer in to S3 loading machinery, allow passing arbitrary arguments during save/load ( #19619 )  
						
						 
						
						Signed-off-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: Eta <esyra@coreweave.com>
						
						
					 
					
						2025-07-07 22:47:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e34d130c16 
					 
					
						
						
							
							[TPU] Temporary fix vmem oom for long model len by reducing page size ( #20278 )  
						
						 
						
						Signed-off-by: Chenyaaang <chenyangli@google.com>
						
						
					 
					
						2025-07-08 05:16:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7721ef1786 
					 
					
						
						
							
							[CI/Build][CPU] Fix CPU CI and remove all CPU V0 files ( #20560 )  
						
						 
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com>
						
						
					 
					
						2025-07-07 22:13:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8369b7c2a9 
					 
					
						
						
							
							[Misc] improve error msg ( #20604 )  
						
						 
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-07 21:45:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3eb4ad53f3 
					 
					
						
						
							
							[Docs] Add Anyscale to frameworks ( #20590 )  
						
						 
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
						
						
					 
					
						2025-07-07 20:09:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						90a2769f20 
					 
					
						
						
							
							[Docs] Add Ray Serve LLM section to openai compatible server guide ( #20595 )  
						
						 
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
						
						
					 
					
						2025-07-07 20:08:05 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e60d422f19 
					 
					
						
						
							
							[Docs] Improve docstring for ray data llm example ( #20597 )  
						
						 
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
						
						
					 
					
						2025-07-07 20:06:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0d914c81a2 
					 
					
						
						
							
							[Docs] Rewrite offline inference guide ( #20594 )  
						
						 
						
						Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
						
						
					 
					
						2025-07-07 20:06:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6e428cdd7a 
					 
					
						
						
							
							[Doc] Syntax highlight request responses as JSON instead of bash ( #20582 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-07 20:02:45 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						93b9d9f499 
					 
					
						
						
							
							[Bugfix]: Fix messy code when using logprobs ( #19209 )  
						
						 
						
						Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
						
						
					 
					
						2025-07-08 11:02:15 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						af107d5a0e 
					 
					
						
						
							
							Make distinct code and console admonitions so readers are less likely to miss them ( #20585 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-07 19:55:28 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						31c5d0a1b7 
					 
					
						
						
							
							[Optimize] Don't send token ids when kv connector is not used ( #20586 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-07-07 19:04:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						afb7cff1b9 
					 
					
						
						
							
							[Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe ( #20167 )  
						
						 
						
						Signed-off-by: Ming Yang <yming@meta.com>
						
						
					 
					
						2025-07-08 01:07:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d2e841a10a 
					 
					
						
						
							
							[Misc] Improve logging for dynamic shape cache compilation ( #20573 )  
						
						 
						
						Signed-off-by: kyolebu <kyu@redhat.com>
						
						
					 
					
						2025-07-08 00:48:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						14601f5fba 
					 
					
						
						
							
							[Config] Refactor mistral configs  ( #20570 )  
						
						 
						
						Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
						
						
					 
					
						2025-07-07 15:25:10 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						042d131f39 
					 
					
						
						
							
							Fix links in multi-modal model contributing page ( #18615 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-07 21:13:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8e807cdfa4 
					 
					
						
						
							
							[Misc] feat output content in stream response ( #19608 )  
						
						 
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
						
						
					 
					
						2025-07-07 20:45:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e601efcb10 
					 
					
						
						
							
							[Misc] Add fully interleaved support for multimodal 'string' content format ( #14047 )  
						
						 
						
						Signed-off-by: drobyshev.anton <drobyshev.anton@wb.ru>
Co-authored-by: drobyshev.anton <drobyshev.anton@wb.ru>
						
						
					 
					
						2025-07-07 19:43:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						22dd9c2730 
					 
					
						
						
							
							[Kernel] Optimize Prefill Attention in Unified Triton Attention Kernel ( #20308 )  
						
						 
						
						Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
						
						
					 
					
						2025-07-07 19:08:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a6d795d593 
					 
					
						
						
							
							[DP] Copy environment variables to Ray DPEngineCoreActors ( #20344 )  
						
						 
						
						Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
						
						
					 
					
						2025-07-07 10:14:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a37d75bbec 
					 
					
						
						
							
							[Front-end] microbatch tokenization ( #19334 )  
						
						 
						
						Signed-off-by: zt2370 <ztang2370@gmail.com>
						
						
					 
					
						2025-07-07 17:54:10 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						edd270bc78 
					 
					
						
						
							
							[Bugfix] Prevent IndexError for cached requests when pipeline parallelism is disabled ( #20486 )  
						
						 
						
						Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
						
						
					 
					
						2025-07-07 09:41:15 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						110df74332 
					 
					
						
						
							
							[Model][Last/4] Automatic conversion of CrossEncoding model ( #19675 )  
						
						 
						
						Signed-off-by: wang.yuqi <noooop@126.com>
						
						
					 
					
						2025-07-07 14:46:04 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1ad69e8375 
					 
					
						
						
							
							[Doc] Fix some MkDocs snippets used in the installation docs ( #20572 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-07 07:44:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b8a498c9b2 
					 
					
						
						
							
							[Doc] Add outline for content tabs ( #20571 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-07 07:43:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						923147b5e8 
					 
					
						
						
							
							[Doc] Fix internal links so they don't always point to latest ( #20563 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-07 04:15:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						45877ef740 
					 
					
						
						
							
							[Doc] Use gh-pr and gh-issue everywhere we can in the docs ( #20564 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-07 03:54:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6e4bef1bea 
					 
					
						
						
							
							[Doc] Remove extra whitespace from CI failures doc ( #20565 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-07-07 03:35:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4ff79a136e 
					 
					
						
						
							
							[Misc] Set the minimum openai version ( #20539 )  
						
						 
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-07-07 09:15:26 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						448acad31e 
					 
					
						
						
							
							[Misc] remove unused jinaai_serving_reranking ( #18878 )  
						
						 
						
						Signed-off-by: Abirdcfly <fp544037857@gmail.com>
						
						
					 
					
						2025-07-07 09:14:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						eb0b2d2f08 
					 
					
						
						
							
							[Docs] Clean up tables in supported_models.md ( #20552 )  
						
						 
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-07-07 01:46:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3112271f6e 
					 
					
						
						
							
							[XPU] log clean up for XPU platform ( #20553 )  
						
						 
						
						Signed-off-by: yan <yan.ma@intel.com>
						
						
					 
					
						2025-07-07 01:38:22 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1fd471e957 
					 
					
						
						
							
							Add docstrings to url_schemes.py to improve readability ( #20545 )  
						
						 
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
						
						
					 
					
						2025-07-07 08:31:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2c5ebec064 
					 
					
						
						
							
							[XPU][CI] add v1/core test in xpu hardware ci ( #20537 )  
						
						 
						
						Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
						
						
					 
					
						2025-07-07 01:16:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e610deb72 
					 
					
						
						
							
							[CI/Build] Enable phi2 lora test ( #20540 )  
						
						 
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-07-07 05:10:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6e2c19ce22 
					 
					
						
						
							
							[Refactor]Abstract Platform Interface for Distributed Backend and Add xccl Support for Intel XPU ( #19410 )  
						
						 
						
						Signed-off-by: dbyoung18 <yang5.yang@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
						
						
					 
					
						2025-07-07 04:32:32 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						47db8c2c15 
					 
					
						
						
							
							[Misc] add a tip for pre-commit ( #20536 )  
						
						 
						
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-07-06 19:42:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						462b269280 
					 
					
						
						
							
							Implement OpenAI Responses API [1/N] ( #20504 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-06 18:32:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c18b3b8e8b 
					 
					
						
						
							
							[Bugfix] Add use_cross_encoder flag to use correct activation in ClassifierPooler ( #20527 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-06 14:01:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9528e3a05e 
					 
					
						
						
							
							[BugFix][Spec Decode] Fix spec token ids in model runner ( #20530 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-06 19:44:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9fb52e523a 
					 
					
						
						
							
							[V1] Support any head size for FlexAttention backend ( #20467 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-06 09:54:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e202dd2736 
					 
					
						
						
							
							[V0 deprecation] Remove V0 CPU/XPU/TPU backends ( #20412 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
						Signed-off-by: jiang1.li <jiang1.li@intel.com >
						Co-authored-by: Li, Jiang <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-06 08:48:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						43813e6361 
					 
					
						
						
							
							[Misc] call the pre-defined func ( #20518 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-06 10:25:29 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cede942b87 
					 
					
						
						
							
							[Benchmark] Add support for multiple batch size benchmark through CLI in benchmark_moe.py ( #20516 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca > 
						
						
					 
					
						2025-07-06 09:20:11 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fe1e924811 
					 
					
						
						
							
							[Frontend] Support image object in llm.chat ( #19635 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: sfeng33 <4florafeng@gmail.com >
						Signed-off-by: Flora Feng <4florafeng@gmail.com > 
						
						
					 
					
						2025-07-06 06:47:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4548c03c50 
					 
					
						
						
							
							[TPU][Bugfix] fix the MoE OOM issue ( #20339 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-07-05 21:19:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						40b86aa05e 
					 
					
						
						
							
							[BugFix] Fix: ImportError when building on hopper systems ( #20513 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com > 
						
						
					 
					
						2025-07-06 12:17:30 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						432870829d 
					 
					
						
						
							
							[Bugfix] Fix missing per_act_token parameter in compressed_tensors_moe ( #20509 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lu Fang <fanglu@fb.com > 
						
						
					 
					
						2025-07-06 12:08:30 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f73d02aadc 
					 
					
						
						
							
							[BUG] Fix #20484. Support empty sequence in cuda penalty kernel ( #20491 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai > 
						
						
					 
					
						2025-07-05 19:38:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c5ebe040ac 
					 
					
						
						
							
							test_attention compat with coming xformers change ( #20487 )  
						
						 
						
						... 
						
						
						
						Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-07-05 19:37:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8d763cb891 
					 
					
						
						
							
							[Misc] remove unused import ( #20517 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-05 19:17:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cf4cd53982 
					 
					
						
						
							
							[Misc] Add logger.exception for TPU information collection failures ( #20510 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-05 07:24:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						32c9be2200 
					 
					
						
						
							
							[v1] Re-add fp32 support to v1 engine through FlexAttention ( #19754 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com >
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-07-05 09:41:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8aeaa910a2 
					 
					
						
						
							
							Fix unknown attribute of topk_indices_dtype in CompressedTensorsW8A8Fp8MoECutlassMethod ( #20507 )  
						
						 
						
						... 
						
						
						
						Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com > 
						
						
					 
					
						2025-07-05 14:03:20 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						906e05d840 
					 
					
						
						
							
							[Misc] Remove the unused LoRA test code ( #20494 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-05 13:48:16 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ef9a2990ae 
					 
					
						
						
							
							[doc] small fix ( #20506 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-04 20:56:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7e90870491 
					 
					
						
						
							
							[Misc] Add security warning for development mode endpoints ( #20508 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-04 20:52:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d3f05c9248 
					 
					
						
						
							
							[Doc] fix mutltimodal_inputs.md gh examples link ( #20497 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Guy Stone <guys@spotify.com > 
						
						
					 
					
						2025-07-04 16:41:35 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c108781c85 
					 
					
						
						
							
							[CI Bugfix] Fix pre-commit failures on main ( #20502 )  
						
						 
						
						
						
						
					 
					
						2025-07-04 14:17:30 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3d184b95b8 
					 
					
						
						
							
							[feat]: CUTLASS block scaled group gemm for SM100 ( #19757 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Duncan Moss <djm.moss@gmail.com >
						Co-authored-by: Duncan Moss <dmoss@nvidia.com > 
						
						
					 
					
						2025-07-04 12:58:04 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f35a022e6 
					 
					
						
						
							
							Enable V1 for Hybrid SSM/Attention Models ( #20016 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
						Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com >
						Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
						Co-authored-by: Chen Zhang <zhangch99@outlook.com > 
						
						
					 
					
						2025-07-04 17:46:53 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ffe00ef77a 
					 
					
						
						
							
							[Misc] Small: Remove global media connector. Each test should have its own test connector object. ( #20395 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com > 
						
						
					 
					
						2025-07-04 08:15:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5561681d04 
					 
					
						
						
							
							[CI] add kvcache-connector dependency definition and add into CI build ( #18193 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Peter Pan <Peter.Pan@daocloud.io > 
						
						
					 
					
						2025-07-04 06:49:18 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fbd62d8750 
					 
					
						
						
							
							[Doc] Fix classification table in list of supported models ( #20489 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-04 06:08:02 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e26f9156a 
					 
					
						
						
							
							[Model][3/N] Automatic conversion of CrossEncoding model ( #20168 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-07-04 05:47:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9e5452ee34 
					 
					
						
						
							
							[Bug][Frontend] Fix structure of transcription's decoder_prompt ( #18809 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: sangbumlikeagod <oironese@naver.com > 
						
						
					 
					
						2025-07-04 11:28:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0e3fe896e2 
					 
					
						
						
							
							Support Llama 4 for fused_marlin_moe ( #20457 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-04 07:55:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1caca5a589 
					 
					
						
						
							
							[Misc] Add SPDX-FileCopyrightText ( #20428 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-04 07:40:42 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						783921d889 
					 
					
						
						
							
							[Perf] Optimize Vectorization Utils for Int 8 Quantization Kernels ( #20331 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-04 15:06:24 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4a98edff1f 
					 
					
						
						
							
							[Structured Outputs][V1] Skipping with models doesn't contain tokenizers ( #20365 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
						Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-04 15:05:49 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a7bab0c9e5 
					 
					
						
						
							
							[Misc] small update ( #20462 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-03 20:33:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						25950dca9b 
					 
					
						
						
							
							Add ignore consolidated file in mistral example code ( #20420 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com > 
						
						
					 
					
						2025-07-04 02:55:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a4113b035c 
					 
					
						
						
							
							[Platform] Add custom default max tokens ( #18557 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Gabriel Marinho <gmarinho@ibm.com > 
						
						
					 
					
						2025-07-04 10:50:17 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7e1665b089 
					 
					
						
						
							
							[Misc] Change warn_for_unimplemented_methods to debug ( #20455 )  
						
						 
						
						
						
						
					 
					
						2025-07-04 02:35:08 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8d1096e7db 
					 
					
						
						
							
							[Bugfix] Register reducer even if transformers_modules not available ( #19510 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Seiji Eicher <seiji@anyscale.com > 
						
						
					 
					
						2025-07-03 22:08:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8d775dd30a 
					 
					
						
						
							
							[Misc] Fix Unable to detect current VLLM config. Defaulting to NHD kv cache layout warning ( #20400 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-07-03 14:56:09 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						78fe77534b 
					 
					
						
						
							
							[Kernel] Enable fp8 support for pplx and BatchedTritonExperts. ( #18864 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com > 
						
						
					 
					
						2025-07-03 14:55:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2f2fcb31b8 
					 
					
						
						
							
							[Misc] Remove _maybe_ignore_quant_config from GLM4.1v ( #20432 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com > 
						
						
					 
					
						2025-07-03 21:41:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1dba2c4ebe 
					 
					
						
						
							
							[Misc] adjust for ipv6 for mookcacke url parse ( #20107 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-07-03 20:27:17 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						71d6de3a26 
					 
					
						
						
							
							[Misc] Clean up InternVL family config registration ( #19992 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-03 20:01:47 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						536fd33003 
					 
					
						
						
							
							[CI] Trimming some failing test groups from AMDPRODUCTION. ( #20390 )  
						
						 
						
						
						
						
					 
					
						2025-07-03 08:21:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						619b9f5c7e 
					 
					
						
						
							
							[Frontend] fix duplicate output for bench subcmd ( #20446 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-03 08:02:06 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d1b689c445 
					 
					
						
						
							
							[Bugfix] Fix flaky test_streaming_response test ( #20363 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-07-03 14:46:24 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9854dc9040 
					 
					
						
						
							
							[Frontend] improve vllm bench <bench_type> --help display ( #20430 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-03 14:22:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ff5c60fad8 
					 
					
						
						
							
							[Misc] Automatically tag PRs to add new models ( #20222 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-07-03 07:11:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6f1229f91d 
					 
					
						
						
							
							[Model][2/N] Automatic conversion of CrossEncoding model ( #19978 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wang.yuqi <noooop@126.com > 
						
						
					 
					
						2025-07-03 13:59:23 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1819fbda63 
					 
					
						
						
							
							[Quantization] Bump to use latest bitsandbytes ( #20424 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-03 21:58:46 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7f0367109e 
					 
					
						
						
							
							[CI/Build][CPU] Enable cross compilation in CPU release pipeline ( #20423 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-03 05:26:12 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fb14d53cf6 
					 
					
						
						
							
							[Kernel] refactor cpu worker v0 cache dtype ( #20080 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Andy Xie <andy.xning@gmail.com > 
						
						
					 
					
						2025-07-03 08:39:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b024a42e93 
					 
					
						
						
							
							[Core] Move multimodal placeholder from chat utils to model definition ( #20355 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-03 08:18:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cb97f2bfc5 
					 
					
						
						
							
							[Docs] Replace two list with tables in intel_gaudi.md ( #20414 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-07-03 00:48:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						359200f6ac 
					 
					
						
						
							
							[doc] fix link ( #20417 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-03 00:21:57 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						220aee902a 
					 
					
						
						
							
							[Misc] Add rules to label Speculative Decoding Related PRs ( #20406 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lifan Shen <lifans@meta.com > 
						
						
					 
					
						2025-07-02 23:56:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						67d25eca05 
					 
					
						
						
							
							[Tests] Update online DP tests to verify that requests are balanced ( #20157 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-03 14:49:13 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						363528de27 
					 
					
						
						
							
							[Feature] Support MiniMax-M1 function calls features ( #20297 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: QscQ <qscqesze@gmail.com >
						Signed-off-by: qingjun <qingjun@minimaxi.com > 
						
						
					 
					
						2025-07-03 06:48:27 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4ff61ababa 
					 
					
						
						
							
							[TPU] Add a case to cover RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 ( #20385 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <derrhein@gmail.com > 
						
						
					 
					
						2025-07-03 06:46:41 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0ec3779df7 
					 
					
						
						
							
							[Bugfix][CI/CD][CPU] Fix CPU CI tests ( #20383 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-02 20:11:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b616f6a53d 
					 
					
						
						
							
							[Misc] Small: Fix video loader return type annotations. ( #20389 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com > 
						
						
					 
					
						2025-07-03 03:10:39 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e25bb12a8 
					 
					
						
						
							
							[Bugfix] Fix import of CutlassExpertsFp8 in compressed_tensors_moe.py ( #20381 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com > 
						
						
					 
					
						2025-07-03 02:07:43 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9965c47d0d 
					 
					
						
						
							
							Enable CPU nightly performance benchmark and its Markdown report ( #18444 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tsai, Louie <louie.tsai@intel.com > 
						
						
					 
					
						2025-07-02 17:50:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						059d4cdb49 
					 
					
						
						
							
							[BugFix] Fix DP headless mode arg validation ( #20398 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-02 17:15:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bdb84e26b0 
					 
					
						
						
							
							[Bugfix] Fixes for FlashInfer's TORCH_CUDA_ARCH_LIST ( #20136 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
						Signed-off-by: Tyler Michael Smith <tysmith@redhat.com > 
						
						
					 
					
						2025-07-02 17:15:11 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3dd359147d 
					 
					
						
						
							
							[Docs] Update EAGLE example ( #20375 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-07-02 17:13:51 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						657f2f301a 
					 
					
						
						
							
							[DP] Support external DP Load Balancer mode ( #19790 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-02 10:21:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a1aafc827a 
					 
					
						
						
							
							[ROCm][FEAT] Enable Full Graph Mode in AITER MLA V1 Attn Backend (Decode Phase only) ( #20254 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com > 
						
						
					 
					
						2025-07-02 16:25:46 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						139508a418 
					 
					
						
						
							
							[Misc] add handler HF_TOKEN is emptry string ( #20369 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io > 
						
						
					 
					
						2025-07-02 09:14:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d265414dbc 
					 
					
						
						
							
							[Minor] Clean up incorrect comment in test ( #20382 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-02 09:13:37 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						48fb076cbc 
					 
					
						
						
							
							[V1] LogitsProcessor programming model ( #16728 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com >
Signed-off-by: Andrew Feldman <afeldman@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com > 
						
						
					 
					
						2025-07-02 09:10:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c1909e7e8c 
					 
					
						
						
							
							[Kernels] MoE refactor ( #19636 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Bill Nell <bnell@redhat.com >
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Co-authored-by: ElizaWszola <ewszola@redhat.com > 
						
						
					 
					
						2025-07-02 06:08:27 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b95877509b 
					 
					
						
						
							
							Documentation update tool_calling: mapping back to function from response ( #20373 )  
						
						 
						
						
						
						
					 
					
						2025-07-02 05:55:49 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						706ff13224 
					 
					
						
						
							
							[Model] Adds support for SlimMoE models Phi-tiny-MoE-instruct ( #20286 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Zichong Li <t-lizichong@microsoft.com @Reasoning-H100-VM3.drbuo4tcjzruhloch3eo0b25ef.cx.internal.cloudapp.net>
Co-authored-by: Zichong Li <t-lizichong@microsoft.com @Reasoning-H100-VM3.drbuo4tcjzruhloch3eo0b25ef.cx.internal.cloudapp.net>
Co-authored-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-07-02 12:54:12 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ccbfb1d1c9 
					 
					
						
						
							
							[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models ( #20322 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com > 
						
						
					 
					
						2025-07-02 12:53:36 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9e5552aa13 
					 
					
						
						
							
							[NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) ( #17280 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: kaln27 <liaojuncheng123@foxmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-02 06:47:19 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0c600b9ab6 
					 
					
						
						
							
							[Build/CI] Automatically tag DeepSeek related PRs ( #20370 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lu Fang <lufang@fb.com > 
						
						
					 
					
						2025-07-02 04:02:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e303dcf523 
					 
					
						
						
							
							[Model] Add Ernie4.5 and Ernie4.5MoE Model Support ( #20220 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: wangyafeng <wangyafeng@baidu.com > 
						
						
					 
					
						2025-07-02 03:37:01 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ae9c4d416f 
					 
					
						
						
							
							[Docs] Make TPU ref prettier in google_tpu.md ( #20356 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-07-02 02:04:08 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d853520b3e 
					 
					
						
						
							
							[Docs] Fix indentations for 2-level items in deprecation_policy.md ( #20352 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-07-01 23:50:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ba51aea65e 
					 
					
						
						
							
							[Bugfix] Keye-VL compatibility with tok_kwargs ( #20058 ) ( #20353 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-01 23:46:59 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8452946c06 
					 
					
						
						
							
							[Model][VLM] Support Keye-VL-8B-Preview ( #20126 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kwai-Keye <Keye@kuaishou.com > 
						
						
					 
					
						2025-07-01 23:35:04 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e7cbf2d7d 
					 
					
						
						
							
							[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. ( #20105 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chenheli Hua <huachenheli@outlook.com > 
						
						
					 
					
						2025-07-01 23:34:03 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7da296be04 
					 
					
						
						
							
							[TPU] kv cache update kernel supports dynamic grid ( #20235 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chengji Yao <chengjiyao@google.com > 
						
						
					 
					
						2025-07-02 06:33:37 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b205e8467d 
					 
					
						
						
							
							[Doc][TPU] Add models and features supporting matrix. ( #20230 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Qiliang Cui <cuiq@google.com > 
						
						
					 
					
						2025-07-02 06:33:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						be0cfb2b68 
					 
					
						
						
							
							fix[Docs]: link anchor is incorrect  #20309  ( #20315 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zxw <1020938856@qq.com > 
						
						
					 
					
						2025-07-02 06:32:34 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1a03dd496b 
					 
					
						
						
							
							[Bugfix] Fix dynamic rotary embedding ( #20343 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk > 
						
						
					 
					
						2025-07-02 06:31:26 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						27b8017636 
					 
					
						
						
							
							[FIX][Intel GPU]fix ipex flash_attn_varlen_func api missing parameter ( #20348 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kunshang Ji <kunshang.ji@intel.com > 
						
						
					 
					
						2025-07-01 22:26:40 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9ec1e3065a 
					 
					
						
						
							
							[Misc][Doc] Add missing comment for LLM ( #20285 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lifan Shen <lifans@meta.com > 
						
						
					 
					
						2025-07-01 19:04:24 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9dae7d46bf 
					 
					
						
						
							
							[Refactor] Remove Unused Env VLLM_ENABLE_MOE_ALIGN_BLOCK_SIZE_TRITON ( #20334 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-01 19:03:43 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7058d7dd5d 
					 
					
						
						
							
							[Refactor] Remove duplicate find_free_port ( #20333 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-01 19:03:07 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a0389e0554 
					 
					
						
						
							
							[UT][intel GPU] use current_platform instead of device hardcode in v1 tests ( #20169 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com > 
						
						
					 
					
						2025-07-02 09:06:04 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3be8d312a2 
					 
					
						
						
							
							[Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 ( #20324 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com > 
						
						
					 
					
						2025-07-01 18:05:47 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3abfe22154 
					 
					
						
						
							
							Enable group size 64 for Machete ( #20290 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: czhu-cohere <conway.zhu@cohere.com > 
						
						
					 
					
						2025-07-01 18:05:44 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e81fbefe8a 
					 
					
						
						
							
							[Refactor] Refactor import utils ( #20269 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-07-01 18:05:42 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9290de5667 
					 
					
						
						
							
							remove unused variables in marlin_template.h ( #20236 )  
						
						 
						
						
						
						
					 
					
						2025-07-02 00:51:52 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7f280d69c9 
					 
					
						
						
							
							[Optimization] Cache sampled token ids in model runner ( #20291 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-01 11:01:31 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						02cabff207 
					 
					
						
						
							
							[V1] [ROCm] Enable EP with AITER Fused MoE ( #20270 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com > 
						
						
					 
					
						2025-07-01 16:48:30 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3d19d47d91 
					 
					
						
						
							
							[Frontend] Expand tools even if tool_choice="none" ( #17177 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: okada shintarou <okada@preferred.jp > 
						
						
					 
					
						2025-07-01 12:47:38 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8acb4badee 
					 
					
						
						
							
							[CUDA graphs] Enable full cuda graphs with FA3 AoT scheduling ( #20301 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-01 09:07:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						314af8617c 
					 
					
						
						
							
							[Docs] Update transcriptions API to use openai client with stream=True  ( #20271 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-07-01 15:47:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0e96cc9b7e 
					 
					
						
						
							
							[Misc] Minor refactoring for scheduler ( #20299 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-07-01 07:55:32 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ecad851cbd 
					 
					
						
						
							
							[Model]Add Tencent HunYuanMoEV1 Model Support ( #20114 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: aiyiwang <aiyiwang@tencent.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: quinnrong <quinnrong@tencent.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-07-01 07:28:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ed70f3c64f 
					 
					
						
						
							
							Add GLM4.1V model (Draft) ( #19331 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn > 
						
						
					 
					
						2025-07-01 12:48:26 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						650d5dbd04 
					 
					
						
						
							
							[Misc] Minor refactor of NIXL background handshake ( #20068 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: NickLucche <nlucches@redhat.com > 
						
						
					 
					
						2025-07-01 12:40:14 +01:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9025a9a705 
					 
					
						
						
							
							[Quant] [Bugfix] Fix quantization config matching with hf_to_vllm_mapper ( #20046 )  
						
						 
						
						
						
						
					 
					
						2025-07-01 19:20:34 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c05596f1a3 
					 
					
						
						
							
							[Perf] Validate @config in pre-commit instead of dynamically ( #20200 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Lionel Villard <villard@us.ibm.com > 
						
						
					 
					
						2025-07-01 05:10:28 -04:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						787b13389e 
					 
					
						
						
							
							[doc] fix the incorrect logo in dark mode ( #20289 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: reidliu41 <reid201711@gmail.com > 
						
						
					 
					
						2025-07-01 08:18:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						96453cfa83 
					 
					
						
						
							
							[BugFix][V1][ROCm] Triton MLA uses V0 backend on V1 engine ( #19067 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com > 
						
						
					 
					
						2025-07-01 16:12:19 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b1c1fe35a5 
					 
					
						
						
							
							[Misc] remove redundant char ( #20287 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kebe <mail@kebe7jun.com > 
						
						
					 
					
						2025-07-01 15:33:22 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						08d81f1014 
					 
					
						
						
							
							[Bugfix] Fix deepep tests ( #20288 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com > 
						
						
					 
					
						2025-07-01 15:29:08 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6cc1e7d96d 
					 
					
						
						
							
							[CPU] Update custom ops for the CPU backend ( #20255 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: jiang1.li <jiang1.li@intel.com > 
						
						
					 
					
						2025-07-01 07:25:03 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9909726d2a 
					 
					
						
						
							
							Enable ZP Support for Machete ( #20268 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: czhu-cohere <conway.zhu@cohere.com > 
						
						
					 
					
						2025-07-01 07:12:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						22e9d42040 
					 
					
						
						
							
							[Misc] add xgrammar for arm64 ( #18359 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com > 
						
						
					 
					
						2025-07-01 07:02:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						86debab54c 
					 
					
						
						
							
							Fix numel() downcast in vllm/csrc/moe/moe_align_sum_kernels.cu +2 ( #17082 )  
						
						 
						
						... 
						
						
						
						Co-authored-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-01 06:48:10 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						be250bbc67 
					 
					
						
						
							
							[V1] Only print cudagraph tqdm on rank 0 with is_global_first_rank ( #19516 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: mgoin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-01 06:02:09 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						27949354fa 
					 
					
						
						
							
							[Feature] A calibration-free RTN-based quantization for accurate and accelerated INT4/INT8 inference ( #18768 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Alex Kogan <alex.kogan@oracle.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com > 
						
						
					 
					
						2025-07-01 05:44:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bd5038af07 
					 
					
						
						
							
							[Doc] add config and troubleshooting guide for NCCL & GPUDirect RDMA ( #15897 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Ernest Wong <chwong719@gmail.com > 
						
						
					 
					
						2025-06-30 21:44:39 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a2f14dc8f9 
					 
					
						
						
							
							[CI][Intel Gaudi][vllm-Plugin]Add CI for hpu-plugin-v1-test ( #20196 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Chendi Xue <chendi.xue@intel.com > 
						
						
					 
					
						2025-07-01 04:17:07 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						92ee7baaf9 
					 
					
						
						
							
							[Example] add one-click runnable example for P2P NCCL XpYd ( #20246 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: KuntaiDu <kuntai@uchicago.edu > 
						
						
					 
					
						2025-06-30 21:03:55 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7151f92241 
					 
					
						
						
							
							[Misc] Fix spec decode example ( #20296 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-06-30 21:01:48 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e28533a16f 
					 
					
						
						
							
							[Bugfix] Fix include prompt in stream response when echo=true ( #15233 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Yuan Fang <yuanfang@alauda.io > 
						
						
					 
					
						2025-07-01 01:30:14 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6d42ce8315 
					 
					
						
						
							
							[CLI] Improve CLI arg parsing for -O/--compilation-config ( #20156 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: luka <luka@neuralmagic.com > 
						
						
					 
					
						2025-07-01 01:03:13 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ded1fb635b 
					 
					
						
						
							
							[Bugfix][V1][P/D]Fix the issue of occasional garbled output  for P2pNcclConnector ( #20263 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Abatom <abzhonghua@gmail.com > 
						
						
					 
					
						2025-06-30 16:45:14 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						97d9524fe9 
					 
					
						
						
							
							[Refactor] Remove useless pdb comment ( #20266 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com > 
						
						
					 
					
						2025-06-30 18:15:24 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d8cf819a9a 
					 
					
						
						
							
							[Core] [Bugfix] [Multimodal] Fix multimodal profiling and generation for SFT/PTQed models ( #20058 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Kyle Sayers <kylesayrs@gmail.com > 
						
						
					 
					
						2025-06-30 17:26:49 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						551ef1631a 
					 
					
						
						
							
							[Unit Test] Add unit test for deep gemm ( #20090 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> 
						
						
					 
					
						2025-06-30 10:26:42 -06:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2863befce3 
					 
					
						
						
							
							[Optimization] Use Shared CachedRequestData Instance Across All Requests ( #20232 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-06-30 09:07:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2965c99c86 
					 
					
						
						
							
							[Spec Decode] Clean up spec decode example ( #20240 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-06-30 08:28:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2062c0723d 
					 
					
						
						
							
							[Spec Decode] Refactor spec decoding into a separate function ( #20238 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu > 
						
						
					 
					
						2025-06-30 08:13:50 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						1c50e100a9 
					 
					
						
						
							
							[Bugfix] fix quark ptpc ( #20251 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Haoyang Li <Haoyang.Li@amd.com >
Co-authored-by: Haoyang Li <307790822@qq.com > 
						
						
					 
					
						2025-06-30 22:24:50 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3ee56e26be 
					 
					
						
						
							
							[Docs] Fix 1-2-3 list in v1/prefix_caching.md ( #20243 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: windsonsea <haifeng.yao@daocloud.io > 
						
						
					 
					
						2025-06-30 11:20:51 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8fe7fc8634 
					 
					
						
						
							
							[Quantization] Improve BitsAndBytesModelLoader ( #20242 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com > 
						
						
					 
					
						2025-06-30 18:22:09 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e936e401de 
					 
					
						
						
							
							[Bugfix] Fix processor initialization in transformers 4.53.0 ( #20244 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Isotr0py <2037008807@qq.com > 
						
						
					 
					
						2025-06-30 10:16:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f5dfa07531 
					 
					
						
						
							
							[Bugfix] Skip loading extra parameters for modelopt Qwen3 MoE model (#19598)
						Signed-off-by: noiji <>
						
						
					 
					
						2025-06-30 18:21:56 +09:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						022c58b80f 
					 
					
						
						
							
							[doc] Add Slack and Forum to the top navigation (#20208)
						Signed-off-by: reidliu41 <reid201711@gmail.com>
						
						
					 
					
						2025-06-30 07:53:45 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						19108ef311 
					 
					
						
						
							
							[Misc] Fix import (#20233)
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-06-29 20:34:54 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						5a52f389dd 
					 
					
						
						
							
							[BUGFIX][DEEPSEEK][MODEL_LOAD] fix w13, w2 weight not initialized assert (#20202)
						Signed-off-by: Chendi Xue <chendi.xue@intel.com>
						
						
					 
					
						2025-06-29 19:46:19 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						65b1cbb138 
					 
					
						
						
							
							[Model] support dots1 (#18254)
						Signed-off-by: redmoe-moutain <agiredmoe@gmail.com>
						
						
					 
					
						2025-06-29 19:34:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6c9837a761 
					 
					
						
						
							
							Fix cuda_archs_loose_intersection when handling sm_*a (#20207)
						Signed-off-by: Huy Do <huydhn@gmail.com>
						
						
					 
					
						2025-06-29 16:52:34 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6f2f53a82d 
					 
					
						
						
							
							[Quantization] Add compressed-tensors NVFP4 MoE Support (#19990)
						Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
						Signed-off-by: Dipika <dipikasikka1@gmail.com>
						
						
					 
					
						2025-06-29 22:05:40 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7b1895e6ce 
					 
					
						
						
							
							[CI Fix] Try fixing eagle e2e test OOM by reducing block allocation (#20213)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-29 10:31:37 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4d36693687 
					 
					
						
						
							
							[Refactor] Create a function util and cache the results for has_deepgemm, has_deepep, has_pplx (#20187)
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-06-28 22:06:38 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						daec9dea6e 
					 
					
						
						
							
							[Bugfix] Correct behavior of GraniteMoeHybrid for TensorParallel execution (#20137)
						Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
						
						
					 
					
						2025-06-28 08:16:41 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						daceac57c7 
					 
					
						
						
							
							[Frontend] Generalize v1/audio/transcriptions endpoint (#20179)
						Signed-off-by: NickLucche <nlucches@redhat.com>
						
						
					 
					
						2025-06-28 08:15:26 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8615d9776f 
					 
					
						
						
							
							[CI/Build] Add new CI job to validate Hybrid Models for every PR (#20147)
						Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
						
						
					 
					
						2025-06-27 23:00:25 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7b460c25f9 
					 
					
						
						
							
							[BugFix] Fix the incorrect func name in the comments. (config.py) (#20185)
						
						 
						
						
						
						
					 
					
						2025-06-27 22:51:16 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f719772281 
					 
					
						
						
							
							[Bugfix] Properly reject requests with empty list guided_choice (#20195)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-27 22:50:52 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d45417b804 
					 
					
						
						
							
							fix ci issue distributed 4 gpu test (#20204)
						Signed-off-by: yewentao256 <zhyanwentao@126.com>
						
						
					 
					
						2025-06-27 22:50:00 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a29e62ea34 
					 
					
						
						
							
							Fix num_token_padding support for static per-tensor scaled_fp8_quant (#20188)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-27 22:48:13 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e53be6f00a 
					 
					
						
						
							
							[Misc] Add type assertion of request_id for LLMEngine.add_request (#19700)
						Signed-off-by: n2ptr <xuzhanchaomail@163.com>
						
						
					 
					
						2025-06-27 22:47:36 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c329ceca6d 
					 
					
						
						
							
							[CI Fix] Pin tests/models/registry.py MiniMaxText01ForCausalLM to revision due to model changes (#20199)
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-06-28 13:43:06 +08:00