a7b809e0f0
Merge remote-tracking branch 'upstream/main' into benchmark-output
2025-04-23 14:55:50 +00:00

7efc568418
Update convert_to_csv.py
2025-04-23 10:51:38 -04:00

8e630d680e
Improve Transformers backend model loading QoL (#17039)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-23 07:33:51 -07:00

af869f6dff
[CI] Update structured-output label automation (#17055)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-23 07:33:14 -07:00

53c0fa1e25
Ensure that pid passed to kill_process_tree is int for mypy (#17051)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-23 07:32:26 -07:00

f7912cba3d
[Doc] Add top anchor and a note to quantization/bitblas.md (#17042)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-23 07:32:16 -07:00

6317a5174a
Categorize tests/kernels/ based on kernel type (#16799)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-23 09:21:07 -04:00

aa72d9a4ea
Mistral-format support for compressed-tensors (#16803)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-23 08:46:23 -04:00

ce17db8085
[CI] Run v1/test_serial_utils.py in CI (#16996)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-23 01:13:34 -07:00

8c87a9ad46
[Bugfix] Fix AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers (#16964)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-23 07:24:09 +00:00

ec69124eb4
[Misc] Improve readability of get_open_port function. (#17024)
Signed-off-by: gitover22 <qidizou88@gmail.com>
2025-04-23 06:16:53 +00:00

d0da99fb70
[BugFix] llama4 fa3 fix - RuntimeError: scheduler_metadata must have shape (metadata_size) (#16998)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-04-22 21:49:24 -07:00

b2f195c429
[V1] Avoid socket errors during shutdown when requests are in in-flight (#16807)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-23 12:36:29 +08:00

047797ef90
[Bugfix] Triton FA function takes no keyword arguments (#16902)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-04-22 21:35:24 -07:00

eb8ef4224d
[doc] add download path tips (#17013)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-23 04:06:30 +00:00

56a735261c
[INTEL-HPU][v0] Port delayed sampling to upstream (#16949)
Signed-off-by: Michal Adamczyk <michal.adamczyk@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
2025-04-22 20:14:11 -07:00

e1cf90e099
[misc] tune some env vars for GB200 (#16992)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-04-23 10:59:48 +08:00

6bc1e30ef9
Revert "[Misc] Add S3 environment variables for better support of MinIO." (#17021)
2025-04-22 19:22:29 -07:00

7e081ba7ca
[BugFix] Revert ROCm Custom Paged Attention Env Flag Check (#17022)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-04-22 19:17:48 -07:00

1e013fa388
[V1][DP] More robust DP/EP dummy request coordination (#16277)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-22 19:12:15 -07:00

bc7c4d206b
[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 (#13305)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Signed-off-by: maleksan85 <maleksan@amd.com>
Signed-off-by: <>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
2025-04-22 19:11:56 -07:00

f67e9e9f22
add Dockerfile build vllm against torch nightly (#16936)
Signed-off-by: Yang Wang <elainewy@meta.com>
2025-04-22 19:08:27 -07:00

36fe78769f
[Bugfix] validate urls object for multimodal content parts (#16990)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-04-23 09:43:06 +08:00

83d933718c
[Core][V1][TPU] Enable structured decoding on TPU V1 (#16499)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-22 18:05:23 -06:00

5175b884f7
[BugFix] Remove default multiproc executor collective_rpc timeout (#17000)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-22 23:27:14 +00:00

5536b30a4c
Fencing Kernels Tests for enabling on AMD (#16929)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-04-22 09:32:40 -07:00

7f58fb9718
Add assertion for no objects while hashing hf_config (#16930)
Signed-off-by: rzou <zou3519@gmail.com>
2025-04-22 09:32:22 -07:00

30bc3e0f66
[FEAT][ROCm]: Support AITER MLA (#15893)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
2025-04-22 09:31:13 -07:00

f34410715f
[frontend] enhance tool_calls type check (#16882)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-22 15:40:24 +00:00

68d4c33202
[Misc] Add S3 environment variables for better support of MinIO. (#16977)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-22 14:27:36 +00:00

f961d7f6ef
[BugFix] Pass in correct VLLM config in FlashInfer backend (#13207) (#16973)
Signed-off-by: 苏政渊 <suzhengyuan@moonshot.cn>
Co-authored-by: 苏政渊 <suzhengyuan@moonshot.cn>
2025-04-22 06:44:10 -07:00

d059110498
Improve configs - SpeculativeConfig (#16971)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-22 12:55:36 +00:00

571e8dd65e
[Bugfix] Fix distributed bug again in Qwen2.5-VL & Qwen2.5-Omni (#16974)
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>
2025-04-22 12:23:17 +00:00

4b91c927f6
[Misc] refactor example series (#16972)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-22 11:44:21 +00:00

0e237f0035
[FEAT][ROCm] Integrate Paged Attention Kernel from AITER (#15001)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-04-22 02:46:28 -07:00

8f7bace7c3
[Doc] Improve documentation for multimodal CLI args (#16960)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-22 08:35:35 +00:00

e4d6144232
[BugFix] Fix incremental detokenization perf issue (#16963)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-22 08:16:19 +00:00

8d32dc603d
[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036)
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>
2025-04-22 09:01:36 +01:00

c4ab9f3e71
[V1] Remove pre-allocation for KV cache (#16941)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-22 00:52:18 -07:00

2689d5c027
[Model] Use autoweightloader for mamba (#16950)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-04-22 07:48:15 +00:00

acba33a0f1
[Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams (#16767)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-04-22 06:02:20 +00:00

a114bf20a3
[Perf] Optimize _update_states for GPU model runner (#16910)
Signed-off-by: snowcharm <snowcharmqq@gmail.com>
2025-04-22 14:01:54 +08:00

3097ce3a32
[Doc] Update ai_accelerator/hpu-gaudi.inc.md (#16956)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-22 05:33:27 +00:00

d6da9322c8
[Bugfix] Fix f-string for Python 3.9-3.11 (#16962)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-21 21:45:55 -07:00

71ce44047f
Support S3 Sharded loading with RunAI Model Streamer (#16317)
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-04-21 21:21:49 -07:00

188b7f9b8c
[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830)
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-04-21 20:46:22 -07:00

b9b4746950
[V1] Remove additional_config check (#16710)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-21 20:45:27 -07:00

7b8a2ab76f
[Kernel] Add expert_map support to Cutlass FP8 MOE (#16861)
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
2025-04-21 20:44:32 -07:00

c9acbf1141
[Misc] Remove the chunked prefill warning for LoRA (#16925)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-21 20:44:24 -07:00

5b794cae8d
[ROCm] Add aiter tkw1 kernel for Llama4 fp8 (#16727)
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-04-21 20:42:34 -07:00

0e4254492f
[Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other (#16863)
Signed-off-by: Jeffrey Li <jeffrey.dot.li@gmail.com>
2025-04-22 11:40:19 +08:00

1311913f55
[BugFix][Spec Decode] No in-place update to draft probs (#16952)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-21 19:54:19 -07:00

29f395c97c
[Doc] Remove unnecessary V1 flag (#16924)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-21 21:04:38 -04:00

fa3bba2a53
[TPU][V1] Enable Top-P (#16843)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-04-22 00:46:07 +00:00

986537f1c3
[V1] V1 FlashInfer Attention (#16684)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Aurick Qiao <qiao@aurick.net>
2025-04-22 00:38:41 +00:00

210207525e
[TPU][V1] Capture multimodal encoder during model compilation (#15051)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Siyuan Liu <lsiyuan@google.com>
2025-04-21 18:36:59 -06:00

71eda0bb76
Update Qwen1.5-MoE-W4A16-compressed-tensors.yaml (#16946)
2025-04-21 18:35:32 -06:00

471fe65630
[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-21 15:43:13 -06:00

3a0fba5cf4
[V1][Spec Decode] Handle draft tokens beyond max_model_len (#16087)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-21 12:38:50 -07:00

299ebb62b2
[Core] Speed up decode by remove synchronizing operation in sampler (#16436)
Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
2025-04-21 18:18:22 +00:00

f728ab8e35
[Doc] mention how to install in CPU editable mode (#16923)
Signed-off-by: David Xia <david@davidxia.com>
2025-04-21 17:45:51 +00:00

63e26fff78
[doc] install required python3-dev apt package (#16888)
Signed-off-by: David Xia <david@davidxia.com>
2025-04-21 16:15:18 +00:00

fe3462c774
[XPU][Bugfix] minor fix for XPU (#15591)
Signed-off-by: yan ma <yan.ma@intel.com>
2025-04-22 00:02:57 +08:00

3b34fd5273
Raise error for data-parallel with benchmark_throughput (#16737)
Signed-off-by: Kartik Ramesh <kartikx2000@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-04-21 23:51:43 +08:00

55d6d3fdb8
[Bugfix] Fix GLM rotary_dim issue and support v1 (#16912)
Signed-off-by: isotr0py <2037008807@qq.com>
2025-04-21 14:26:34 +00:00

7272bfae77
[Misc] Refactor platform to get device specific stream and event (#14411)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-21 21:25:49 +08:00

d9ac9e3dc5
[Misc] fix collect_env version parse (#15267)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-21 20:29:40 +08:00

d41faaf9df
Restore buffers when wake up from level 2 sleep (#16564) (#16889)
Signed-off-by: Han <zh950713@gmail.com>
2025-04-21 20:18:28 +08:00

b34f33438a
[Doc] Split dummy_processor_inputs() in Multimodal Docs (#16915)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-04-21 11:10:01 +00:00

26c0406555
[Bugfix] Fix distributed bug in Qwen2.5-VL & Qwen2.5-Omni (#16907)
2025-04-21 10:25:21 +00:00

4c41278b77
[CI/CD][V1] Add spec decode tests to CI (#16900)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-20 22:37:16 -07:00

bb3605db85
[Bugfix] Fix v1/spec_decode/test_ngram.py (#16895)
Signed-off-by: qizixi <qizixi@meta.com>
2025-04-20 20:54:29 -07:00

fe742aef5a
[easy] Pass compile_fx only the config patches (#16845)
Signed-off-by: rzou <zou3519@gmail.com>
2025-04-20 12:25:19 +08:00

4b07d36891
Improve configs - CacheConfig (#16835)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-20 12:25:04 +08:00

87aaadef73
Serialize tensors using int8 views (#16866)
Signed-off-by: Staszek Pasko <staszek@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-04-19 10:28:34 -07:00

682e0b6d2f
Log how much time loading a compiled artifact takes (#16848)
Signed-off-by: rzou <zou3519@gmail.com>
2025-04-19 16:50:46 +00:00

d6195a748b
[doc] update hyperlink (#16877)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-19 16:40:38 +00:00

205d84aaa9
[VLM] Clean up models (#16873)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-19 12:13:06 +00:00

5124f5bf51
[Model] Qwen2.5-Omni Cleanup (#16872)
2025-04-19 09:37:02 +00:00

83f3c3bd91
[Model] Refactor Phi-4-multimodal to use merged processor and support V1 (#15477)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-19 02:26:11 -07:00

d9737ca1c6
[V1][Misc] stop update prefix cache stats when logs_stats is disabled (#16460)
Signed-off-by: vie-serendipity <2733147505@qq.com>
2025-04-19 02:25:19 -07:00

9d4ca19d50
[Misc] Benchmarks for audio models (#16505)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-19 02:24:14 -07:00

2ef0dc53b8
[Frontend] Add sampling params to v1/audio/transcriptions endpoint (#16591)
Signed-off-by: Jannis Schönleber <joennlae@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Jannis Schönleber <joennlae@gmail.com>
2025-04-19 07:03:54 +00:00

1d4680fad2
[rocm][MI300] llama4 maverick fp8 moe config tp8 (#16847)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-04-19 06:21:43 +00:00

2c1bd848a6
[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130)
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Xiong Wang <wangxiongts@163.com>
2025-04-18 23:14:36 -07:00

5c9121203c
[release] Publish neuron docker image (#16733)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2025-04-18 17:11:25 -07:00

490b1698a5
[Doc] Updated Llama section in tool calling docs to have llama 3.2 config info (#16857)
Signed-off-by: jmho <jaylenho734@gmail.com>
2025-04-18 23:28:53 +00:00

5a5e29de88
[Misc] refactor examples series - Chat Completion Client With Tools (#16829)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-18 23:24:42 +00:00

9ec11b459c
Update convert_to_csv.py
2025-04-18 09:54:27 -07:00

3d3ab3689f
[New Model]: Snowflake Arctic Embed (Family) (#16649)
2025-04-18 08:11:57 -07:00

686623c5e7
Fix nullable_kvs fallback (#16837)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-18 05:58:39 -07:00

aadb656562
[Misc] Clean up Kimi-VL (#16833)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-18 05:15:09 -07:00

87e067de41
[Model] use AutoWeightsLoader for BigCode, GPT-J (#16823)
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
2025-04-18 10:42:41 +00:00

26507f8973
[Docs] Fix a link and grammar issue in production-stack.md (#16809)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-18 06:42:58 +00:00

9c1d5b456d
[Doc] add podman setup instructions for official image (#16796)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-04-18 06:10:49 +00:00

e31045f95c
[Bugfix] fix pp for llama4 (#16746)
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-04-18 13:51:30 +08:00

aaec845f8e
[ROCm] [Attention] Cleanup ROCm output passing (#16431)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
2025-04-18 05:46:45 +00:00

7bdfd29a35
[Misc] add collect_env to cli and docker image (#16759)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-04-17 22:13:35 -07:00

e78587a64c
Improve-mm-and-pooler-and-decoding-configs (#16789)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-17 22:13:32 -07:00

7eb4255628
[BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-04-17 22:13:29 -07:00

6a0f547561
Add hardware print to TPU V1 test (#16792)
2025-04-17 22:13:26 -07:00

30ed81b7ca
[V1][Structured Output] Minor modification to _validate_structured_output() (#16748)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-18 13:12:54 +08:00

7a4a5de729
[Misc] Update outdated note: LMCache now supports chunked prefill (#16697)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-18 05:12:42 +00:00

c16fb5dae8
[Doc] Improve help examples for --compilation-config (#16729)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-17 21:22:34 -07:00

e37073efd7
Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema (#16721)
Signed-off-by: Tarun Kumar <takumar@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-04-17 21:08:27 -07:00

183dad7a85
[Attention] Update to lastest FA3 code (#13111)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-04-17 15:14:07 -07:00

3408e47159
[P/D][V1] KV Connector API V1 (#15960)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-04-17 13:22:40 -07:00

0377b8310b
[MLA] Simplification to batch P/D reordering (#16673)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-17 16:12:09 -04:00

e4755f7fac
[V1][Metrics] Fix http metrics middleware (#15894)
2025-04-17 19:52:18 +00:00

92edf35826
[ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints (#16674)
2025-04-17 11:44:34 -07:00

eb5819b2d9
[V1][TPU] Enable Top K (#15489)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: Hyesoo Yang <hyeygit@gmail.com>
2025-04-17 18:18:11 +00:00

5989f4684d
[TPU][V1] Fix padding recompilation when max-num-batched-tokens is not even (#16726)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-17 18:09:57 +00:00

5125d72f02
[Model] use AutoWeightsLoader for olmoe,opt,orion,persimmon,phi3_small (#16548)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-04-17 17:48:31 +00:00

a018e555fd
[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 (#16753)
Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com>
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
2025-04-18 00:01:30 +08:00

6211b92273
[Bugfix]Fix index out of range error in api server log (#16787)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-04-17 09:01:07 -07:00

05fcd1b430
[V1][Perf] Faster incremental detokenization (#15137)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-17 07:45:24 -07:00

7c02d6a137
[Doc] Changed explanation of generation_tokens_total and prompt_tokens_total counter type metrics to avoid confusion (#16784)
Signed-off-by: insukim1994 <insu.kim@moreh.io>
2025-04-17 14:10:08 +00:00

11c3b98491
[Doc] Document Matryoshka Representation Learning support (#16770)
2025-04-17 13:37:37 +00:00

dbe7f07001
[Doc] Make sure to update vLLM when installing latest code (#16781)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-17 06:53:31 -06:00

c69bf4ee06
fix: hyperlink (#16778)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-17 11:34:20 +00:00

d27ea94034
Improve configs - TokenizerPoolConfig + DeviceConfig (#16603)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-17 11:19:42 +00:00

99ed526101
[Misc] refactor examples series - lmcache (#16758)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-17 11:02:35 +00:00

207da28186
[Doc] Fix a 404 link in installation/cpu.md (#16773)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-17 10:46:21 +00:00

5b1aca2ae3
[Bugfix] Fix GLM4 model (#16618)
Signed-off-by: intervitens <intervitens@tutanota.com>
2025-04-17 03:35:07 -07:00

d8e557b5e5
[doc] add open-webui example (#16747)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-17 18:27:32 +08:00

61a44a0b22
[Doc] Add more tips to avoid OOM (#16765)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-17 09:54:34 +00:00

a6481525b8
[misc] ignore marlin_moe_wna16 local gen codes (#16760)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-04-17 17:15:14 +08:00

8cac35ba43
[Ray] Improve documentation on batch inference (#16609)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2025-04-16 22:19:26 -07:00

9dbf7a2dc1
[V1] Remove log noise when idle (#16735)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-16 21:34:08 -07:00

607029e515
[Bugfix] Revert max_prompt_len validation for decoder-only models. (#16741)
Signed-off-by: David Heineman <david@davidheineman.com>
2025-04-16 21:33:15 -07:00

cb072ce93b
[Bugfix] Update Florence-2 tokenizer to make grounding tasks work (#16734)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-17 04:17:39 +00:00

95aca283b4
[rocm][V0] fix selection logic for custom PA in V0 (#16426)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-04-16 19:52:11 -07:00

2b05b8ce69
[V1][Frontend] Improve Shutdown And Logs (#11737)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com>
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-04-16 19:48:34 -07:00

3c776dcefb
Adding vllm buildkite job for IBM Power (#16679)
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
2025-04-17 10:47:47 +08:00

2cbd4d2999
[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification (#16636)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
2025-04-16 19:47:26 -07:00

3092375e27
[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased] (#16432)
Signed-off-by: Staszek Pasko <staszek@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-04-16 19:28:32 -07:00

3cd91dc955
Help user create custom model for Transformers backend remote code models (#16719)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-17 01:05:59 +00:00

8a7368e069
[Misc] Remove redundant comment (#16703)
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
2025-04-17 00:44:52 +00:00

93e561ec4d
Improve error for structured output backend selection (#16717)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-17 00:35:35 +00:00

e1b004839a
[Hardware] Add processor inputs to platform validation (#16680)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-04-16 09:28:42 -07:00

ee378f3d49
[Model] support modernbert (#16648)
Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com>
Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>
2025-04-16 05:30:15 -07:00

e82ee40de3
[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel (#16693)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-04-16 03:31:39 -07:00

facbe2a114
[Doc] Improve OOM troubleshooting (#16704)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-16 18:29:48 +08:00

7168920491
[Misc] refactor examples series (#16708)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-16 10:16:36 +00:00

21378a2323
[CI] Cleanup additional_dependencies: [toml] for pre-commit yapf hook (#16405)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2025-04-16 10:05:31 +00:00

976711d9db
[V1][Structured Output] Move xgrammar related utils to backend_xgrammar.py (#16578)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-16 17:01:36 +08:00

44fa4d556c
[ROCM] Bind triton version to 3.2 in requirements-built.txt (#16664)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-04-16 14:05:28 +08:00

3ac98edcb1
[Feature] add model aware kv ops helper (#16020)
Signed-off-by: billishyahao <bill.he@amd.com>
2025-04-15 23:00:43 -07:00

966c742ed2
Disable remote caching when calling compile_fx (#16611)
Signed-off-by: rzou <zou3519@gmail.com>
2025-04-15 22:18:28 -07:00

0d7d05f4b6
[Misc] Modify LRUCache touch (#16689)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-16 04:51:38 +00:00

96bb8aa68b
[Bugfix] fix gpu docker image mis benchmarks dir (#16628)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-04-15 21:21:14 -07:00

3badb0213b
[Model] Add PLaMo2 (#14323)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
Signed-off-by: shemmi <shemmi@preferred.jp>
Co-authored-by: Kento Nozawa <nzw0301@preferred.jp>
Co-authored-by: Hiroaki Mikami <mhiroaki@preferred.jp>
Co-authored-by: Calvin Metzger <metzger@preferred.jp>
2025-04-15 19:31:30 -07:00

fdcb850f14
[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546)
Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local>
2025-04-15 22:31:38 +00:00

54a66e5fee
[Misc] Update compressed-tensors WNA16 to support zero-points (#14211)
2025-04-15 07:33:51 -06:00

280d62b8a2
[Kernel] Remove redundant Exp calculations (#16123)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-04-15 12:58:37 +00:00

1666e66443
Add "/server_info" endpoint in api_server to retrieve the vllm_config. (#16572)
Signed-off-by: Xihui Cang <xihuicang@gmail.com>
2025-04-15 11:50:38 +00:00

1575c1701a
[CI/Build] Fix LoRA OOM (#16624)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-15 16:38:19 +08:00

6ae996a873
[Misc] refactor argument parsing in examples (#16635)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-15 08:05:30 +00:00

b590adfdc1
Fix vLLM x torch.compile config caching (#16491)
Signed-off-by: rzou <zou3519@gmail.com>
2025-04-14 23:11:11 -07:00

b4fe16c75b
Add vllm bench [latency, throughput] CLI commands (#16508)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-14 23:10:35 -07:00

bc5dd4f669
[Bugfix] Fix broken GritLM model and tests (missing pooling_metadata) (#16631)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2025-04-14 23:09:58 -07:00

dbb036cf61
[Bugfix] Fix tests/kernels/test_mamba_ssm_ssd.py (#16623)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-04-15 05:35:38 +00:00

70e7ed841d
[BugFix]: Update minimum pyzmq version (#16549)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2025-04-14 20:06:03 -07:00

d06ba4ed3f
[Kernel] moe wna16 marlin kernel (#14447)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-04-14 20:05:22 -07:00

6b40996ae8
[Core][Bugfix] Fix Offline MM Beam Search (#16390)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-04-15 10:33:02 +08:00

d2020acac7
config check sleep mode support oot platforms (#16562)
2025-04-14 16:31:50 -07:00

1eb3c2ed48
[DOC][TPU] Add core idea about avoiding recompilation after warmup (#16614)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-14 21:56:06 +00:00

c64ee87267
[Hardware][TPU] Add torchvision to tpu dependency file (#16616)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-04-14 17:50:46 -04:00

b1308b84a3
[Model][VLM] Add Kimi-VL model support (#16387)
Signed-off-by: courage17340 <courage17340@163.com>
2025-04-14 21:41:48 +00:00

7b5ecf79bd
s390x: Fix PyArrow build and add CPU test script for Buildkite CI (#16036)
Signed-off-by: Nishan Acharya <Nishan.Acharya@ibm.com>
2025-04-14 10:55:32 -07:00

9883a18859
Fix triton install condition on CPU (#16600)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-14 17:06:01 +00:00

b3f2fddd17
[TPU][V1] Fix exponential padding when max-num-batched-tokens is not a power of 2 (#16596)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-14 17:01:05 +00:00

aa29841ede
[Bugfix] Multi-modal caches not acting like LRU caches (#16593)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-14 09:24:16 -07:00

6bf27affb6
[fix]: Dockerfile.ppc64le fixes for opencv-python and hf-xet (#16048)
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
2025-04-14 17:08:39 +01:00

1dd23386ec
[Misc] Update usage with mooncake lib for kv transfer (#16523)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-04-14 11:31:37 +00:00

7cbfc10943
[Misc] refactor examples (#16563)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-14 09:59:15 +00:00

ce4ddd2d1a
[Misc] remove warning if triton>=3.2.0 (#16553)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-04-14 02:39:47 -07:00

e51929ebca
Improve configs - SchedulerConfig (#16533)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-14 17:24:16 +08:00

dc1b4a6f13
[Core][V0] Enable regex support with xgrammar (#13228)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-14 10:13:38 +08:00

63d2705edb
[Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py (#16556)
2025-04-13 17:20:26 -07:00

d085a44082
Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) (#16537)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-13 14:55:18 +00:00

f49e5aff11
[V1][Spec Decode] KV cache slots for eagle heads (#16370)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2025-04-12 19:42:51 -07:00

6c11ecf8d3
[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine (#16529)
Signed-off-by: Ryan McConville <ryan@ryanmcconville.com>
2025-04-12 20:19:19 +00:00

93e5f3c5fb
[Perf] Optimize Preparing Inputs for GPU Model Runner (#16484)
Signed-off-by: snowcharm <snowcharmqq@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-04-12 22:54:37 +08:00

70363bccfa
Fix syntaxWarning: invalid escape sequence '\s' (#16532)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-04-12 14:39:42 +00:00

3cdc57669f
[Misc] Delete redundant code (#16530)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-04-12 11:21:37 +00:00

68bb122eb4
[MISC] Make GroupCoordinator compatible with out-of-tree devices (#16464)
Signed-off-by: hzji210@gmail.com <hzji210@gmail.com>
2025-04-12 09:20:25 +00:00

d9fc8cd9da
[V1] Enable multi-input by default (#15799)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-12 08:52:39 +00:00

f069f3ea74
[Misc] Openai transcription client example use same Whisper model (#16487)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-12 07:27:03 +00:00

c5bc0e7fcc
[Misc] Update chat utils tests (#16520)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-12 06:48:43 +00:00

4a3a518722
fix: spelling (#16466)
Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>
2025-04-11 23:24:22 -07:00

fbf722c6e6
[Frontend] support matryoshka representation / support embedding API dimensions (#16331)
2025-04-11 23:23:10 -07:00

e92d7085bf
[Feature][V1] Add xgrammar to support minLength, maxLength with test (#16516)
Signed-off-by: Leon Seidel <leon.seidel@fau.de>
2025-04-11 23:22:07 -07:00

bd6028d6b0
Optimized topk for topk=1 (Llama-4) (#16512)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-12 14:21:08 +08:00

802329dee9
[Doc] Update Llama4 Model Names in Supported Models (#16509)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-04-12 02:53:10 +00:00

41cc883c29
[BugFix] Handle non-contiguous tensors properly when serializing (#16492)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-11 17:54:06 -07:00

57504a4bcf
[CI][Bugfix] Add mistral_tool_use to Ci (#16517)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-11 17:52:38 -07:00

ed4792c990
[Doc] Fix link to vLLM blog (#16519)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-04-11 17:39:23 -07:00

87b836ba77
Bugfix for PixtralHF models without spatial_merge_size (#16513)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-11 23:32:22 +00:00

56c76c2e0e
[Bugfix] clean up duplicated code (#16485)
Signed-off-by: Gogs <gogs@fake.local>
Co-authored-by: Gogs <gogs@fake.local>
2025-04-11 23:19:40 +00:00

c09632a66c
Update openai_compatible_server.md (#16507)
Signed-off-by: Christian Sears <csears@redhat.com>
2025-04-11 22:54:58 +00:00

a3bf8d4a2b
[Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (#16488)
2025-04-12 06:26:55 +08:00

16eda8c43a
[Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Kai Wu <kaiwu@meta.com>
2025-04-12 06:26:17 +08:00

cd77382ac1
Improve configs - LoadConfig (#16422)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-11 20:27:27 +00:00

71b9cde010
[Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-04-11 19:59:50 +00:00

5285589f37
[Doc] Document InternVL3 support (#16495)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-11 19:41:09 +00:00

f41647ee6b
[Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-11 17:54:08 +00:00

4d022cbc75
[TPU][V1] Make --disable_chunked_mm_input mandatory for serving MM models (#16483)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-11 17:06:14 +00:00

70de35a881
Fix erroneous "model doesn't support compile" warning (#16486)
Signed-off-by: rzou <zou3519@gmail.com>
2025-04-11 16:24:36 +00:00

34b2cf3b33
[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779)
Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com>
2025-04-11 07:38:36 -07:00

9e90c9f73f
[Bugfix] Fix bugs of running Quark quantized models (#16236)
Signed-off-by: chaow <chaow@amd.com>
2025-04-11 10:18:32 -04:00

e9528f6dc6
[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-04-11 06:50:50 -06:00

51baa9c333
Don't install triton on ppc64le platform (#16470)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-11 10:11:00 +00:00

35e076b3a8
[Misc] update api_client example (#16459)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-11 10:05:40 +00:00

a26f59ccbc
[Misc] Raise error for V1 not supporting Long LoRA. (#16415)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-11 01:51:20 -07:00

aa3b3d76e0
Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-11 08:09:52 +00:00

f7030df3be
[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-11 15:32:37 +08:00

905e91e9ac
Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453)
2025-04-11 06:44:22 +00:00

f8f9c0ba62
[Bugfix] Don't set an upper bound on repetition penalty (#16403)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-04-11 14:19:40 +08:00

dda811021a
[CPU][Bugfix] Fix CPU docker issues (#16454)
Signed-off-by: jiang.li <jiang1.li@intel.com>
2025-04-11 14:19:07 +08:00

93195146ea
[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-11 04:57:16 +00:00

ed37599544
Update supported_hardware.md for TPU INT8 (#16437)
2025-04-11 12:28:07 +08:00

99ef59cf7f
[Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-04-10 21:26:07 -07:00

d544d141ec
update benchmark_serving_structured_output to include auto backend (#16438)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-11 12:25:52 +08:00

3e397a9484
check input length of sonnet samples (#16423)
Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>
2025-04-11 10:15:06 +08:00

268c325078
Fix range_ratio Bug in RandomDataset (#16126)
Signed-off-by: jadewang21 <jadewangcn@outlook.com>
2025-04-10 15:31:17 -07:00

3cc9af88ff
[TPU][V1] Disable per-request seed/Generator (#16172)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-10 17:05:44 -04:00

7cd0bd7212
[Bugfix] Fix output token length check logic (#16419)
Signed-off-by: look <eeslook@163.com>
2025-04-10 20:16:48 +00:00

56d4aefa33
[VLM] Avoid unnecessary dummy multimodal data during processing (#16416)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-10 19:32:14 +00:00

dd143ef541
[V1] Zero-copy tensor/ndarray serialization/transmission (#13790)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-10 19:23:14 +00:00

daefed052c
[Model] Reduce redundant computations in mamba2 blocks for Bamba-9B (#15423)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
2025-04-10 19:07:07 +00:00

5fbab20e02
[Bugfix] Fix bug when dataset is json (#15899)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-10 18:35:41 +00:00

e8224f3dca
[V1][Spec Decode] Eagle Model loading (#16035)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2025-04-10 11:21:48 -07:00

9665313c39
[V1] Set structured output backend to auto by default (#15724)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-10 17:53:26 +00:00

0c54fc7273
Improve configs - ParallelConfig (#16332)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-10 17:34:37 +00:00

c1b57855ec
[TPU][V1] Use language_model interface for getting text backbone in MM (#16410)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-10 17:32:04 +00:00

83b824c8b4
[VLM] Remove BaseProcessingInfo.get_mm_max_tokens_per_item (#16408)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-10 09:06:58 -07:00

7678fcd5b6
Fix the torch version parsing logic (#15857)
2025-04-10 07:37:47 -07:00

8661c0241d
[CI] Add auto update workflow for Dockerfile graph (#11879)
Signed-off-by: wineandchord <guoqizhou19@gmail.com>
2025-04-10 13:43:05 +00:00

ce8d6b75fc
[doc] update the wrong link (#16401)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-10 21:02:37 +08:00

61de3ef74b
[Model] Remove image mm limit for LLaMa4 (#16365)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-04-10 09:36:27 +00:00

ec1f9c8c91
Update Numba to 0.61.2 (#16376)
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-10 07:59:37 +00:00

65e09094c4
[doc] add download model tips (#16389)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-10 07:45:26 +00:00

c70cf0fe06
[Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models (#16038)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-10 15:08:47 +08:00

a5d11a54dc
[Bugfix] Fix validation error for text-only Mllama 3.2 (#16377)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-10 14:19:42 +08:00

3d4c87758e
[Misc] Update transformers version limits of multi-modal tests (#16381)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-09 23:03:33 -07:00

a9bd832fc5
[Model] use AutoWeightsLoader for deepseek_v2, internlm2 (#16383)
Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>
2025-04-09 23:01:00 -07:00

417bcefbae
fix sonnet dataset sample when prefix len is very small (#16379)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-10 05:35:07 +00:00

baada0e737
[Bugfix][TPU] Fix TPU validate_request (#16369)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-04-10 12:55:12 +08:00

82eb61dd4c
[misc] use tqdm.auto where appropriate (#16290)
Signed-off-by: Benjamin Kitor <bkitor@gigaio.com>
2025-04-09 21:54:54 -07:00

0d4d06fe2f
[CI][Bugfix] Pin triton version for CPU (#16384)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-04-10 04:35:00 +00:00

4aed0ca6a2
[bugfix] Avoid the time consumption caused by creating dummy videos. (#16371)
2025-04-10 04:30:05 +00:00

1621b25288
[TPU] Fix dummy loading OOM (#16372)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-10 04:06:16 +00:00

a564797151
[Model] use AutoWeightsLoader for granite, granitemoe, granitemoeshared, grok1, mixtral (#16325)
Signed-off-by: Aaron Ang <aaron.angyd@gmail.com>
2025-04-09 20:07:40 -07:00

1da6a09274
[Bugfix]: do not shutdown server if skip_special_use=False for MistralTokenizer (#14094)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-04-09 19:43:09 -07:00

1e44ffc3ff
Add GLM-4-0414 support (#16338)
Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
Co-authored-by: Accelerator1996 <lvfei.lv@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: yihong <zouzou0208@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: ajayvohra2005 <ajayvohr@amazon.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-04-10 09:19:42 +08:00

a454748544
[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues (#16275)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-09 18:51:51 -06:00

1bff42c4b7
[Misc] refactor Structured Outputs example (#16322)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-09 23:32:42 +00:00

cb391d85dc
[Hardware] add platform-specific request validation api (#16291)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-04-09 12:50:01 -07:00

fee5b8d37f
[Build/CI] Add tracing deps to vllm container image (#15224)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-09 19:14:06 +00:00

b2ce859bd2
Fix benchmark_throughput.py --backend=hf (#16352)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-09 19:09:28 +00:00

566f10a929
[CI]Fix hpu docker and numpy version for CI (#16355)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-04-09 17:52:26 +00:00

c3b5189137
[Bugfix] catch AssertionError in MistralTokenizer as ValueError (#16344)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-04-09 17:33:24 +00:00

a25866ac8d
[Bugfix] Fix profiling.py (#16202)
Signed-off-by: zh Wang <rekind133@outlook.com>
2025-04-09 17:03:34 +00:00

098900d7c2
Revert "Update label-tpu mergify and remove removal bot" (#16350)
2025-04-09 07:59:36 -07:00

98d01d3ce2
[Bugfix][Frontend] respect provided default guided decoding backend (#15476)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-04-09 05:11:10 -07:00

d55244df31
[Model] Add SupportsMultiModal.get_language_model interface (#16007)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-09 04:12:54 -07:00

04149cce27
[BugFix] fix some typos found by typos. (#16314)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-09 03:43:59 -07:00

24834f4894
update neuron config (#16289)
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com>
2025-04-09 03:43:22 -07:00

ec7da6fcf3
[BugFix] llama4 qknorm should be not shared across head (#16311)
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-04-09 00:59:14 -07:00

819d548e8a
[BugFix] logger is not callable (#16312)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-09 00:59:02 -07:00

477d2a8aa2
Update label-tpu mergify and remove removal bot (#16298)
2025-04-09 07:56:25 +00:00

e484e02857
[Bugfix] Avoid transferring cached multi-modal items from P0 to P1 (#16273)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-09 00:51:27 -07:00

24f6b9a713
[Misc] Fix test_sharded_state_loader.py(#16004) (#16005)
Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com>
2025-04-09 14:47:30 +08:00

9cdde47289
[BugFix] Fix fusion test and add them to CI (#16287)
Signed-off-by: luka <luka@neuralmagic.com>
2025-04-08 23:46:45 -07:00

b1eb4ca152
[TPU] Update PyTorch/XLA (#16288)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-09 14:46:32 +08:00

87b4ac56c2
[CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding (#16221)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-09 04:14:46 +00:00

cb84e45ac7
[Core] Upgrade to xgrammar 0.1.18, add cache size limit (#16283)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-08 19:13:22 -07:00

4716377fbc
[Feature] Estimate max-model-len use available KV cache memory (#16168)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-04-08 19:12:51 -07:00

4e9cf8c1dd
[Bugfix] fix gettid method is not define (#16084)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-04-08 19:12:44 -07:00

2976dc27e9
[Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs (#16198)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com>
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>
2025-04-08 19:12:34 -07:00

102bf967f0
[Model] Add smolvlm support (#16017)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-08 19:12:17 -07:00

1f4b09b525
Add support to modelopt quantization of Mixtral model (#15961)
Signed-off-by: Yue <yueshen@nvidia.com>
2025-04-09 01:53:31 +00:00

86c3369eb8
[CI/Build] Fix CI LoRA failure (#16270)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-09 09:13:56 +08:00

2755c34a8f
[V1] Update structured output offline inference example (#15721)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-08 22:34:09 +00:00

db10422184
[Bugfix] fix deepseek fp16 scale bug (#14809)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-04-08 16:56:09 -04:00

e1a2c699dd
[BugFix] Fix Llama4 - Index Error When Single Request Near Max Context (#16209)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-04-08 18:56:51 +00:00

0115ccd5c0
Add warning that content below line in template will be removed (#16276)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-08 18:18:40 +00:00

40b4284fe3
[Bugfix] Handle process_weights_after_loading for QKVCrossParallelLinear (#15328)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-08 10:02:23 -07:00

4ebc0b9640
[Bugfix] Proper input validation for multi-modal encoder-decoder models (#16156)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-08 09:45:21 -07:00

dc96fd54c6
[Misc] Avoid stripping meaningful whitespace from nvidia-smi topo -m output in collect_env.py (#16272)
Signed-off-by: imkero <kerorek@outlook.com>
2025-04-08 16:08:09 +00:00

1f5d13ab9f
[New Model]: jinaai/jina-embeddings-v3 (#16120)
2025-04-08 08:39:12 -07:00

90cb44eb02
Update to transformers==4.51.1 (#16257)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-08 06:53:39 -07:00

e11880deea
[Bugfix] Remove triton do_bench fast_flush arg (#16256)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-04-08 13:51:06 +00:00

9351f91be9
[BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm (#16247)
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
2025-04-08 05:10:26 -07:00

5a1e1c8353
[Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe (#16203)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-04-08 04:05:47 -07:00

69ecaa7c79
[Misc] Add warning for multimodal data in LLM.beam_search (#16241)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-04-08 04:05:27 -07:00

7f00899ff7
[Misc] format and refactor some examples (#16252)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-08 10:42:32 +00:00

995e3d1f41
[Docs] Add Slides from Singapore Meetup (#16213)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-04-08 07:20:22 +00:00

b4ac449a83
[Misc] Merge the logs of pp layers partitions (#16225)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-04-08 00:18:15 -07:00

8e5314a468
[V1] Add disable_chunked_mm_input arg to disable partial mm input prefill (#15837)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-07 23:24:07 -07:00

87918e40c4
[torch.compile][TPU] Make @support_torch_compile work for XLA backend (#15782)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-04-08 14:23:53 +08:00

f6b32efb7f
[Bugfix] Fix and reorganize broken GGUF tests and bump gguf version (#16194)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-08 13:38:13 +08:00

b99733d092
[Bugfix] Do not skip "empty" parts of chats that are parsable (#16219)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-08 05:14:15 +00:00

05a015d6a5
Add warning for Attention backends that do not support irope yet (#16212)
2025-04-08 03:59:26 +00:00

ad971af8c7
[Bugfix] fix use-ep bug to enable ep by dp/tp size > 1 (#16161)
2025-04-07 20:48:47 -07:00

f2ebb6f541
[V1] Scatter and gather placeholders in the model runner (#16076)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
2025-04-08 10:43:41 +08:00

1d01211264
Update BASE_IMAGE to 2.22 release of Neuron (#16218)
2025-04-07 19:11:18 -07:00

f94ab12f79
[Misc] Update compressed-tensors to version 0.9.3 (#16196)
Signed-off-by: Miles Williams <42222518+mlsw@users.noreply.github.com>
2025-04-07 19:09:06 -07:00

a865bc1ca6
[core] do not send error across process (#16174)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-04-07 19:09:03 -07:00

21802c4b6d
[ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping (#16031)
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-04-07 21:28:14 -04:00

652907b354
Torchao (#14231)
Signed-off-by: drisspg <drisspguessous@gmail.com>
2025-04-07 19:39:28 -04:00

24f1c01e0f
[Bugfix][V0] XGrammar structured output supports Enum (#15878)
Signed-off-by: Leon Seidel <leon.seidel@fau.de>
2025-04-07 22:38:25 +00:00

fad6e2538e
[Misc] add description attribute in CLI (#15921)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-07 22:30:35 +00:00

7f6d47c1a2
[V1][BugFix] Exit properly if engine core fails during startup (#16137)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-07 15:30:15 -07:00

3147586ebd
[Bugfix] Fix guidance backend for Qwen models (#16210)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-04-07 22:15:43 +00:00

ed636d99ca
[Misc] Move Llama 4 projector call into encoder execution (#16201)
2025-04-07 14:02:05 -07:00

090c856d76
[Misc] Human-readable max-model-len cli arg (#16181)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-04-07 14:40:58 -04:00

ad434d4cfe
Print the warning only once (#16193)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-04-07 18:30:06 +00:00

66d433b94f
[V1] Revert the default max_num_seqs to V0 values for most hardware (#16158)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-07 13:54:36 -04:00

027b204ff1
[Bugfix] Re-enable support for ChatGLMForConditionalGeneration (#16187)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-07 23:15:58 +08:00

55dcce91df
Upstream Llama4 Support to Main (#16113)
Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com>
Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
Signed-off-by: drisspg <drisspguessous@gmail.com>
Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Xiaodong Wang <xdwang@meta.com>
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Lu Fang <lufang@fb.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-07 08:06:27 -07:00

8017c8db7f
[Doc]Update image to latest version (#16186)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-04-07 14:17:39 +00:00

dc3529dbf6
[Misc] improve example mlpspeculator and llm_engine_example (#16175)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-07 11:53:52 +00:00

7699258ef0
[Model] Add Qwen3 and Qwen3MoE (#15289)
Signed-off-by: YamPengLi <yampayne.lyp@alibaba-inc.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-04-07 04:06:41 -07:00

e9ba99f296
[V1][Structured Output] Add supports_structured_output() method to Platform (#16148)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-07 11:06:24 +00:00

7c80368710
[VLM] Florence-2 supports online serving (#16164)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-07 04:04:02 -07:00

95d63f38c0
doc: fix some typos in doc (#16154)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-07 05:32:06 +00:00

bb8dab821e
[CI] Set max transformers version for Ultravox model test (#16149)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-04-07 04:37:58 +00:00

fc0f87768a
[Bugfix] Make dummy encoder prompt padding alternative and add missing warnings (#16129)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-07 04:07:15 +00:00

0a57386721
[Misc] Update Mistral-3.1 example (#16147)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-07 03:57:37 +00:00

3749e28774
[V1][Minor] Minor simplification for get_computed_blocks (#16139)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-06 20:38:12 -07:00

86fc2321ff
[Metrics] Add bucket for request_latency, time_to_first_token and time_per_output_token (#15202)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2025-04-06 20:34:51 -07:00

2549c0dfef
Fix requires-python (#16132)
2025-04-06 19:22:25 -07:00

b10e519895
[V1][Minor] Optimize get_cached_block (#16135)
2025-04-06 20:48:14 +00:00

9bde5ba127
[TPU] Update PyTorch/XLA (#16130)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-06 18:25:55 +00:00

72c8f1ad04
[Misc] update requires-python in pyproject.toml (#16116)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-06 14:56:34 +00:00

da224daaa9
[Bugfix] add hf_token to EngineArgs (#16093)
Signed-off-by: paolovic <paul-philipp.luley@uzh.ch>
Co-authored-by: paolovic <paul-philipp.luley@uzh.ch>
2025-04-06 14:47:33 +00:00

3a100b9278
[Bugfix] LoRA : Fix the order in which the kernels process LoRAs (#16040)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-04-06 14:04:50 +00:00

242a637aea
[Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 (#16103)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-04-06 05:52:01 -07:00

c2a9671510
[Misc] Improve model redirect to accept json dictionary (#16119)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-06 05:51:45 -07:00

d5ae4f7f42
[Doc][Bugfix] Add missing EOF in k8s deploy doc (#16025)
2025-04-06 12:10:57 +00:00

b6c502a150
[Misc] refactor example eagle (#16100)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-06 09:42:48 +00:00

9ca710e525
[CI][V1] Fix passing tokenizer as kwarg to validate_guidance_grammar (#16117)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-04-06 16:18:00 +08:00

eb07c8cb5b
[Frontend] Fix typo in tool chat templates for llama3.2 and toolace (#14501)
Signed-off-by: Ben Jackson <ben@ben.com>
2025-04-06 07:44:36 +00:00

ba10801961
[Benchmark] Add sampling parameters to benchmark_serving. (#16022)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
2025-04-06 12:30:35 +08:00

620fc2d09e
[Model] fix model testing for TeleChat2ForCausalLM and V0 llama4 (#16112)
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-04-05 21:23:40 -07:00

29283eaa7e
[Model] use AutoWeightsLoader for phi, gemma, deepseek (#16088)
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
2025-04-05 20:34:38 -07:00

2fa66ef713
[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine (#15946)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-04-05 20:04:22 -07:00

13affc432d
[Misc] Remove redundant code (#16098)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-05 20:03:50 -07:00

d8f094a92a
[Misc] format output for encoder_decoder.py (#16095)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-05 19:57:18 -07:00

97ae6d777f
Fix some capitalisations in generated examples doc titles (#16094)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-05 13:44:03 +00:00

6baeee70d1
Revert "doc: add info for macos clang errors (#16049)" (#16091)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-05 11:51:51 +00:00

d2517a4939
[doc] fix 404 (#16082)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-05 11:39:18 +00:00

6342adc438
fix: support clang17 for macos and fix the real libomp (#16086)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-05 11:00:12 +00:00

0adba91547
[CI] Fix benchmark script level (#16089)
2025-04-05 03:36:01 -07:00

4285e423a6
[Misc] Auto detect bitsandbytes pre-quantized models (#16027)
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>
2025-04-04 23:30:45 -07:00

63375f0cdb
[V1][Spec Decode] Update N-gram Proposer Interface (#15750)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-04 16:32:54 -07:00

70ad3f9e98
[Bugfix][TPU] Fix V1 TPU worker for sliding window (#16059)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-04-04 23:31:19 +00:00

d6fc629f4d
[Kernel][Minor] Re-fuse triton moe weight application (#16071)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-04-04 23:27:34 +00:00

af51d80fa1
Revert "[V1] Scatter and gather placeholders in the model runner" (#16075)
2025-04-04 14:50:57 -07:00

f5722a5052
[V1] Scatter and gather placeholders in the model runner (#15712)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-04-04 21:26:44 +00:00

651cf0fec1
[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue (#15906)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-04 12:56:43 -07:00

4dc52e1c53
[CI] Reorganize .buildkite directory (#16001)
Signed-off-by: kevin <kevin@anyscale.com>
2025-04-04 12:16:20 -07:00

4708f13a9c
[Bugfix] Fix default behavior/fallback for pp in v1 (#16057)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-04 17:58:08 +00:00

a6d042df0a
[ROCm][Bugfix] Bring back fallback to eager mode removed in #14917, but for ROCm only (#15413)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-04-04 09:40:37 -07:00

40a36ccfeb
[ROCm][Bugfix] Use platform specific FP8 dtype (#15717)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-04-04 09:40:20 -07:00

ef608c37a7
[Distributed] [ROCM] Fix custom allreduce enable checks (#16010)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-04-04 09:39:08 -07:00

2386803f2a
[CPU] Change default block_size for CPU backend (#16002)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-04-04 09:39:05 -07:00

95862f7b4d
[Benchmark][Doc] Update throughput benchmark and README (#15998)
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-04-04 09:39:02 -07:00

230b131b54
[Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-04 09:38:58 -07:00

0812d8dd41
[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe (#15945)
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
2025-04-04 09:38:55 -07:00

bf7e3c51ae
[Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt (#15939)
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
2025-04-04 09:38:52 -07:00

a35a8a8392
[V1][Spec Decode] Avoid logging useless nan metrics (#16023)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-04-04 08:52:41 -07:00

4ef0bb1fcf
doc: add info for macos clang errors (#16049)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-04 14:58:16 +00:00

fadc59c0e6
[TPU][V1] Remove ragged attention kernel parameter hard coding (#16041)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-04 07:48:50 -04:00

86cbd2eee9
[Misc] improve gguf check (#15974)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-04 01:33:36 +00:00

092475f738
[ROCm] Tweak the benchmark script to run on ROCm (#14252)
2025-04-03 17:12:48 -07:00

dcc56d62da
[Bugfix] Fix function names in test_block_fp8.py (#16033)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-04-03 23:01:34 +00:00

f15e70d906
[TPU] Switch Test to Non-Sliding Window (#15981)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-04-03 14:28:45 -07:00

b6be6f8d1e
[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-04-03 14:23:28 -07:00

03a70eacaf
Re-enable the AMD Testing for the passing tests. (#15586)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-04-03 11:05:17 -07:00

45b1ff7a25
[Misc][Performance] Advance tpu.txt to the most recent nightly torch … (#16024)
2025-04-03 17:32:54 +00:00

15ba07ef25
[Minor] Fused experts refactor (#15914)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-04-03 10:19:38 -07:00

d2b58ca203
[Neuron][kernel] Fuse kv cache into a single tensor (#15911)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
2025-04-03 09:51:32 -07:00

82e7e19a6e
[SupportsQuant] Chameleon, Chatglm, Commandr (#15952)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-04-03 08:25:22 -07:00

421c462948
[SupportsQuant] Bert, Blip, Blip2, Bloom (#15573)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-04-03 08:23:19 -07:00

84884cd9ac
fix: tiny fix make format.sh excutable (#16015)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-03 15:18:05 +00:00

a43aa183dc
[doc] update contribution link (#15922)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-03 10:47:31 +00:00

463bbb1835
[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process (#15367)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-04-03 07:32:10 +00:00

5e125e74d1
[misc] improve error message for "Failed to infer device type" (#15994)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-04-03 14:45:03 +08:00

06f21ce7a5
[Benchmark] Add AIMO Dataset to Benchmark (#15955)
Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com>
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
2025-04-03 06:09:18 +00:00

57a810db9c
[ROCM][V0] PA kennel selection when no sliding window provided (#15982)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-04-03 05:28:44 +00:00

8b664706aa
[bugfix] add seed in torchrun_example.py (#15980)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-04-03 12:25:01 +08:00

37bfee92bf
fix: better error message for get_config close #13889 (#15943)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-03 03:53:19 +00:00

e73ff24e31
[ROCM][KERNEL] Paged attention for V1 (#15720)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
2025-04-02 19:48:00 -07:00

bd7599d34a
[V1][TPU] Do not compile sampling more than needed (#15883)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-03 01:36:01 +00:00

01b6113659
[TPU] optimize the all-reduce performance (#15903)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-03 00:25:14 +00:00

1b84eff03a
[V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>
2025-04-02 17:18:08 -07:00

55acf86bf8
Fix huggingface-cli[hf-xet] -> huggingface-cli[hf_xet] (#15969)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-02 23:37:30 +00:00

f021b97993
[V1] Support Mistral3 in V1 (#15950)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-02 15:36:24 -07:00

1cab43c2d2
[misc] instruct pytorch to use nvml-based cuda check (#15951)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-04-03 01:02:58 +08:00

8bd651b318
Restricted cmake to be less than version 4 as 4.x breaks the build of… (#15859)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2025-04-02 16:19:39 +00:00

58e234a754
[Misc] V1 LoRA support CPU offload (#15843)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-02 23:04:43 +08:00

e86c414d6a
[Model] use AutoWeightsLoader in model load_weights (#15770)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-04-02 07:47:31 -07:00

550b2801ad
[CPU][Bugfix] Using custom allreduce for CPU backend (#15934)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-04-02 07:46:47 -07:00

cefb9e5a28
[Frontend] Implement Tool Calling with tool_choice='required' (#13483)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2025-04-02 07:45:45 -07:00

98d7367b61
[Metrics] Hide deprecated metrics (#15458)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-04-02 07:37:19 -07:00

594a8b9030
[Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. (#15938)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-02 06:33:52 -07:00

44f990515b
[CI] Remove duplicate entrypoints-test (#15940)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2025-04-02 02:44:01 -07:00

252937806c
[Bugfix][Benchmarks] Ensure async_request_deepspeed_mii uses the OpenAI choices key (#15926)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-04-02 02:19:35 -07:00

51826d51fa
Add minimum version for huggingface_hub to enable Xet downloads (#15873)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-02 02:03:36 -07:00

14e53ed11f
[V1] Fix json_object support with xgrammar (#15488)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-02 02:00:08 -07:00

ddb94c2605
[core] Add tags parameter to wake_up() (#15500)
Signed-off-by: Eric <erictang000@gmail.com>
2025-04-02 01:59:27 -07:00

90969fb39a
[Kernel] Add more dtype support for GGUF dequantization (#15879)
Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>
2025-04-02 01:58:48 -07:00

101f1481f9
[Build/CI] Update lm-eval to 0.4.8 (#15912)
Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
2025-04-02 01:47:57 -07:00

2edc87b161
[Bugfix] Fix cache block size calculation for CPU MLA (#15848)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-04-02 01:45:02 -07:00

4203926f10
[CI/Build] Further clean up LoRA tests (#15920)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-02 01:39:09 -07:00

cdb57015a7
[Misc] Replace print with logger (#15923)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-02 01:37:38 -07:00

aa557e6422
[Benchmark]Fix error message (#15866)
Signed-off-by: wangli <wangli858794774@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-04-02 01:32:24 -07:00

0e00d40e4f
[V1][Bugfix] Fix typo in MoE TPU checking (#15927)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-04-01 23:46:42 -07:00

c920e01242
[Doc] Update rocm.inc.md (#15917)
Signed-off-by: chun37 <chun.jb.37@gmail.com>
2025-04-01 23:38:26 -07:00

274d8e8818
[V1][Minor] Enhance SpecDecoding Metrics Log in V1 (#15902)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-01 23:38:02 -07:00

2039c6305b
[Bugfix] Fix imports for MoE on CPU (#15841)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-04-02 03:33:55 +00:00

6efb195a6e
[V1] Fix: make sure k_index is int64 for apply_top_k_only (#15907)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-04-01 19:06:44 -07:00

24b7fb455a
[Spec Decode] Fix input triton kernel for eagle (#15909)
2025-04-01 18:15:14 -07:00

58f5a59769
[Docs] Add Intel as Sponsor (#15913)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-04-01 17:16:55 -07:00

db9dfcfa6a
[Docs] Add Ollama meetup slides (#15905)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-04-01 13:58:59 -07:00

9ef98d527e
[Model][MiniMaxText01] Support MiniMaxText01 model inference (#13454)
Signed-off-by: qscqesze <475517977@qq.com>
Co-authored-by: qingjun <qingjun@minimaxi.com>
Co-authored-by: qscqesze <475517977@qq.com>
2025-04-01 16:23:55 -04:00

93491aefc7
[BugFix] make sure socket close (#15875)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-01 13:10:24 -07:00

7acd539cd7
[Docs] update usage stats language (#15898)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-04-01 12:54:13 -07:00

e75a6301bd
[V1][Spec Decode] Implement Eagle Proposer [1/N] (#15729)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-01 12:33:16 -07:00

a79cc68b3a
[V1][Metrics] Initial speculative decoding metrics (#15151)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-04-01 10:45:04 -07:00

7e3f7a4ee7
[CI] Disable flaky structure decoding test temporarily. (#15892)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-04-01 17:42:34 +00:00

9ec8257914
[Model] Add module name prefixes to gemma3 (#15889)
Signed-off-by: Bartholomew Sabat <bartek@recursal.ai>
Co-authored-by: Bartholomew Sabat <bartek@recursal.ai>
2025-04-01 10:13:40 -07:00

38327cf454
[Model] Aya Vision (#15441)
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-04-01 16:30:43 +00:00

dfa82e2a3d
[CI/Build] Clean up LoRA tests (#15867)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-01 16:28:50 +00:00

e59ca942f5
Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-04-01 12:07:43 -04:00

a57a3044aa
[ROCm][Build][Bugfix] Bring the base dockerfile in sync with the ROCm fork (#15820)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-04-01 08:56:39 -07:00

4e5a0f6ae2
[Misc] Allow using OpenCV as video IO fallback (#15055)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-01 15:55:13 +00:00

b63bd14999
Reinstate format.sh and make pre-commit installation simpler (#15890)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-01 15:41:30 +00:00

2041c0e360
[Doc] Quark quantization documentation (#15861)
Signed-off-by: chaow <chaow@amd.com>
2025-04-01 08:32:45 -07:00

085cbc4f9f
[New Model]: jinaai/jina-reranker-v2-base-multilingual (#15876)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-01 08:32:26 -07:00

2b93162fb0
Remove format.sh as it's been unsupported >70 days (#15884)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-01 22:27:46 +08:00

2e45bd29fe
[Misc] remove unused script (#15746)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-01 13:58:05 +00:00

51d7c6a2b2
[Model] Support Mistral3 in the HF Transformers format (#15505)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-04-01 06:10:05 -07:00

f3aca1ee30
setup correct nvcc version with CUDA_HOME (#15725)
Signed-off-by: Yang Chen <yangche@fb.com>
2025-04-01 06:09:40 -07:00

8dd41d6bcc
[Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE (#15831)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-01 06:07:53 -07:00

0a298ea418
[Bugfix] Fix no video/image profiling edge case for MultiModalDataParser (#15828)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-04-01 18:17:11 +08:00

d330558bab
[Docs] Fix small error in link text (#15868)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-01 10:05:14 +00:00

656fd72976
[Misc] Fix speculative config repr string (#15860)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-04-01 02:26:22 -07:00

79455cf421
[Misc] Enable V1 LoRA by default (#15320)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-04-01 16:53:56 +08:00

30d6a015e0
[Feature] specify model in config.yaml (#15798)
Signed-off-by: weizeng <weizeng@roblox.com>
2025-04-01 01:20:06 -07:00

8af5a5c4e5
fix: can not use uv run collect_env close #13888 (#15792)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-01 07:45:49 +00:00

3a5f0afcd2
[V1] Implement sliding window attention in kv_cache_manager (#14097)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-04-01 00:33:17 -07:00

c7e63aa4d8
[ROCm] Use device name in the warning (#15838)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-04-01 00:10:48 -07:00

4a9ce1784c
[sleep mode] clear pytorch cache after sleep (#15248)
Signed-off-by: <villard@us.ibm.com>
2025-03-31 22:58:58 -07:00

7e4e709b43
[V1] TPU - Fix fused MOE (#15834)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-31 22:58:07 -07:00

63d8eabed0
[Bugfix]: Fix is_embedding_layer condition in VocabParallelEmbedding (#15824)
Signed-off-by: alexwl <alexey.a.kiryushin@gmail.com>
2025-03-31 22:57:59 -07:00

e830b01383
[Bugfix] Fix extra comma (#15851)
Signed-off-by: haochengxia <xhc_1007@163.com>
2025-03-31 22:57:28 -07:00

ff6473980d
[Bugfix][Model] fix mllama multi-image (#14883)
Signed-off-by: yan ma <yan.ma@intel.com>
2025-03-31 22:53:37 -07:00

a164aea35d
[Frontend] Add Phi-4-mini function calling support (#14886)
Signed-off-by: Kinfey <kinfeylo@microsoft.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-03-31 22:50:05 -07:00

a76f547e11
Rename fallback model and refactor supported models section (#15829)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-31 22:49:41 -07:00

b7b7676d67
[Distributed] Add custom allreduce support for ROCM (#14125)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-03-31 22:49:12 -07:00

e6e3c55ef2
Move dockerfiles into their own directory (#14549)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-31 13:47:32 -07:00

f98a4920f9
[V1][Core] Remove unused speculative config from scheduler (#15818)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-31 19:15:21 +00:00

d4bfc23ef0
Fix Transformers backend compatibility check (#15290)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-31 10:27:07 -07:00

9a2160fa55
[V1] TPU CI - Add basic perf regression test (#15414)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-31 13:25:20 -04:00

2de4118243
fix: change GB to GiB in logging close #14979 (#15807)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-03-31 10:00:50 -07:00

239b7befdd
[V1][Spec Decode] Remove deprecated spec decode config params (#15466)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-31 09:19:35 -07:00

09e974d483
[Bugfix] Check dimensions of multimodal embeddings in V1 (#15816)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-31 09:01:35 -07:00

e5ef4fa99a
Upgrade transformers to v4.50.3 (#13905)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-31 08:59:37 -07:00

037bcd942c
[Bugfix] Fix missing return value in load_weights method of adapters.py (#15542)
Signed-off-by: noc-turne <2270929247@qq.com>
2025-03-31 06:56:42 -07:00

c2e7507ad4
[Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats (#15813)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-03-31 13:23:53 +00:00

3aa2b6a637
[Model] Update support for NemotronNAS models (#15008)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-03-31 20:35:14 +08:00

555aa21905
[V1] Fully Transparent Implementation of CPU Offloading (#15354)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-31 20:22:34 +08:00

e7ae3bf3d6
fix: better install requirement for install in setup.py (#15796)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-03-31 05:13:32 -07:00

b932c048ac
Recommend developing with Python 3.12 in developer guide (#15811)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-31 11:54:49 +00:00

e85829450d
[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)
Signed-off-by: charlifu <charlifu@amd.com>
2025-03-31 04:42:18 -07:00

effc5d24fa
[Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup (#15748)
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
2025-03-31 15:38:58 +08:00

18ed3132d2
[Misc] update the comments (#15780)
Signed-off-by: chengyang liu <lcy4869@gmail.com>
Co-authored-by: chengyang liu <lcy4869@gmail.com>
2025-03-30 19:39:56 -07:00

9b459eca88
[V1][Scheduler] Avoid calling _try_schedule_encoder_inputs for every request (#15778)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-30 14:10:42 -07:00

70fedd0f79
fix: Comments to English for better dev experience (#15768)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-03-30 10:47:57 -07:00

bb103b29bf
[Bugfix] Added embed_is_patch mask for fuyu model (#15731)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
2025-03-30 03:45:08 -07:00

248e76c4df
fix: lint fix a ruff checkout syntax error (#15767)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-03-30 03:36:02 -07:00

803d5c35f3
[V1] Override mm_counts for dummy data creation (#15703)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-30 03:20:42 -07:00

7fd8c0f85c
fix test_phi3v (#15321)
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
2025-03-30 02:01:34 -07:00

44c3a5abc3
[doc] update conda to usage link in installation (#15761)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-03-30 08:12:13 +00:00

6909a76201
[Bugfix] Fix Mistral guided generation using xgrammar (#15704)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-03-29 20:20:19 -07:00

045533716b
[CI] xgrammar structured output supports Enum. (#15757)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-29 20:20:02 -07:00

3c0ff914ac
[Bugfix] Fix Mllama interleaved images input support (#15564)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-03-29 18:11:15 +00:00

2bc4be4e32
[V1][Minor] Simplify rejection sampler's parse_output (#15741)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-29 09:25:17 -07:00

c67abd614f
[V1] Support interleaved modality items (#15605)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-29 06:30:09 -07:00

6fa7cd3dbc
[Feature][Disaggregated] Support XpYd disaggregated prefill with MooncakeStore (#12957)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-29 04:01:46 -07:00

94744ba41a
[V1] [Feature] Collective RPC (#15444)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-03-29 03:39:14 -07:00

4965ec42d2
[FEAT] [ROCm] Add AITER int8 scaled gemm kernel (#15433)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-03-29 03:33:56 -07:00

73aa7041bf
[doc] update doc (#15740)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-03-29 04:27:22 +00:00

7c1f760024
[Kernel][TPU][ragged-paged-attn] vLLM code change for PR#8896 (#15659)
Signed-off-by: Yarong Mu <ymu@google.com>
2025-03-28 21:13:15 -07:00

da461f3cbf
[TPU][V1][Bugfix] Fix w8a8 recompiilation with GSM8K (#15714)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-28 21:13:06 -07:00

5b800f0932
[Bugfix] set VLLM_WORKER_MULTIPROC_METHOD=spawn for vllm.entrypoionts.openai.api_server (#15700)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-03-28 21:12:26 -07:00

8427f70493
Use numba 0.61 for python 3.10+ to support numpy>=2 (#15692)
Signed-off-by: cyy <cyyever@outlook.com>
2025-03-29 12:11:51 +08:00

7a7992085b
[CI] Speed up V1 structured output tests (#15718)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-28 21:10:45 -07:00

1286211f57
[Bugfix] LoRA V1: add and fix entrypoints tests (#15715)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-28 21:10:41 -07:00

6d531ad7b8
[Misc][V1] Misc code streamlining (#15723)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-28 20:59:47 -07:00

762b424a52
[Docs] Document v0 engine support in reasoning outputs (#15739)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-29 03:46:57 +00:00

de1cb38769
[Model] Support Skywork-R1V (#15397)
Signed-off-by: jiacai.liu <932997367@qq.com>
Co-authored-by: jiacai.liu <932997367@qq.com>
2025-03-28 20:39:21 -07:00

c802f5430d
[ROCm][AMD][Build] Update AMD supported arch list (#15632)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-03-28 20:39:18 -07:00

cff8991a50
[Docs][V1] Optimize diagrams in prefix caching design (#15716)
2025-03-29 03:33:58 +00:00

f3f8d8fff4
implement prometheus fast-api-instrumentor for http service metrics (#15657)
2025-03-29 00:12:02 +00:00

26df46ee59
[Misc] cli auto show default value (#15582)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-03-28 22:23:00 +00:00

c3f687ac22
[V1] TPU - Fix the chunked prompt bug (#15713)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-28 20:19:04 +00:00

04437e313d
[Bugfix] [torch.compile] Add Dynamo metrics context during compilation (#15639)
Signed-off-by: luka <luka@neuralmagic.com>
2025-03-28 14:01:09 -06:00

038bededba
[TPU] [Perf] Improve Memory Usage Estimation (#15671)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-03-28 17:37:52 +00:00

d03308be0c
[Misc] Remove stale func in KVTransferConfig (#14746)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-28 17:33:32 +00:00

c6bc0034d0
[Misc] Remove unused utils and clean up imports (#15708)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-28 09:41:16 -07:00

70e132244a
[Minor] Remove TGI launching script (#15646)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-28 09:30:08 -07:00

47e9038d23
Fix cpu offload testing for gptq/awq/ct (#15648)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-29 00:29:32 +08:00

432cf22a6a
[Bugfix] Fix regex compile display format (#15368)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-03-28 08:58:44 -07:00

2914006fe0
[doc] add missing imports (#15699)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-03-28 15:56:48 +00:00

7329ff5468
[V1] Support disable_any_whtespace for guidance backend (#15584)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-28 23:46:45 +08:00

541d1df486
[Bugfix] embed_is_patch for Idefics3 (#15696)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-28 08:27:52 -07:00

3b00ff9138
[Bugfix][v1] xgrammar structured output supports Enum. (#15594)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-28 06:14:53 -07:00

91276c5721
[Model] Adding torch compile annotations to chatglm (#15624)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-28 21:14:09 +08:00

0b4167526d
[Docs] Add "Generation quality changed" section to troubleshooting (#15701)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-28 13:03:21 +00:00

fd5fd26902
[Frontend] update priority for --api-key and VLLM_API_KEY (#15588)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-03-28 19:40:12 +08:00

3bbaacbe15
[Bugfix][Frontend] Eliminate regex based check in reasoning full generator (#14821)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-28 11:20:35 +00:00

a10314c6b3
[Misc] Fix test_sleep to use query parameters (#14373)
Signed-off-by: Lize Cai <lize.cai@sap.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-28 18:00:14 +08:00

70f2c2a709
[Bugfix] Fix 'InductorAdaptor object has no attribute 'cache_dir' (#15674)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-28 17:10:40 +08:00

280d074103
[CPU][CI] Improve CPU Dockerfile (#15690)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-03-28 01:36:31 -07:00

32b14baf8a
[Refactor][Frontend] Keep all logic about reasoning into one class (#14428)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-28 00:23:30 -07:00

2d9045fce8
[TPU][CI] Fix TPUModelRunner Test (#15667)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-03-28 00:01:26 -07:00

355f66348c
[V1] Remove legacy input registry (#15673)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 23:34:34 -07:00

8693e47e6a
[Bugfix] Fix mm_hashes forgetting to be passed (#15668)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-28 05:51:05 +00:00

cec8c7d7f8
Refactor error handling for multiple exceptions in preprocessing (#15650)
Signed-off-by: JasonZhu1313 <jasonchu13@outlook.com>
2025-03-28 03:27:20 +00:00

4d0ec37267
[Quantization][FP8] Adding support for fp8 gemm layer input in fp8 (#14578)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-03-28 02:58:16 +00:00
						
e7f720ea56
[Misc]add coding benchmark for speculative decoding (#15303)
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
2025-03-28 10:47:05 +08:00
						
4ae17bf1e2
Revert "Use Cache Hinting for fused_moe kernel (#15511)" (#15645)
Signed-off-by: Wes Medford <wryanmedford@gmail.com>
2025-03-27 19:45:55 -07:00
						
8a49eea74b
[CI][TPU] Temporarily Disable Quant Test on TPU (#15649)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-27 19:45:05 -07:00
						
b4245a48df
[Doc] Fix dead links in Job Board (#15637)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-03-28 02:43:40 +00:00
						
4e0f6076be
[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. (#14948)
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-28 10:13:41 +08:00
						
726efc6a32
[Quantization][V1] BitsAndBytes support V1 (#15611)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-28 10:12:47 +08:00
						
bd45912b99
[TPU] Lazy Import (#15656)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-28 09:57:01 +08:00
						
15dac210f0
[V1] AsyncLLM data parallel (#13923)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-27 16:14:41 -07:00
						
112b3e5b3b
[CI] Update rules for applying tpu label. (#15634)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-27 22:15:26 +00:00
						
32d669275b
Correct PowerPC to modern IBM Power (#15635)
Signed-off-by: Christy Norman <christy@linux.vnet.ibm.com>
2025-03-27 15:04:32 -07:00
						
4098b72210
[Bugfix][TPU][V1] Fix recompilation (#15553)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-27 19:15:06 +00:00
						
46450b8d33
Use absolute placement for Ask AI button (#15628)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-27 18:52:18 +00:00
						
13ac9cab21
[Misc] Avoid direct access of global mm_registry in compute_encoder_budget (#15621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 17:52:00 +00:00
						
66aa4c0bf4
[Feature] Add middleware to log API Server responses (#15593)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-27 17:49:38 +00:00
						
247181536f
[Misc] Replace is_encoder_decoder_inputs with split_enc_dec_inputs (#15620)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 17:36:32 +00:00
						
07bf813fb5
[Doc] Link to onboarding tasks (#15629)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 16:30:53 +00:00
						
8958217ad5
[Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 (#15211)
Signed-off-by: h-sugi <h.sugi@ieee.org>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-27 22:29:29 +08:00
						
ac5bc615b0
[Model] MiniCPM-V/O supports V1 (#15487)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 06:07:29 -07:00
						
8063dfc61a
[Doc] update --system for transformers installation in docker doc (#15616)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-03-27 20:38:46 +08:00
						
6278bc829e
Fix incorrect filenames in vllm_compile_cache.py (#15494)
Signed-off-by: <zou3519@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-27 18:33:41 +08:00
						
3f532cb6a6
[Misc] Use model_redirect to redirect the model name to a local folder. (#14116)
2025-03-27 02:21:23 -07:00
						
e6c9053f9e
[Misc] Clean up scatter_patch_features (#15559)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-27 07:45:00 +00:00
						
43ed4143c4
[Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM (#15587)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
2025-03-27 06:47:25 +00:00
						
f4c98b4d4c
[Misc] Consolidate LRUCache implementations (#15481)
Signed-off-by: Bella kira <2374035698@qq.com>
2025-03-27 06:43:43 +00:00
						
e1e0fd7543
[TPU] Avoid Triton Import (#15589)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-27 06:43:02 +00:00
						
df8d3d1287
[Misc] Restrict ray version dependency and update PP feature warning in V1 (#15556)
2025-03-27 06:21:07 +00:00
						
619d3de8bd
[TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SEQS (#15583)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-03-26 22:46:26 -07:00
						
ecff8309a3
[ROCm] Env variable to trigger custom PA (#15557)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-03-26 22:46:12 -07:00
						
dcf2a590f5
Allow torchao quantization in SiglipMLP (#15575)
2025-03-26 22:45:51 -07:00
						
54aa619459
[V1] Refactor num_computed_tokens logic (#15307)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-27 04:54:36 +00:00
						
fb22be5817
[moe][quant] add weight name case for offset (#15515)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-03-27 04:50:29 +00:00
						
7f301dd8ef
[Doc] Update V1 user guide for fp8 kv cache support (#15585)
Signed-off-by: weizeng <weizeng@roblox.com>
2025-03-26 19:39:03 -07:00
						
8095341a01
[misc] LoRA: Remove unused long context test data (#15558)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-27 10:04:51 +08:00
						
69db16a46a
add platform check back (#15578)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
2025-03-27 01:50:27 +00:00
						
ce78f9af4e
Add automatic tpu label to mergify.yml (#15560)
2025-03-26 21:39:58 -04:00
						
9239bf718e
[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
2025-03-27 00:54:44 +00:00
						
7a6d45bc8a
Support FIPS enabled machines with MD5 hashing (#15299)
Signed-off-by: Matthew Vine <32849887+MattTheCuber@users.noreply.github.com>
2025-03-26 20:19:46 -04:00
						
e74ff409e0
[TPU] support disabling xla compilation cache (#15567)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-03-27 00:09:28 +00:00
						
7a888271f5
Use Cache Hinting for fused_moe kernel (#15511)
2025-03-26 23:21:34 +00:00
						
9d119a86ae
[V1] TPU CI - Fix test_compilation.py (#15570)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-26 21:51:54 +00:00
						
b2e85e26f4
[V1] TPU - Revert to exponential padding by default (#15565)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-26 21:35:05 +00:00
						
dd8a29da99
Applying some fixes for K8s agents in CI (#15493)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-03-26 20:35:11 +00:00
						
27df5199d9
Support SHA256 as hash function in prefix caching (#15297)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
2025-03-26 11:11:28 -07:00
						
35fad35a48
[V1][Sampler] Faster top-k only implementation (#15478)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-26 10:56:47 -07:00
						
733e7c9e95
[Refactor] Remove unnecessary backend parameter in structured output interface (#15317)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-26 17:51:56 +00:00
						
0af4d764d6
Fix weight loading for some models in Transformers backend (#15544)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-26 10:17:53 -07:00
						
e64afa455c
multi-node offline DP+EP example (#15484)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-26 23:54:24 +08:00
						
1711b929b6
[Model] Add Reasoning Parser for Granite Models (#14202)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
2025-03-26 14:28:07 +00:00
						
c091c0a588
Improve validation of TP in Transformers backend (#15540)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-26 07:26:48 -07:00
						
1aa162e030
Apply torchfix (#15532)
Signed-off-by: cyy <cyyever@outlook.com>
2025-03-26 12:09:06 +00:00
						
cf5c8f1686
Separate base model from TransformersModel (#15467)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-03-26 18:13:38 +08:00
						
4ec2cee000
[Misc] improve example script output (#15528)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-03-26 10:12:47 +00:00
						
99f536f830
[Misc] Enhance warning information to user-defined chat template (#15408)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-03-26 02:21:15 -07:00
						
5ebf66748b
[FEAT][ROCm] Integrate Fused MoE Kernels from AITER (#14967)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-03-26 16:30:30 +08:00
						
781d056280
[Feature] Enhance EAGLE Architecture with Proper RMS Norms (#14990)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-26 08:24:07 +00:00
						
5aefd6ac31
Fix raw_request extraction in load_aware_call decorator (#15382)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2025-03-25 22:29:54 -07:00
						
6c663dfd5e
[misc] LoRA - Skip LoRA kernels when not required (#15152)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-26 11:33:45 +08:00
						
33437bc6e7
[BugFix] Fix nightly MLA failure (FA2 + MLA chunked prefill, i.e. V1, producing bad results) (#15492)
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>
2025-03-25 20:33:22 -07:00
						
23114d3364
[Misc] Warn about v0 in benchmark_paged_attn.py (#15495)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-25 20:31:04 -07:00
						
997c8811d6
[Model] Support multi-image for Molmo (#15438)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-26 11:26:33 +08:00
						
e42389f9d7
Transformers backend already supports V1 (#15463)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-25 20:26:16 -07:00
						
244d5cc749
update
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-26 01:46:13 +00:00
						
816693fd00
update
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-26 01:45:55 +00:00
						
7c16128106
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-26 01:45:24 +00:00
						
7bb88b2edc
updated
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-26 01:44:11 +00:00
						
ae4f3e2aeb
update
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-26 01:41:36 +00:00
						
						ff38f0a32c 
					 
					
						
						
							
							[CI/Build] LoRA: Delete long context tests ( #15503 )  
						
						 
						
						... 
						
						
						
						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com > 
						
						
					 
					
						2025-03-25 17:18:34 -07:00  
					
a5cfbab3c8
[Core] LoRA: V1 Scheduler optimization (#15422)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-25 22:50:09 +00:00
					
ac3cd6e83c
[core] add bucket padding to tpu_model_runner (#14995)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-25 17:27:22 -04:00
					
082ab86f5f
[V1] Support long_prefill_token_threshold in v1 scheduler (#15419)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-25 14:22:26 -07:00
					
6aa196c8dc
[V1][Minor] Use SchedulerInterface type for engine scheduler field (#15499)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-25 14:21:36 -07:00
					
a0dd7dcd49
[TPU][V1] Fix Sampler recompilation (#15309)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-25 16:43:54 -04:00
					
e977c11111
Add workaround for shared field_names in pydantic model class (#13925)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-03-25 20:31:08 +00:00
					
5f063a80bd
[bugfix] add supports_v1 platform interface (#15417)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-03-25 15:00:32 -04:00
					
5d8e1c9279
[Bugfix] Support triton==3.3.0+git95326d9f for RTX 5090 (Unsloth + vLLM compatibility) (#15471)
Co-authored-by: ServerAI <ai@exc-mad-ai.com>
2025-03-25 17:59:25 +00:00
					
0a049c7d86
[CI/Build] Add tests for the V1 tpu_model_runner. (#14843)
Signed-off-by: Yarong Mu <ymu@google.com>
2025-03-25 12:27:16 -04:00
					
d0cfec7ab9
[bugfix] fix inductor cache on max_position_embeddings (#15436)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-25 07:05:39 -07:00
					
a608160027
[Kernel] Fix conflicting macro names for gguf kernels (#15456)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-03-25 13:50:49 +00:00
					
3f04a7fbf2
[Doc] Update V1 user guide for multi-modality (#15460)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-25 11:01:58 +00:00
					
5994430b84
[Misc] Remove redundant num_embeds (#15443)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-25 18:27:57 +08:00
					
a9e879b316
[Misc] Clean up MiniCPM-V/O code (#15337)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-25 10:22:52 +00:00
					
3e2f37a69a
Dockerfile.ppc64le changes to move to UBI (#15402)
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
2025-03-25 10:15:14 +00:00
					
4f044b1d67
[Kernel][CPU] CPU MLA (#14744)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-25 09:34:59 +00:00
					
4157f563b4
[Hardware][TPU][Bugfix] Fix v1 mp profiler (#15409)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-03-25 01:43:00 -07:00
					
051da7efe3
Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
2025-03-25 15:36:45 +08:00
					
25f560a62c
[V1][Spec Decode] Update target_logits in place for rejection sampling (#15427)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-24 21:04:41 -07:00
					
a09ad90a72
[V1] guidance backend for structured output + auto fallback mode (#14779)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
2025-03-24 21:02:33 -07:00
					
10b34e36b9
[Bugfix] Fixed the issue of not being able to input video and image simultaneously (#15387)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-25 03:48:08 +00:00
					
b5269db959
Revert "Fix non-contiguous input passed to Marlin kernel (#15319)" (#15398)
2025-03-24 20:43:51 -07:00
					
6db94571d7
[Misc] Remove LoRA log (#15388)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-24 20:43:48 -07:00
					
97cfa65df7
Add pipeline parallel support to TransformersModel (#12832)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-25 10:41:45 +08:00
					
911c8eb000
[Minor][Spec Decode] Remove compiled_softmax (#15416)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-24 19:09:04 -07:00
					
ebcebeeb6b
[V1][Spec Decode] Enable spec decode for top-p & top-k sampling (#15063)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-24 17:16:46 -07:00
					
f533b5837f
[ROCm][Kernel] MoE weights padding (#14454)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: charlifu <charlifu@amd.com>
2025-03-24 23:45:30 +00:00
					
8279201ce6
[Build] Cython compilation support fix (#14296)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-03-24 23:37:54 +00:00
					
23fdab00a8
[Hardware][TPU] Skip failed compilation test (#15421)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-03-24 23:28:57 +00:00
					
623e2ed29f
[BugFix][V1] Quick fix for min_tokens with multiple EOS (#15407)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-24 15:58:59 -07:00
					
9d72daf4ce
[V1][Perf] Simpler request output queues (#15156)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-24 22:44:08 +00:00
					
6dd55af6c9
[Doc] Update docs on handling OOM (#15357)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-24 14:29:34 -07:00
					
3eb08ed9b1
[DOC] Add Kubernetes deployment guide with CPUs (#14865)
2025-03-24 10:48:43 -07:00
					
5eeadc2642
[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral (#12303)
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
2025-03-24 09:48:40 -07:00
					
3aee6573dc
[V1] Aggregate chunked prompt logprobs in model runner (#14875)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-24 12:27:57 -04:00
					
9cc645141d
[MISC] Refine no available block debug msg (#15076)
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
2025-03-25 00:01:10 +08:00
					
0893567db9
[V1][Minor] fix comments (#15392)
Signed-off-by: chenjincong <chenjincong@baidu.com>
Signed-off-by: Chen-0210 <chenjincong11@gmail.com>
Co-authored-by: chenjincong <chenjincong@baidu.com>
2025-03-24 08:45:32 -07:00
					
8abe69b499
[Core] Don't force uppercase for VLLM_LOGGING_LEVEL (#15306)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-24 08:27:30 -07:00
					
761702fd19
[Core] Integrate fastsafetensors loader for loading model weights (#10647)
Signed-off-by: Manish Sethi <Manish.sethi1@ibm.com>
2025-03-24 08:08:02 -07:00
					
9606d572ed
[distributed] fix dp group (#15355)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-24 14:54:27 +00:00
					
cbcdf2c609
[Bugfix] Fix chat template loading (#15143)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-24 13:50:09 +00:00
					
038de04d7b
Fix zmq IPv6 URL format error (#15341)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-24 09:30:41 -04:00
					
6b3cc75be0
[Kernel] allow non-contiguous input for marlin kernel (#14658)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-03-24 09:21:33 -04:00
					
7ffcccfa5c
Revert "[CI/Build] Use uv python for docker rather than ppa:deadsnakess/ppa (#13569)" (#15377)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-24 05:53:10 -07:00
					
cc8accfd53
[Misc] Update guided decoding logs to debug (#15310)
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com>
2025-03-24 04:25:20 -07:00
					
948ab03e7e
[Bugfix][V1] Avoid importing PreTrainedModel (#15366)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-03-24 10:33:12 +00:00
					
5797fb97e9
[Misc] Remove ignore_reinit_error for ray.init() (#15373)
2025-03-24 07:41:53 +00:00
					
3892e58ad7
[Misc] Upgrade BNB version (#15183)
2025-03-24 05:51:42 +00:00
					
d20e261199
Fix non-contiguous input passed to Marlin kernel (#15319)
2025-03-24 03:09:44 +00:00
					
f622dbcf39
[Fix] [torch.compile] Improve UUID system for custom passes (#15249)
Signed-off-by: luka <luka@neuralmagic.com>
2025-03-24 01:54:07 +00:00
					
dccf535f8e
[V1] Enable V1 Fp8 cache for FA3 in the oracle (#15191)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-23 15:07:04 -07:00
					
9c5c81b0da
[Misc][Doc] Add note regarding loading generation_config by default (#15281)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-23 14:00:55 -07:00
					
d6cd59f122
[Frontend] Support tool calling and reasoning parser (#14511)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-23 14:00:07 -07:00
					
bc8ed3c4ba
[V1][Spec Decode] Use better defaults for N-gram (#15358)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-23 10:52:30 -07:00
					
b9bd76ca14
[V1][Spec Decode] Respect prompt_lookup_max (#15348)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-23 10:41:44 -07:00
					
6ebaf9ac71
[Bugfix] consider related env vars for torch.compiled cache hash (#14953)
Signed-off-by: DefTruth <31974251+DefTruth@users.noreply.github.com>
2025-03-23 15:53:09 +00:00
					
f90d34b498
[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (#15322)
Signed-off-by: DefTruth <qiustudent_r@163.com>
2025-03-23 01:10:10 -07:00
					
f68cce8e64
[ci/build] fix broken tests in LLM.collective_rpc (#15350)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-23 14:49:48 +08:00
					
09b6a95551
[ci/build] update torch nightly version for GH200 (#15135)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-23 14:04:13 +08:00
					
50c9636d87
[V1][Usage] Refactor speculative decoding configuration and tests (#14434)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-03-22 19:28:10 -10:00
					
0661cfef7a
Fix v1 supported oracle for worker-cls and worker-extension-cls (#15324)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-23 10:23:35 +08:00
					
a827aa815d
[doc] Add back previous news (#15331)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-03-22 17:38:33 -07:00
					
b877031d80
Remove openvino support in favor of external plugin (#15339)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-22 14:06:39 -07:00
					
dd861b992f
[BugFix][Typing] Fix Imprecise Type Annotations (#15208)
Signed-off-by: Wang Ran (汪然) <wrran@outlook.com>
2025-03-22 09:05:03 -07:00
					
eb63ea1e18
[V1] Add disable-any-whitespace option support for xgrammar (#15316)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-22 15:56:17 +00:00
					
2f4bd358f1
[Model] Support Tele-FLM Model (#15023)
Signed-off-by: Naitong Yu <ntyu@baai.ac.cn>
Signed-off-by: jiangxin <horizon94@outlook.com>
Co-authored-by: Jason Fang <jasonfang3900@gmail.com>
Co-authored-by: jiangxin <horizon94@outlook.com>
2025-03-22 02:04:44 -07:00

8a8b30eac1
[Bugfix] LoRA V0 - Fix case where max_num_seqs is between cudagraph capture sizes (#15308)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-22 02:03:32 -07:00

2fa0e1396b
[Bugfix] Fix torch.compile raise FileNotFoundError (#15278)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-22 13:49:34 +08:00

1c2bec0f82
[Doc] add load_format items in docs (#14804)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-03-21 22:36:43 -07:00

ec870fba9a
[FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature (#14959)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-03-21 22:36:14 -07:00

df1430265c
[Bugfix][V0] Multi-sequence logprobs streaming edge case (#15259)
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-03-21 22:35:37 -07:00

4c69e228b3
[Misc] Increase RayDistributedExecutor RAY_CGRAPH_get_timeout (#15301)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-03-21 22:25:43 -07:00

790b79750b
[Build/CI] Fix env var typo (#15305)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-21 22:28:46 +00:00

cfbb8c930f
[TPU][V1] MHA Pallas backend (#15288)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-21 08:50:39 -07:00

baec0d4de9
Revert "[Feature] specify model in config.yaml (#14855)" (#15293)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-21 08:30:23 -07:00

c21b99b912
[Bugfix][VLM] fix llava processor (#15285)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-03-21 05:14:36 -07:00

93a00d7dde
[v1] Refactor KVCacheConfig (#14079)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-03-21 04:56:27 -07:00

61e8c18350
[Misc] Add cProfile helpers (#15074)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-21 04:56:09 -07:00

8afcd0f633
[Bugfix] Fix broken kernel test due to missing rename for v1 Triton backend (#15282)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-21 11:42:06 +00:00

91ca929dc7
[V1] Fix wrong import path of get_flash_attn_version (#15280)
Signed-off-by: Lehua Ding <lehuading@tencent.com>
2025-03-21 03:54:11 -07:00

84e00adc8a
[Bugfix] Fix incorrect resolving order for transformers fallback (#15279)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-21 03:54:08 -07:00

47c7126213
[Misc] Add attention mask pre-computation optimization back to Qwen2.5-VL (#15273)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-21 10:32:33 +00:00

a989ca2bf6
[Bugfix] Add int8 torch dtype for KVCache (#15260)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-03-21 08:58:28 +00:00

0fa3970deb
[Feature] specify model in config.yaml (#14855)
Signed-off-by: weizeng <weizeng@roblox.com>
2025-03-21 00:26:03 -07:00

da6ea29f7a
[V1] Avoid redundant input processing in n>1 case (#14985)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-20 22:24:10 -07:00

7297941b38
[Doc] Update LWS docs (#15163)
Signed-off-by: Edwinhr716 <Edandres249@gmail.com>
2025-03-20 21:18:47 -07:00

f8a08cb90d
[V1] Enable Triton(ROCm) Attention backend for Nvidia GPUs (#14071)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-21 03:14:19 +00:00

b15fd2be2a
[Hardware][TPU] Add check for no additional graph compilation during runtime (#14710)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-03-21 03:05:28 +00:00

e588ac237c
Add an example for reproducibility (#15262)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-20 19:55:47 -07:00

5df2da5b97
[Misc] Better RayExecutor and multiprocessing compatibility (#14705)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-20 19:27:46 -07:00

11b986b3fb
[Docs] Trim the latest news in README (#15261)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-20 19:24:21 -07:00

296f927f24
[Model] RE: Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies (#14857)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-03-20 19:21:08 -07:00

0032903a5b
[Bugfix] detect alibi and revert to FA2 (#15231)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-03-20 19:20:16 -07:00

47195057e9
[V1][TPU] Speed up top-k on TPU by using torch.topk (#15242)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
2025-03-20 19:19:40 -07:00

6edbfa924d
Mention extra_body as a way top pass vLLM only parameters using the OpenAI client (#15240)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-20 19:18:36 -07:00

1e508343e1
[Bugfix] Fix incorrect qwen2.5-vl attention mask pre-computation (#15200)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-20 19:18:04 -07:00

2e0b4cfde0
[ROCM] Upgrade torch to 2.6 (#15244)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-03-20 19:17:33 -07:00

10f55fe6c5
[Misc] Clean up the BitsAndBytes arguments (#15140)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-20 19:17:12 -07:00

d3ccbd6350
Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
2025-03-21 10:01:11 +08:00

0cfe7d386d
[CI/Build] LoRA : make add_lora_test safer (#15181)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-21 09:28:53 +08:00

0c6f5023c3
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-20 17:50:43 -07:00

06dd08256f
Enforce that TP > 1 is not supported for Mamba2 if Quantization is Enabled. (#14617)
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
2025-03-21 00:44:37 +00:00

2b22290ce0
[V1] Add flag to disable cascade attention (#15243)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-20 15:24:16 -07:00

d8e82bc06d
[Bugfix] fix V1 Engine crash while handling requests with duplicate request id (#15043)
Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>
2025-03-20 10:01:02 -07:00

086b56824c
[ci] feat: make the test_torchrun_example run with tp=2, external_dp=2 (#15172)
Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-21 00:30:04 +08:00

5a0905ba2a
Replace misc issues with link to forum (#15226)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-20 23:18:20 +08:00

a8f12a63fd
Fix env vars for running Ray distributed backend on GKE (#15166)
Signed-off-by: Richard Liu <ricliu@google.com>
2025-03-20 14:59:33 +00:00

69ae2380c6
Add user forum to README (#15220)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-20 22:39:51 +08:00

27261e40a6
[Bugfix] Multi-video inference on LLaVA-Onevision (#15082)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-20 14:10:45 +00:00

e3f813c33b
[macOS] Ugrade pytorch to 2.6.0 (#15129)
2025-03-20 01:22:40 -07:00

c607a2652b
Fixing Imprecise Type Annotations (#15192)
2025-03-20 01:19:55 -07:00

3d45e3d749
[release] Tag vllm-cpu with latest upon new version released (#15193)
2025-03-20 01:19:10 -07:00

742369d35a
[Frontend][Bugfix] support prefill decode disaggregation on deepseek (#14824)
Signed-off-by: billishyahao <bill.he@amd.com>
Co-authored-by: Zhai Feiyue <80079571+ZhaiFeiyue@users.noreply.github.com>
2025-03-20 00:00:33 -07:00

bfe2fe0af4
typo: Update config.py (#15189)
2025-03-19 23:31:21 -07:00

a8652f4f0f
Enable CUDA graph support for llama 3.2 vision (#14917)
Signed-off-by: Matt Ritter <100659061+mritterfigma@users.noreply.github.com>
2025-03-19 23:29:16 -07:00

2f726b241e
[Doc] Update README.md (#15187)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-20 13:25:58 +08:00

a597a57595
[Attention] Flash Attention 3 - fp8 (#14570)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
2025-03-20 01:14:20 -04:00

ae65f3e237
[Misc]fixed disable these http request logs (#14754)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-19 21:53:40 -07:00

34868b106a
[Doc] Update Mistral Small 3.1/Pixtral example (#15184)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-20 04:46:06 +00:00

1f16b7fe74
[Core][V0] Add guidance backend for structured output (#14589)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <lohuynh@microsoft.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-19 21:33:51 -07:00

b88be22165
[Benchmark] Allow oversample request in benchmark dataset (#15170)
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
2025-03-20 12:32:58 +08:00

d8c6d7d6b5
[V1][TPU] Support V1 Sampler for ragged attention (#14227)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-19 21:00:39 -07:00

40828ce5fe
fix "Total generated tokens:" is 0 if using --backend tgi and --endpo… (#14673)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-19 20:56:16 -07:00

ffa443afed
[Bugfix] Fix embedding assignment for InternVL-based models (#15086)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-20 03:40:13 +00:00

70e500cad9
Fix broken tests (#14713)
Signed-off-by: JovanSardinha <jovan.sardinha@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-03-20 02:06:49 +00:00

4cb1c05c9e
[Doc] Clarify run vllm only on one node in distributed inference (#15148)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-03-20 09:55:59 +08:00

c47aafa37c
[BugFix] Lazily import XgrammarBackend to avoid early cuda init (#15171)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-20 01:30:43 +00:00

cfbca8a2f2
[V1] TPU - Tensor parallel MP support (#15059)
2025-03-20 00:55:18 +00:00

0fe5609874
[Docs] Annouce Ollama and Singapore Meetups (#15161)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-19 16:18:04 -07:00

22d33baca2
[FrontEnd][Perf] merge_async_iterators fast-path for single-prompt requests (#15150)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-19 21:04:41 +00:00

b0e96aaebb
[V1][TPU] Change kv cache shape. (#15145)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-03-19 12:16:42 -07:00

8310e0b59b
simple bugfix: Update stats.py (#15139)
2025-03-19 18:26:27 +00:00

26dd972adb
[FEAT]Support reset prefix cache by specified device (#15003)
2025-03-19 10:54:41 -07:00

61c7a1b856
[V1] Minor V1 async engine test refactor (#15075)
Signed-off-by: andoorve <murali.andoorveedu@mail.utoronto.ca>
Co-authored-by: andoorve <murali.andoorveedu@mail.utoronto.ca>
2025-03-19 10:37:17 -07:00
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
374ee287d8
[Frontend] Remove custom_cache_manager (#13791)
Signed-off-by: fulvius31 <asangior@redhat.com>
2025-03-20 00:13:50 +08:00

a4d83661d7
[Misc] Update the "the first vLLM China Meetup" slides link to point to the first page (#15134)
Signed-off-by: imkero <kerorek@outlook.com>
2025-03-19 15:07:39 +00:00

8363cd093d
[Bugfix] Adjust mllama to regional compilation (#15112)
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai>
2025-03-19 07:57:25 -07:00

6c5a3195db
[Misc][Benchmark] Add support for different tokenizer_mode (#15040)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-03-19 14:56:50 +00:00

073d1ed354
[Doc] Update tip info on using latest transformers when creating a custom Dockerfile (#15070)
2025-03-19 13:33:40 +00:00

3d446433ec
[Bugfix] Fix size calculation of processing cache (#15114)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-19 05:53:19 -07:00

1fe0fd12d3
[Misc] Avoid unnecessary HF do_rescale warning when passing dummy data (#15107)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-19 03:42:31 -07:00

dafb4e504a
[V1][Bugfix] Fix oracle for device checking (#15104)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-19 18:35:32 +08:00

68cf1601d3
[CI][Intel GPU] update XPU dockerfile and CI script (#15109)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-03-19 01:29:25 -07:00

61f412187d
[Bugfix] Re-enable Gemma3 for V1 (#14980)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-18 23:58:22 -07:00

05ccd0aa35
[V1] Ensure using int64 for sampled token ids (#15065)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-18 23:52:19 -07:00

f690372b68
[Core] Update dtype detection and defaults (#14858)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-19 13:49:33 +08:00

8b3e94a357
[Model] Remove duplicated message check in Mistral chat completion request (#15069)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-03-19 05:09:32 +00:00

437f9162d0
[Model] Pixtral: Remove layer instantiation duplication (#15053)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
2025-03-19 10:34:03 +08:00

4f065f12f5
[Misc][V1] Skip device checking if not available (#15061)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-18 19:33:43 -07:00

228b768db6
[Doc] Minor v1_user_guide update (#15064)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-03-18 16:10:45 -07:00

027827cc1d
fix long dtype in topk sampling (#15049)
2025-03-18 15:57:31 -07:00

72a8639b68
[V1] TPU - CI/CD use smaller model (#15054)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-18 21:39:21 +00:00

99abb8b650
[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-18 14:31:54 -07:00

3a1e648158
[V1] Refactor Structured Output for multiple backends (#14694)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-18 19:49:15 +00:00

46c759c165
[Bugfix] Fix LoRA extra vocab size (#15047)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-18 09:40:29 -07:00

179a619c21
[Bugfix] Fix broken CPU quantization due to triton import (#15038)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-18 08:57:39 -07:00

452e8fd968
[MODEL] Add support for Zamba2 models (#13185)
Signed-off-by: Yury Tokpanov <yury@zyphra.com>
Signed-off-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-18 08:56:21 -07:00

8b793f7ec6
MI325 configs, fused_moe_kernel bugfix (#14987)
Signed-off-by: Eugene Kuznetsov <eugene.kuznetsov@amd.com>
2025-03-18 08:05:18 -07:00

af35d3a3cc
[TPU][V1][Bugfix] Fix chunked prefill with padding (#15037)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-18 07:34:45 -07:00

3b457143d2
[Bugfix] Register serializers for V0 MQ Engine (#15009)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-18 09:14:47 -04:00

ab656f2c2f
[Bugfix] Loosen type check to avoid errors in V1 (#15021)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-18 12:54:40 +00:00

64fc2193dc
[Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros (#14347)
2025-03-18 05:50:19 -07:00

dd732028f5
[Bugfix][Frontend] Fix validation of logprobs in ChatCompletionRequest (#14352)
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com>
2025-03-18 05:50:05 -07:00

414919138b
[Bugfix] torchrun compatibility (#14899)
Signed-off-by: hiyouga <hiyouga@buaa.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-18 05:49:27 -07:00

db7c8ca910
[Misc] Embedding model support LoRA (#14935)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-18 12:07:00 +00:00

f863ffc965
[Mistral-Small 3.1] Update docs and tests (#14977)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-18 03:29:42 -07:00

400d483e87
[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-18 09:47:53 +00:00

d1695758b2
[Doc][V1] Fix V1 APC doc (#14920)
2025-03-18 08:15:46 +00:00

53a0cf8b95
[Neuron] trim attention kernel tests to fit trn1.2x instance (#14988)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
2025-03-18 15:05:52 +08:00

5eeabc2a44
[Bugfix] Fix bnb quantization for models with both HF-format and Mistral-format weights (#14950)
2025-03-17 23:27:26 +00:00

18551e820c
[V1] TPU - Fix CI/CD runner (#14974)
2025-03-17 21:07:07 +00:00

e41e160263
[V1] Guard Against Main Thread Usage (#14972)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-03-17 13:23:02 -07:00

b89fb2a4a1
[CI/Build] Use AutoModelForImageTextToText to load VLMs in tests (#14945)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-17 18:35:17 +00:00

5340b0e221
[Bugfix] Fix interface for Olmo2 on V1 (#14976)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-17 11:26:38 -07:00

37e3806132
[Bugfix] Make Gemma3 MM V0 only for now (#14971)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-17 10:04:21 -07:00

c0efdd655b
[Fix][Structured Output] using vocab_size to construct matcher (#14868)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-03-17 11:42:45 -04:00

aaaec52ad9
[Bugfix][Model] Mixtral: use unused head_dim config argument (#14961)
Signed-off-by: Quentin Torroba <quentin.torroba@mistral.ai>
2025-03-17 07:44:18 -07:00

e1eb45d397
[Bugfix] Fix precommit - line too long in pixtral.py (#14960)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-17 07:18:50 -07:00

89fca671fb
[V1] Default MLA to V1 (#14921)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-17 06:54:40 -07:00

d20b0c139c
Add patch merger (#14957)
2025-03-17 06:47:50 -07:00

166a168b0f
[Doc] Fix misleading log during multi-modal profiling (#14955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-17 06:14:32 -07:00

2bb0e1a799
[Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-17 11:33:35 +00:00

6eaf1e5c52
[Misc] Add --seed option to offline multi-modal examples (#14934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-17 03:00:17 -07:00

868a8c5b2c
[Bugfix] Fix Ultravox on V1 (#14929)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-17 17:15:20 +08:00

b4ad56c1bd
[V1][TPU] Apply the ragged paged attention kernel fix and remove the padding. (#14846)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-03-17 01:48:28 -07:00

69698f257e
fix minor miscalled method (#14327)
2025-03-17 01:47:58 -07:00

cd0cd85102
[MISC] More AMD unused var clean up (#14926)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-17 16:40:41 +08:00

0a74bfce9c
setup.py: drop assumption about local main branch (#14692)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-17 01:37:42 -07:00

dd3b865854
[Doc] Add vLLM Beijing meetup slide (#14938)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-03-17 16:29:36 +08:00

9b87a579aa
[Misc][XPU] Use None as device capacity for XPU (#14932)
Signed-off-by: yan ma <yan.ma@intel.com>
2025-03-17 01:22:14 -07:00

b539222d4e
[V1] Remove input cache client (#14864)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-16 23:42:06 -07:00

8d6cf89526
[V1] [Spec Decode] Support random sampling for spec decode (#13933)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-16 22:00:20 -07:00

583a9778e0
[Benchmark] Do not save detailed info to json by default (#14879)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-16 21:48:11 -07:00

a73e183e36
[Misc] Replace os environ to monkeypatch in test suite (#14516)
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-16 20:35:57 -07:00

1e799b7ec1
[BugFix] Fix MLA + V1 + TP==1 causing reinitialization of cuda context (#14910)
2025-03-17 03:35:37 +00:00

7f6c5ee06c
[V1][Minor] Add __repr__ to ConstantList (#14907)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-16 20:20:15 -07:00

faa0275730
[V1] Optimize the overhead of rewinding (#14905)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-16 20:19:30 -07:00

8a5a9b70d7
[CI/Build] Update defaults for test reproducibility (#14893)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-17 10:38:15 +08:00

bb3aeddfaf
[CI] Nightly Tests (#14898)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-03-17 02:06:43 +00:00

aecc780dba
[V1] Enable Entrypoints Tests (#14903)
2025-03-16 17:56:16 -07:00

90df7f23aa
[Doc] Add guidance for using ccache with pip install -e . in doc (#14901)
2025-03-16 23:10:04 +00:00

b9b5bdfc7d
[Misc] Catching Ray Compiled Graph PP test failures for V1 (#14847)
2025-03-16 15:46:42 -07:00

31060b2757
[V1][BugFix] Detect interleaved sliding window attention (#14896)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-16 14:53:53 -07:00
						
fc1f67715d
[BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-16 14:53:34 -07:00
						
f6137adbcb
Revert "[Bugfix] Limit profiling run sequence length by max_model_len (#14785) (#14892)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-16 09:13:46 -07:00
						
e53b1350f2
[Bugfix] Explicitly disable Phi-4-multimodal in V1 (#14889)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-16 09:05:40 -07:00
						
d30aa7e9e6
[Bugfix] Limit profiling run sequence length by max_model_len (#14785)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-03-16 07:44:19 -07:00
						
d1ad2a57af
[V1] [Spec Decode] Fix ngram tests (#14878)
2025-03-16 00:29:22 -07:00
						
b82662d952
[BugFix] Fix torch distributed stateless PG backend init (#14870)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-15 20:26:19 -07:00
						
71c1e07107
[Kernel] Add more tuned configs (#14877)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-15 20:25:03 -07:00
						
b30c75dda4
[V1] Remove V0 fallback for mistral-tokenizer (#14873)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-15 20:21:11 -07:00
						
def232e122
[VLM] Clean up Phi-4-MM ViT implementation (#14812)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-03-15 18:53:52 -07:00
						
3453b964a3
[Misc][Doc] Minor benchmark README update (#14874)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-16 09:46:17 +08:00
						
61c6a5a796
[VLM] Merged multi-modal processor for Pixtral (#12211)
Signed-off-by: remi <remi@mistral.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-15 06:28:27 -07:00
						
74bc397b0a
[Core] Expose API endpoint /is_sleeping (#14312)
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
2025-03-15 06:28:14 -07:00
						
f58aea002c
[CI][Intel GPU] refine intel GPU ci docker build (#14860)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-03-15 11:58:53 +00:00
						
3556a41434
[VLM] Limit multimodal input cache by memory (#14805)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-15 02:52:05 -07:00
						
9ed6ee92d6
[Bugfix] EAGLE output norm bug (#14464)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
2025-03-15 06:50:33 +00:00
						
ee3778d5fc
[Build/CI] Upgrade jinja2 to get 3 moderate CVE fixes (#14839)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-15 05:38:19 +00:00
						
aaacf17324
[Doc] V1 user guide (#13991)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-14 22:17:59 -07:00
						
4c7629cae9
[V1][Structured Output] calculate vocab_size eagerly (#14851)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-14 22:09:51 -07:00
						
e0fdfa1608
[CI/Build] Delete LoRA bias test (#14849)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-14 22:09:25 -07:00
						
5952d8ab61
[Attention] Get rid of mla cache alignment (#14842)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-15 05:08:25 +00:00
						
a2ae496589
[CPU] Support FP8 KV cache (#14741)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-03-14 22:07:36 -07:00
						
877e352262
[Docs] Add new East Coast vLLM Meetup slides to README and meetups.md (#14852)
2025-03-14 22:06:38 -07:00
						
d4d93db2c5
[V1] V1 Enablement Oracle (#13726)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-03-14 22:02:20 -07:00
						
8c0d15d5c5
[Misc][Easy] Annotate unused vars in the csrc files (#14798)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-15 12:40:09 +08:00
						
97ac781c62
[Misc] Remove misleading message in gemma2 and gemma3 (#14850)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-14 21:35:12 -07:00
						
776dcec8fe
Disable outlines cache by default (#14837)
2025-03-15 03:57:55 +00:00
						
ccf02fcbae
Revert "[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of U… (#14848)
2025-03-14 20:45:42 -07:00
						
acaea3bb07
[Bugfix][V1] Fix flashinfer sampling (#14815)
2025-03-14 20:42:38 -07:00
						
9f37422779
[Neuron][CI] update docker run command (#14829)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
2025-03-14 18:51:35 -07:00
						
dd344e0342
[Bugfix] Fix torch_xla in V0 which can't handle None seed introduced … (#14844)
Signed-off-by: Yarong Mu <ymu@google.com>
2025-03-15 00:41:15 +00:00
						
54a8804455
[Doc] More neutral K8s deployment guide (#14084)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-14 16:12:36 -07:00
						
bbd94a19fc
[Build/CI] Upgrade aiohttp to incldue CVE fix (#14840)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 23:11:28 +00:00
						
233ffce1eb
[Build/CI] Move ninja to common deps (#14835)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 21:25:28 +00:00
						
40677783aa
[CI] Add TPU v1 test (#14834)
Signed-off-by: Richard Liu <ricliu@google.com>
2025-03-14 17:13:30 -04:00
						
14f301b541
Update to torch==2.6.0 (#12721)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: luka <luka@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-14 16:58:30 -04:00
						
46f98893dd
[V1] Fix model parameterization for structured output tests (#14833)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 20:55:18 +00:00
						
fe66b34728
[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies (#14778)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-03-14 16:36:18 -04:00
						
270a5da495
Re-enable the AMD Entrypoints Test (#14711)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-03-14 12:18:13 -07:00
						
7097b4cc1c
[release] Remove log cleanup commands from TPU job (#14838)
2025-03-14 11:59:52 -07:00
						
977a16772c
[Bugfix][Kernel]: Fix AllSpark kernel compilation errors and enable for CUDA < 12.0 (#14430)
Signed-off-by: wyj371990 <wyj371990@alibaba-inc.com>
2025-03-14 09:55:14 -07:00
						
73deea2fdb
[Frontend] track server_load (#13950)
2025-03-14 09:53:17 -07:00
						
9d2b4a70f4
[V1][Metrics] Updated list of deprecated metrics in v0.8 (#14695)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-15 00:45:25 +08:00
						
0b0d6421b2
[Frontend] Fix log message to use http vs https (#14774)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 09:21:09 -07:00
						
1140991a7b
[V1] Fix vocab size calculation for structured output (#14826)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-14 09:18:38 -07:00
						
613c5bb945
[Bugfix] Fix Aria test loading (#14823)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-14 09:11:23 -07:00
						
fd8e055ffb
[BugFix]: properly catch templating error when preprocess input (#13976)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-03-14 05:58:34 -07:00
						
ab93f1360f
[VLM] Various cleanup and fixes (#14806)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-14 05:58:19 -07:00
						
40253bab44
[Bugfix][W8A8] fixed cutlass block fp8 binding (#14796)
2025-03-14 03:32:42 -07:00
						
c77620d22d
[V1][Minor] Minor code cleanup for scheduling metrics (#14800)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-14 08:21:28 +00:00
						
989ecd2007
[Misc] Gemma3ForConditionalGeneration supports LoRA (#14797)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-14 01:07:30 -07:00
						
54cc46f3eb
[Bugfix] Fix small typo in the example of Streaming delimiter (#14793)
2025-03-14 08:05:17 +00:00
						
601bd3268e
[Misc] Clean up type annotation for SupportsMultiModal (#14794)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-14 00:59:56 -07:00
						
09269b3127
[BugFix]Fix performance serving benchmark when enable profiling (#14737)
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-03-14 07:02:05 +00:00
						
27b50f1fe6
[Bugfix][Kernel][CPU] Fix num_tokens in CPU rotary embedding kernel (#14667)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-13 23:47:49 -07:00
						
9532c49836
[Attention] MLA get rid of materialization (#14770)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-13 23:39:02 -07:00
						
0c2af17c76
[CI] Fix missing example model id in processor test (#14787)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-14 13:52:15 +08:00
						
a6e0d096dd
[Feature] Add visionarena offline support for benchmark_throughput (#14654)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-03-14 04:07:54 +00:00
						
d3d4956261
[Neuron] flatten test parameterization for neuron attention kernels (#14712)
2025-03-13 20:46:56 -07:00
						
4059adc31b
[Misc][Minor] Simplify SamplingParams.__post_init__() (#14772)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-14 11:44:20 +08:00
						
f1f632d9ec
[ci] Reduce number of tests in fastcheck (#14782)
2025-03-13 20:43:45 -07:00
						
95d680b862
[Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it (#14681)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-13 20:43:18 -07:00
						
fb4c7f8ef0
[Kernel] [V1] Further optimizations to ROCm (Triton) Backend to better handle GQA. (#14431)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com>
2025-03-13 20:42:27 -07:00
						
0b1cfa6180
[Kernel] LoRA - Enable CUDAGraphs for V1 (#14626)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-13 20:42:04 -07:00
						
32ef4983cd
[V1] Temporarily disable FlashInfer Rejection Sampler (#14788)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-13 20:40:35 -07:00
						
ad19c8a003
[V1] Move OOM check into sampler run (#14728)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-03-13 20:40:23 -07:00
						
2a602b055a
forward fix PR 14245, restore build on ROCm 6.2 (#14709)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
2025-03-13 20:40:15 -07:00

7888e1d0a3
[V1] TPU - Enable prefix caching by default (#14773)
2025-03-13 20:40:05 -07:00

60c872d4b6
[Doc] Fix small typo in Transformers fallback (#14791)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-03-13 20:33:12 -07:00

3fb17d26c8
[Doc] Fix typo in documentation (#14783)
Signed-off-by: yasu52 <tsuguro4649@gmail.com>
2025-03-13 20:33:09 -07:00

d47807ba08
[Attention] Remove slow setattr in MLA (#14769)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-13 21:31:14 +00:00

02fcaa3d0a
[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624)
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
2025-03-13 19:07:34 +00:00

8a4a2efc6f
[V1][Core] using cached vocab_size for Structured Outputs (#14630)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-13 11:39:28 -07:00

8e9ffd37d6
[Misc] Clean up processor tests (#14771)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-13 18:25:37 +00:00

01b3fd0af7
[V1][Minor] Minor enhancements on scheduler (#14732)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-13 08:53:22 -07:00

f53a0586b9
[Bugfix] Fix prompt format of GLM4V (#14539)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-13 11:37:17 +00:00

b1cc4dfef5
[VLM] Support loading InternVideo2.5 models as original InternVLChatModel (#14738)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-13 03:10:02 -07:00

382403921f
[VLM] Support pan-and-scan for Gemma3 multi-modal processor (#14672)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-13 02:23:12 -07:00

a73122de96
[Bugfix] fix benchmark moe (#14653)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-13 16:12:42 +08:00

bd44b812cb
[CI/Build]  Delete ultravox LoRA test (#14730)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-13 07:57:39 +00:00

55211b01e8
[Bugfix] Fix chunked prefill for GGUF (#14666)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
2025-03-13 07:19:03 +00:00

5d043c1685
[Quant] Bamba SupportsQuant (#14698)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-03-13 04:57:05 +00:00

36d1ccb286
[Quant] BartModel SupportsQuant (#14699)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-03-13 04:55:59 +00:00

1bc3b739c4
[V1][TPU] Add assertion on multi-step-scheduler (#14707)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-03-12 21:37:58 -07:00

1bd32bc8dd
[Config][Disaggregated] Add timeout configuration for the torch.store and add KVTransferConfig.kv_connector_extra_config (#14367)
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
2025-03-12 20:15:20 -07:00

128bf75283
[BugFix][TritonMLA] Process weights after model loading for GGUF (#14555)
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com>
2025-03-12 20:14:36 -07:00

a94a699c3f
[ROCm][FP8] Fix for adjustments needed only for fnuz (#14689)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-03-12 20:14:04 -07:00

ab426ec9c0
Add ray[data] as tpu dependency (#14691)
Signed-off-by: <ricliu@google.com>
Signed-off-by: Richard Liu <ricliu@google.com>
2025-03-12 20:13:48 -07:00

165290d357
[bugfix] fixup warning message for plugged schedulers for v1 (#14700)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-03-12 20:12:13 -07:00

ce20124671
[release] Add force remove for TPU logs (#14697)
2025-03-12 22:35:18 +00:00

53be4a8634
[V1] Allow sliding window + prefix caching (#13069)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-12 11:21:19 -07:00

f5d3acd474
[BugFix][V1] Fix parallel sampling finishing/aborts (#14512)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-12 10:29:48 -07:00

916836bbfb
[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. (#14664)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-03-12 09:31:19 -07:00

d9f83d6206
[ROCm] Enable chunked prefill/paged attention in MLA on ROCm (#14316)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-03-12 15:51:20 +00:00

4a754fcf15
[Bugfix] Missing thumbnail from NVLM-D processor (#14633)
Signed-off-by: ameyanjarlekar <aanjarlekar@nvidia.com>
2025-03-12 08:50:49 -07:00

c0c25e25fa
[Model] Add support for Gemma 3 (#14660)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-12 08:36:33 -07:00

45f3f3f59e
[ROCm][Bugfix] Ensure that the moe_wna16_gemm kernel is not built on ROCm platforms. (#14629)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-03-12 08:00:28 -04:00

ff47aab056
[CPU] Upgrade CPU backend to torch-2.6 (#13381)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-12 10:41:13 +00:00

debd6bbf09
[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-03-12 05:13:11 +00:00

5c538c37b2
[V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing (#14645)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-03-11 22:12:41 -07:00

e22ee1e7a2
[Kernel] GGUF MoE kernel (#14613)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
2025-03-12 03:33:27 +00:00

e392d85831
[Core] Refactor QKVCrossParallelLinear implementation to support BNB 4-bit quantization (#14545)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-11 20:12:52 -07:00

77a318bd01
[V1][Core] Support MistralTokenizer for Structured Output (#14625)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-12 10:40:09 +08:00

80e78d02ac
[Model] Extend Ultravox to accept audio longer than 30s (#13631)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
2025-03-12 10:27:10 +08:00

4a42b9f5d6
[Doc] Update benchmarks README (#14646)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-03-11 19:23:04 -07:00

47532cd9f4
[core][V1] pluggable scheduler (#14466)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-03-12 01:15:15 +00:00

36e0c8f7da
[Feature] Add vllm bench CLI (#13993)
Signed-off-by: Randy Chen <acad.randyjhc@gmail.com>
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-12 00:31:48 +00:00

9f583e360c
[release] Add commands to clean up logs on TPU release node (#14642)
2025-03-12 00:14:50 +00:00

b706d898af
[Bugfix][V1][PP] Only warmup sampler at last PP rank (#14643)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-11 23:40:07 +00:00

863d315c86
[V1][TPU] Pad the block_table.shape[1] so the ragged paged attention can handle correctly (#14597)
2025-03-11 19:12:26 -04:00

d374f04a33
Fix run_tpu_test (#14641)
Signed-off-by: <ricliu@google.com>
Signed-off-by: Richard Liu <ricliu@google.com>
2025-03-11 21:14:33 +00:00

61a01b27a7
[V1] Delay all xgrammar usage until needed (#14616)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-11 20:21:33 +00:00

53056731fd
fix some typos : supported_head_sizes (#14627)
2025-03-11 10:38:24 -07:00

4cbf286794
[V1] Remove cache from StructuredOutputManager (#14622)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-11 10:36:07 -07:00

c6e14a61ab
[Hardware][Intel GPU] upgrade IPEX dependency to 2.6.10. (#14564)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-03-11 17:11:47 +00:00

07b4b7a37f
[BugFix/Build] Fix sparse kernels not getting built on hopper (#14572)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-11 17:09:03 +00:00

07964e2f30
docs: Add documentation for s390x cpu implementation (#14198)
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-11 17:02:17 +00:00

4bf82d4b90
[V1] Add regex structured output support with xgrammar (#14590)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-11 23:03:44 +08:00

9ab326713f
Uninstall dependencies before installing requirements/tpu.txt (#14586)
Signed-off-by: <ricliu@google.com>
Signed-off-by: Richard Liu <ricliu@google.com>
2025-03-11 08:01:35 -07:00

af295e9b01
[Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 (#14609)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-11 07:59:43 -07:00

a1c8f3796c
dynamic distpatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
2025-03-11 10:54:56 -04:00

08a1a1121d
benchmarks: simplify test jsonschema (#14567)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-11 13:39:30 +00:00

1477ffc381
[VLM] Cleanup siglip legacy code and fix broken paligemma multimodal processor (#14602)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-11 11:27:36 +00:00

70b808fe1a
[Perf]:Optimize qwen2-vl to reduce cudaMemcpyAsync (#14377)
Signed-off-by: cynthieye <987073381@qq.com>
2025-03-11 07:39:56 +00:00

63d635d179
[Misc] Correct deepseek-vl2 chat template (#14558)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-11 04:37:11 +00:00

1fc973c0b5
[V1][Core] Fix memory issue with logits & sampling (#14508)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Varun Sundar Rabindranath <3337719+varun-sundar-rabindranath@users.noreply.github.com>
2025-03-11 04:03:41 +00:00

c982ac5722
[Bugfix] Fix FP16 overflow for DeepSeek V2 (#13232)
Signed-off-by: Yida Wu <yida.wu@amd.com>
2025-03-10 20:46:59 -07:00

4290b704ff
[V1][PP] Do not block engine core when no requests to schedule (#14585)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-10 19:48:24 -07:00

c91b64f749
[neuron] add reshape_and_cache (#14391)
2025-03-10 18:37:29 -07:00

d6123170d5
[Neuron] Add Neuron device communicator for vLLM v1 (#14085)
2025-03-10 18:37:04 -07:00

485afdd3cb
[MISC][V1] Handle exception of current_platform.get_device_name() in arg_utils (#14379)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-10 20:42:11 -04:00

90e88ab756
[Kernel] moe wna16 cuda kernel (#13321)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-03-10 20:12:40 -04:00

04421dff8a
[V1] Prevent xgrammar from breaking TPU support (#14575)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-10 23:06:19 +00:00

432d6dad15
Fix typo in benchmark_serving_structured_output.py (#14566)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-10 14:58:58 -07:00

5ff0d32580
[V1] LoRA - Add triton kernels for V1 (#13096)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-10 17:27:53 -04:00

0967110e42
[Minor] Update the tqdm bar for parallel sampling (#14571)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-10 14:23:48 -07:00

fb0acb6c72
[Perf] Improve MLA on V1 (#14540)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-10 12:06:58 -07:00

92b0ce2ac7
[Bugfix][v1] fixed llava-hf/llava-1.5-7b-hf is broken on V1 (#14554)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-10 18:24:51 +00:00

bc2d4473bf
[Docs] Make installation URLs nicer (#14556)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 10:43:08 -07:00

3b352a2f92
Correct capitalisation: VLLM -> vLLM (#14562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 16:36:21 +00:00

dea985aef0
[V1][Bugfix] Fix handing of second_per_grid_ts for Qwen2-VL & Qwen2.5-VL (#14548)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-10 16:03:11 +00:00

39be30351f
Correct capitalisation: Github -> GitHub (#14561)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 15:53:33 +00:00

001a9c7b0d
[Doc] Update PaliGemma note to a warning (#14565)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-10 15:02:28 +00:00

89cdaa83e7
[Kernel] Add more dtype support for GGUF kernels (#14043)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-03-10 07:30:04 -07:00

b0746fae3d
[Frontend] support image embeds (#13955)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-10 12:36:03 +00:00

60a98b2de5
[Docs] Mention model_impl arg when explaining Transformers fallback (#14552)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-10 12:13:10 +00:00

460f553a6d
[Misc] Add log information for handle_process_request. (#14130)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-03-10 08:40:50 +00:00

1253b15774
[Feature] Consolidate performance benchmark datasets (#14036)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-10 07:23:11 +00:00

dc74613fa2
[Bugfix] Wrong requirements path - rocm (#14527)
Signed-off-by: Martin Hoyer <mhoyer@redhat.com>
2025-03-10 02:49:46 +00:00

a21076ed3a
[Misc] Ensure out-of-tree quantization method recognize by cli args (#14328)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
2025-03-09 12:13:31 +00:00

212007b168
[Hardware][TPU] Fix the recompiling issue in logits processor after warmup (#14510)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-03-09 05:44:39 -04:00

fb16eea48b
[Bugfix] Revert QKVCrossParallelLinear usage in Mllama to keep BNB quantization work (#14498)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-09 04:47:45 +00:00

73ae0b44e9
[Bugfix] Fix tqdm progress bar when SamplingParams.n > 1 (#12428)
Signed-off-by: Yuchen Yan <740987012@qq.com>
2025-03-08 20:14:53 -08:00

6d7f037748
[Feat] Support chunked prefill for LMCache connector (#14505)
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
2025-03-08 19:30:06 -08:00

10f7552789
[V1][TPU] Remove unnecessary padding for running on TPU. (#14467)
2025-03-08 21:56:04 -05:00

b0d541947a
[Attention] Default to FlashMLA backend for MLA (#14451)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-08 18:18:39 -08:00

5f0b53c6ea
Revert "[V1][Core] Fix memory issue with logits & sampling" (#14504)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-08 17:43:37 -08:00

eb8b5eb183
[V1] Support bad_words in sampler (#13376)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-08 14:50:26 -08:00

9513290032
[Misc] Upgrade to Python 3.9 typing for additional directories (#14492)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-08 17:35:50 +00:00

0d5e73d30e
Update CODEOWNERS for structured output (#14496)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-08 17:19:51 +00:00

609ef61fea
[Bugfix] Fix profiling OOM and decouple encoder multimodal profiling (#14361)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-08 16:52:34 +00:00

db84f5eb3b
[Bugfix] DeepSeek Accuracy (#14476)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-08 16:47:03 +00:00

206e2577fa
Move requirements into their own directory (#12547)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 16:44:35 +00:00

e02883c400
[Misc] Don't run ruff at all on 3rd party libs (#14493)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-08 07:16:40 -08:00

9085aabd62
[benchmarks] Add option to use unique jsonschema for each request (#14457)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-08 06:36:39 -08:00

8d5aa466fb
[V1][Core] Fix memory issue with logits & sampling (#13776)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-08 06:11:04 -08:00

0b7f06b447
[Misc] add use_tqdm_on_load to reduce logs (#14407)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-08 05:57:46 -08:00

03fe18ae0f
[VLM] Add TP support for Phi-4-MM (#14453)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-08 05:57:14 -08:00

cb8bdfade2
[V1] TPU - Add tensor parallel support via Ray (#13618)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-08 08:19:38 -05:00

33f227e16b
[CI/Build] Use a fixed seed to avoid flaky tests (#14480)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-08 11:30:09 +00:00

cfd0ae8234
Add RLHF document (#14482)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 09:51:39 +00:00

7caff01a7b
[Build/BugFix] Fix hopper 12.8 build (#14354)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-08 08:11:56 +00:00

be0b399d74
Add training doc signposting to TRL (#14439)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 07:35:07 +00:00

b8b0ccbd2d
[Bugfix] Make the deviceprofiler include LoRA memory. (#14469)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-08 07:12:22 +00:00

c908a07f57
[Doc] Added QwQ-32B to the supported models list in the reasoning out… (#14479)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-08 07:07:32 +00:00

7b6fd6e486
[Doc]add doc for Qwen models tool calling (#14478)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-08 06:58:46 +00:00

47512b3200
Default to generation_config from model (#12622)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 14:46:15 +08:00

3b9c6c6947
[CI/Build] refactor: set timezone of container to UTC (#12888)
Signed-off-by: Roger Meier <r.meier@siemens.com>
2025-03-07 22:42:01 -08:00

4aae667668
[core] add extra_args to SamplingParams (#13300)
Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>
2025-03-08 14:41:18 +08:00

9f3bc0f58c
[MISC][V1] Register process killing handler only in the main thread (#14380)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-07 22:40:06 -08:00

980385f8c1
[Bugfix][Disaggregated] Add a check in send_kv_caches_and_hidden_states and fix the reshape of the KVCache (#14369)
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
2025-03-07 22:39:31 -08:00

ca7a2d5f28
Revert "[Perf] Reduce MLA CPU overheads in V1 (#14384)" (#14471)
2025-03-07 22:18:53 -08:00

333681408f
[Bugfix][V1] Handle MLA in kv_cache_interface (#14462)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-07 22:18:25 -08:00

ef64044079
[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949)
2025-03-08 01:48:12 +00:00

66e16a038e
[Bugfix] Fix torch_xla which can't handle None seed introduced in #14274 (#14459)
Signed-off-by: Yarong Mu <ymu@google.com>
2025-03-07 23:17:04 +00:00

e1f0835ae0
[V1][Metrics] Fix traceback with preemptions+LoRA (#14220)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-07 15:36:16 -05:00

8ed5421aaa
[V1] Eagerly remove finished requests from the batch (#14388)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-07 10:56:00 -08:00

c6359e8ca6
[v1] torch.compile integration explanation (#14437)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-08 01:55:50 +08:00

952a074980
[Misc] Add Phi4-MM example (#14343)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-07 17:28:52 +00:00

d0feea31c7
[Kernel] optimize performance of gptq marlin kernel when n is small (#14138)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-03-07 11:53:38 -05:00

58abe35455
[Benchmarks] Make detokenization optional in benchmark scripts (#11697)
Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>
2025-03-07 08:09:00 -08:00

f7ebad2307
[Doc] Update prefix_caching.md to match the example image (#14420)
2025-03-07 15:29:00 +00:00

80e9afb5bc
[V1][Core] Support for Structured Outputs (#12388)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-07 07:19:11 -08:00

1e3598edeb
Use the optimized block sizes after tuning the kernel. (#14329)
2025-03-07 13:25:13 +00:00

f7a6bd0fa1
Fix missing kv_caches and attn_metadata in OpenVINOCausalLM (#14271)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-07 12:30:42 +00:00

0ca3b8e01c
[BUGFIX] Skip tokenization support for throughput benchmark (#12712)
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-03-07 02:51:47 -08:00

cc10281498
[Misc] Set default value of seed to None (#14274)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
2025-03-07 10:40:01 +00:00

05fb6718f0
[Bugfix] Clean up multi-modal processors (#14417)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-07 10:33:38 +00:00

12c29a881f
[Bugfix] Further clean up LoRA test (#14422)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-07 10:30:55 +00:00

70da0c0748
correct wrong markdown syntax (#14414)
Signed-off-by: vincent-pli <justdoit.pli@gmail.com>
2025-03-07 08:01:18 +00:00

c1588a2c94
[GH] Auto-apply multi-modality label to relevant PRs (#14402)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-07 15:26:32 +08:00

8ca7a71df7
OpenVINO: added CPU-like conditions (#14338)
Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
2025-03-06 22:24:49 -08:00
						
63137cd922
[Build] Add nightly wheel fallback when latest commit wheel unavailable (#14358)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-06 22:10:57 -08:00
						
ddd1ef66ec
[Bugfix] Fix JambaForCausalLM LoRA (#14370)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-06 22:05:47 -08:00
						
e5e03c2c1b
[BugFix] Illegal Memory Access in the blockwise cutlass fp8 GEMMs (#14396)
2025-03-06 21:56:06 -08:00
						
e1744502c2
[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390)
Signed-off-by: luka <luka@neuralmagic.com>
2025-03-07 05:20:16 +00:00
						
dae6896977
[Perf] Reduce MLA CPU overheads in V1 (#14384)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-06 19:59:14 -08:00
						
c34eeec58d
[Bugfix] Correctly call cudaProfilerStop in benchmarks script (#14183)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-03-07 00:42:49 +00:00
						
ad60bbb2b2
[Doc] Fix a typo (#14385)
2025-03-06 16:31:52 -08:00
						
0578e5a462
[Hardware][TPU]Enable ragged paged attention kernel and resolve recompilation issue (#14310)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-03-06 23:31:05 +00:00
						
04222984f8
[Docs] Add nsight guide to profiling docs (#14298)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-06 14:19:58 -08:00
						
6832707e90
[V1][Bugfix] Standardize quantized kv cache rejection for attention backends (#14221)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-06 14:18:29 -08:00
						
6b2ef5cd17
[Bug] Fix Attention when ignored in by quant_method (#14313)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-06 14:18:06 -08:00
						
958adce478
[Bugfix] Fix use_direct_call condition in FusedMoE layer for (#14382)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-06 14:17:21 -08:00
						
99b0915d3b
[Kernel] Add needs_fixed_stride_order tag to most GEMMs (#14306)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-06 14:17:09 -08:00
						
8ca2b21c98
[CI] Disable spawn when running V1 Test (#14345)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-03-06 21:52:46 +00:00
						
d9292786e1
[CI/Build] Use uv python for docker rather than ppa:deadsnakes/ppa (#13569)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-06 16:08:36 -05:00
						
cc2f9b32c8
[Distributed] Add enable_expert_parallel arg (#14305)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-06 18:54:45 +00:00
						
cd579352bf
[V1] Do not detokenize if sampling param detokenize is False (#14224)
Signed-off-by: Himanshu Jaju <hj@mistral.ai>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-03-06 10:40:24 -08:00
						
9f1710f1ac
Fix mla prefill context performance (#13897)
Signed-off-by: ZhongYingMatrix <zhongyingmatrix@gmail.com>
2025-03-06 09:35:49 -08:00
						
e642ec962c
Add authors to license header. (#14371)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
2025-03-06 08:43:09 -08:00
						
ada19210a3
Adding cpu inference with VXE ISA for s390x architecture (#12613)
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
Signed-off-by: Rishika Kedia <rishika.kedia@in.ibm.com>
Co-authored-by: Rishika Kedia <rishika.kedia@in.ibm.com>
2025-03-06 08:40:53 -08:00
						
bf0560bda9
Reinstate best_of for V0 (#14356)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-06 08:34:22 -08:00
						
151b08e0fe
[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-07 00:32:46 +08:00
						
81b2f4a45f
[Doc] Fix date typo in README.md (#14366)
Signed-off-by: Jitse Klomp <jitse.klomp@conclusionxforce.nl>
2025-03-06 08:29:57 -08:00
						
82551ad616
[Core] Don't use cache during multi-modal profiling (#14336)
2025-03-06 08:03:31 -08:00
						
caac5c2e59
[Bugfix][Core] fix abort_seq_group and memory leak when n>1 (#14326)
Signed-off-by: courage17340 <courage17340@163.com>
2025-03-06 23:59:32 +08:00
						
6bd1dd9d26
[Kernel] [V1] Improved performance for V1 Triton (ROCm) backend (#14152)
2025-03-06 07:39:16 -08:00
						
4f27044aab
[Doc] Correct beam_search using in generative_models.md (#14363)
2025-03-06 15:37:10 +00:00
						
0ddc991f5c
[Doc] Update reasoning with stream example to use OpenAI library (#14077)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
2025-03-06 13:20:37 +00:00
						
fa82b93853
[Frontend][Docs] Transcription API streaming (#13301)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-06 10:39:35 +00:00
						
69ff99fdcd
[Core] Optimizing cross-attention QKVParallelLinear computation (#12325)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
Co-authored-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
2025-03-06 09:37:26 +00:00
						
5d802522a7
[V1][VLM][Pixtral-HF] Support Pixtral-HF on V1 (#14275)
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-03-06 08:58:41 +00:00
						
1769928079
[Model] Update Paligemma multimodal processing with PromptUpdate (#14015)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-03-06 08:31:38 +00:00
						
ed6ea06577
[Hardware] Update the flash attn tag to support Blackwell (#14244)
2025-03-05 22:01:37 -08:00
						
5ee10e990d
[Bugfix][CI] ALiBi test case in xformers multi_query_kv_attention (#11301)
2025-03-05 20:00:53 -08:00
						
3dbd2d813a
[V1] LoRA - Enable more V1 tests (#14315)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-06 11:55:42 +08:00
						
f5f7f00cd9
[Bugfix][Structured Output] Support outlines engine with reasoning outputs for DeepSeek R1 (#14114)
2025-03-06 03:49:20 +00:00
						
abcc61e0af
[misc] Mention ray list nodes command to troubleshoot ray issues (#14318)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-03-06 02:00:36 +00:00
						
f6bb18fd9a
[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-03-05 17:10:13 -08:00
						
71eaf8969b
[Build] Add UV_HTTP_TIMEOUT to avoid timeout during installation (#13850)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-05 17:09:29 -08:00
						
ca100c90fe
Add benchmark for DeepGEMM and vLLM Block FP8 Dense GEMM (#13917)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-05 17:08:51 -08:00
						
ffad94397d
[CI/Build] Use spawn multiprocessing mode for V1 test pipeline (#14243)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-05 17:08:02 -08:00
						
4dacaa4a83
[BugFix] Fix prefix caching V0 MLA (#14255)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Ying Zhong <zhongyingmatrix@gmail.com>
2025-03-05 17:07:42 -08:00
						
a7ea35aa67
[Bugfix] Remove num_tokens_across_dp (#14302)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-05 23:55:55 +00:00
						
1e3e76b6cc
[Bugfix] Fix DeepSeek MTP crash when using TP1ModelRunner with CUDA graph due to shape mismatch (#14237)
Signed-off-by: pyc96 <pychen96@gmail.com>
2025-03-05 22:22:40 +00:00
						
53ea6ad830
[V1][Easy] Add empty allowed_token_ids in the v1 sampler test (#14308)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-05 21:41:18 +00:00
						
1b7624bf5c
[misc] Add FlashMLA as a new option of VLLM_ATTENTION_BACKEND env (#14267)
2025-03-05 21:28:50 +00:00
						
ac60dc7fe1
[V1][BugFix] Fix for mixed top_k batch (#14301)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Ye Cao <caoye.cao@alibaba-inc.com>
2025-03-05 20:43:04 +00:00
						
a4f1ee35d6
Deprecate best_of Sampling Parameter in anticipation for vLLM V1 (#13997)
Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-05 20:22:43 +00:00
						
a32c8669ca
[V1][Minor] Remove obsolete FIXME comment (#14304)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-05 11:59:23 -08:00
						
ca2ca8de57
[Docs] Add Meta Slides (#14297)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-03-05 08:30:23 -08:00
						
f71b00a19e
[Bugfix] Fix broken vision language example (#14292)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-05 15:57:10 +00:00
						
8f808cf86e
prefix_caching.md: Fixed typo (#14293)
Signed-off-by: Daivid Savernin-Frenk <daivid.frank@TurboNext.ai>
2025-03-05 15:43:13 +00:00
						
7bab4bb048
[Misc] Add Qwen2MoeForCausalLM moe tuning support (#14276)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-05 23:11:29 +08:00
						
e17e4488bd
[LoRA] Remove linear hack outside transformers backend (#14177)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-05 15:06:28 +00:00
						
257e200a25
[V1][Frontend] Add Testing For V1 Runtime Parameters (#14159)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2025-03-05 14:18:55 +00:00
						
47d4a7e004
Small update for external_launcher backend docs (#14288)
2025-03-05 21:30:00 +08:00
						
7f89a594dd
[Doc] [3/N] Refer code examples for common cases in dev multimodal processor (#14278)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-05 12:29:50 +00:00
						
961644e6a8
[Doc] Update nginx guide: remove privileged from vllm container run and add target GPU ID (#14217)
Signed-off-by: Iacopo Poli <iacopo@lighton.ai>
2025-03-05 11:44:10 +00:00
						
8d6cd32b7b
[Bugfix][V1] Fix allowed_token_ids for v1 Sampler (#14169)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-03-05 08:49:44 +00:00
						
ec79b67c77
[Misc][V1] Avoid using envs.VLLM_USE_V1 in mm processing (#14256)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-03-05 07:37:16 +00:00
						
32985bed7c
[Frontend] Allow return_tokens_as_token_ids to be passed as a request param (#14066)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-03-05 06:30:40 +00:00
						
dae9ec464c
Temporarily disable test_awq_gemm_opcheck (#14251)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-05 06:10:35 +00:00
						
6eaf93020d
[platforms] improve rocm debugging info (#14257)
2025-03-04 21:32:18 -08:00
						
72c62eae5f
[V1] EP/TP MoE + DP Attention (#13931)
2025-03-04 21:27:26 -08:00
						
0a995d5434
[Model] New model support for Phi-4-multimodal-instruct (#14119)
2025-03-04 20:57:01 -08:00
						
ade3f7d988
[V1][Bugfix] Do not reset prefix caching metrics (#14235)
2025-03-05 04:39:13 +00:00
						
0df25101d6
[Bugfix] Fix gptq_marlin for deepseek-v3 (#13750)
Signed-off-by: dangshunya <dangshunya@baichuan-inc.com>
Co-authored-by: dangshunya <dangshunya@baichuan-inc.com>
2025-03-05 12:25:53 +08:00
						
e123aafdf0
Disable GPTQ AllSpark kernels for CUDA Compiler < 12.0 (#14157)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-05 12:25:24 +08:00

5b143d33be
Moved numba from common requirements to cuda/rocm specific requirements (#14199)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2025-03-05 12:25:00 +08:00

eb59b5a6cb
[misc] announce china meetup (#14248)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-05 10:33:50 +08:00

fbfc3ee37e
[V1][TPU] TPU multimodal model support for ragged attention (#14158)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-03-04 19:58:48 -05:00

3e1d223626
[ROCm] Disable a few more kernel tests that are broken on ROCm (#14145)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-03-04 23:37:55 +00:00

4f5b059f14
Clean up unused padding_idx variables across many model definitions (#13240)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-04 21:27:00 +00:00

288ca110f6
[Security] Serialize using safetensors instead of pickle in Mooncake Pipe (#14228)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-03-04 21:10:32 +00:00

c2bd2196fc
[v1][Metrics] Add design doc (#12745)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-04 20:36:55 +00:00

550c7ba3dc
[Docs] Update Dockerfile dependency image (#14215)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-04 20:22:11 +00:00

e5b2f1601a
[Frontend] Do prompt_logprobs clamping for chat as well as completions (#14225)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-04 20:13:06 +00:00

9badee53de
Fix performance when --generation-config is not None (#14223)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-04 20:59:22 +01:00

beebf4742a
[TPU][Profiler] Support start_profile/stop_profile in TPU worker (#13988)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-03-04 14:40:06 -05:00

f89978ad7c
add cutlass support for blackwell fp8 gemm (#13798)
2025-03-04 07:55:07 -08:00

b3cf368d79
[V1][Molmo] Fix get_multimodal_embeddings() in molmo.py (#14161)
2025-03-04 15:43:59 +00:00

c8525f06fc
[V0][Metrics] Deprecate some questionable request time metrics (#14135)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-04 15:11:33 +00:00

5db6b2c961
[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs (#13869)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-04 15:06:47 +00:00

6247bae6c6
[Bugfix] Restrict MacOS CPU detection (#14210)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-04 22:25:27 +08:00

3610fb4930
[doc] add "Failed to infer device type" to faq (#14200)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-04 20:47:06 +08:00

71c4b40562
[sleep mode] error out with expandable_segments (#14189)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-04 18:54:19 +08:00

ac65bc92df
[platform] add debug logging during inferring the device type (#14195)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-04 18:39:16 +08:00

f78c0be80a
Fix benchmark_moe.py tuning for CUDA devices (#14164)
2025-03-03 21:11:03 -08:00

66233af7b6
Use math.prod instead of np.prod for trivial ops (#14142)
2025-03-03 21:09:22 -08:00

bf13d40972
[core] Pass all driver env vars to ray workers unless excluded (#14099)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-03-04 11:44:17 +08:00

989f4f430c
[Misc] Remove lru_cache in NvmlCudaPlatform (#14156)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-04 11:09:34 +08:00

bb5b640359
[core] moe fp8 block quant tuning support (#14068)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-03-04 01:30:23 +00:00

c060b71408
[Model] Add support for GraniteMoeShared models (#13313)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-04 08:04:52 +08:00

79e4937c65
[v1] Add comments to the new ragged paged attention Pallas kernel (#14155)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-03-03 23:00:55 +00:00

cd1d3c3df8
[Docs] Add GPTQModel (#14056)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-03-03 21:59:09 +00:00

19d98e0c7d
[Kernel] Optimize moe intermediate_cache usage (#13625)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-03 16:29:53 -05:00

2b04c209ee
[Bugfix] Allow shared_experts skip quantization for DeepSeekV2/V3 (#14100)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-03-03 14:20:24 -07:00

ae122b1cbd
[WIP][[V1][Metrics] Implement max_num_generation_tokens,  request_params_n, and request_params_max_tokens metrics (#14055)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-03 19:04:45 +00:00

872db2be0e
[V1] Simplify stats logging (#14082)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-03-03 10:34:14 -08:00

2dfdfed8a0
[V0][Metrics] Deprecate some KV/prefix cache metrics (#14136)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-03 18:25:46 +00:00

c41d27156b
[V0][Metrics] Remove unimplemented vllm:tokens_total (#14134)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-03 17:50:22 +00:00

91373a0d15
Fix head_dim not existing in all model configs (Transformers backend) (#14141)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-03 17:48:11 +00:00

848a6438ae
[ROCm] Faster Custom Paged Attention kernels (#12348)
2025-03-03 09:24:45 -08:00

98175b2816
Improve the docs for TransformersModel (#14147)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-03 17:03:05 +00:00

4167252eaf
[V1] Refactor parallel sampling support (#13774)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-03 08:15:27 -08:00

f35f8e2242
[Build] Make sure local main branch is synced when VLLM_USE_PRECOMPILED=1 (#13921)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-03-03 16:43:14 +08:00

b87c21fc89
[Misc][Platform] Move use allgather to platform (#14010)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-03-03 15:40:04 +08:00

e584b85afd
[Misc] duplicate code in deepseek_v2 (#14106)
2025-03-03 14:10:11 +08:00

09e56f9262
[Bugfix] Explicitly include "omp.h" for MacOS to avoid installation failure (#14051)
2025-03-02 17:35:01 -08:00

cf069aa8aa
Update deprecated Python 3.8 typing (#13971)
2025-03-02 17:34:51 -08:00

bf33700ecd
[v0][structured output] Support reasoning output (#12955)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-02 14:49:42 -05:00

bc6ccb9878
[Doc] Source building add clone step (#14086)
Signed-off-by: qux-bbb <1147635419@qq.com>
2025-03-02 10:59:50 +00:00

82fbeae92b
[Misc] Accurately capture the time of loading weights (#14063)
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
2025-03-01 17:20:30 -08:00

cc5e8f6db8
[Model] Add LoRA support for TransformersModel (#13770)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-02 09:17:34 +08:00

d54990da47
[v1] Add __repr__ to KVCacheBlock to avoid recursive print (#14081)
2025-03-01 20:46:02 +00:00

b9f1d4294e
[v1][Bugfix] Only cache blocks that are not in the prefix cache (#14073)
2025-03-01 08:25:54 +00:00

b28246f6ff
[ROCm][V1][Bugfix] Add get_builder_cls method to the ROCmAttentionBackend class (#14065)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-03-01 07:18:32 +00:00

3b5567a209
[V1][Minor] Do not print attn backend twice (#13985)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-03-01 07:09:14 +00:00

fdcc405346
[Doc] Consolidate whisper and florence2 examples (#14050)
2025-02-28 22:49:15 -08:00

8994dabc22
[Documentation] Add more deployment guide for Kubernetes deployment (#13841)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-03-01 06:44:24 +00:00

02296f420d
[Bugfix][V1][Minor] Fix shutting_down flag checking in V1 MultiprocExecutor (#14053)
2025-02-28 22:31:01 -08:00

6a92ff93e1
[Misc][Kernel]: Add GPTQAllSpark Quantization (#12931)
2025-02-28 22:30:59 -08:00

6a84164add
[Bugfix] Add file lock for ModelScope download (#14060)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-01 06:10:28 +00:00

f64ffa8c25
[Docs] Add pipeline_parallel_size to optimization docs (#14059)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-03-01 05:43:54 +00:00

bd56c983d6
[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass (#10902)
Signed-off-by: luka <luka@neuralmagic.com>
2025-02-28 16:20:11 -07:00

084bbac8cc
[core] Bump ray to 2.43 (#13994)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-02-28 21:47:44 +00:00

28943d36ce
[v1] Move block pool operations to a separate class (#13973)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2025-02-28 20:53:31 +00:00

b526ca6726
Add RELEASE.md (#13926)
Signed-off-by: atalman <atalman@fb.com>
2025-02-28 12:25:50 -08:00

e7bd944e08
[v1] Cleanup the BlockTable in InputBatch (#13977)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-02-28 19:03:16 +00:00

c3b6559a10
[V1][TPU] Integrate the new ragged paged attention kernel with vLLM v1 on TPU (#13379)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-02-28 11:01:36 -07:00

4be4b26cb7
Fix entrypoint tests for embedding models (#14052)
2025-02-28 08:56:44 -08:00

2aed2c9fa7
[Doc] Fix ROCm documentation (#14041)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-02-28 16:42:07 +00:00

9b61dd41e7
[Bugfix] Initialize attention bias on the same device as Query/Key/Value for QwenVL Series (#14031)
2025-02-28 07:36:08 -08:00

f7bee5c815
[VLM][Bugfix] Enable specifying prompt target via index (#14038)
2025-02-28 07:35:55 -08:00

e0734387fb
[Bugfix] Fix MoeWNA16Method activation (#14024)
2025-02-28 15:22:42 +00:00

f58f8b5c96
Update AutoAWQ docs (#14042)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-28 15:20:29 +00:00

b3f7aaccd0
[V1][Minor] Restore V1 compatibility with LLMEngine class (#13090)
2025-02-28 00:52:25 -08:00

b91660ddb8
[Hardware][Intel-Gaudi] Regional compilation support (#13213)
2025-02-28 00:51:49 -08:00

76c89fcadd
Use smaller embedding model when not testing model specifically (#13891)
2025-02-28 00:50:43 -08:00

b9e41734c5
[Bugfix][Disaggregated] patch the inflight batching on the decode node in SimpleConnector to avoid hangs in SimpleBuffer (nccl based) (#13987)
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
2025-02-28 07:53:45 +00:00

1088f06242
[Doc] Move multimodal Embedding API example to Online Serving page (#14017)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-28 07:12:04 +00:00

73e0225ee9
[Bugfix] Check that number of images matches number of <|image|> tokens with mllama (#13911)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-02-28 04:00:45 +00:00

6c85da3a18
[V1]SupportsV0Only protocol for model definitions (#13959)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-02-27 20:02:15 -05:00

67fc426845
[Misc] Print FusedMoE detail info (#13974)
2025-02-27 18:53:13 -05:00

9804145cac
[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict (#13626)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-02-27 15:28:08 -08:00

2e94b9cfbb
[Attention] Flash MLA for V1 (#13867)
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Yang Chen <yangche@fb.com>
2025-02-27 23:03:41 +00:00

8294773e48
[core] Perf improvement for DSv3 on AMD GPUs (#13718)
Signed-off-by: qli88 <qiang.li2@amd.com>
2025-02-27 22:14:30 +00:00

cd813c6d4d
[V1][Minor] Minor cleanup for GPU Model Runner (#13983)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-27 13:11:40 -08:00

38acae6e97
[ROCm] Fix the Kernels, Core, and Prefix Caching AMD CI groups (#13970)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-02-27 20:31:47 +00:00

a2dd48c386
[VLM] Deprecate legacy input mapper for OOT multimodal models (#13979)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-27 19:14:55 +00:00

126f6beeb4
Bump azure/setup-helm from 4.2.0 to 4.3.0 (#13742)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-27 19:04:10 +00:00

58d1b2aa77
[Attention] MLA support for V1 (#13789)
Signed-off-by: Yang Chen <yangche@fb.com>
2025-02-27 13:14:17 -05:00

f1579b229d
[VLM] Generalized prompt updates for multi-modal processor (#13964)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-27 17:44:25 +00:00

7864875879
[Bugfix] Fix qwen2.5-vl overflow issue (#13968)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-02-27 17:30:39 +00:00

1dd422b64a
Update LMFE version to v0.10.11 to support new versions of transforme… (#13930)
2025-02-27 17:16:12 +00:00

06c8f8d885
[bugfix] Fix profiling for RayDistributedExecutor (#13945)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-02-28 01:01:21 +08:00

5677c9bb3e
Deduplicate .pre-commit-config.yaml's exclude (#13967)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-27 16:27:47 +00:00

512d77d582
Update quickstart.md (#13958)
2025-02-27 16:05:11 +00:00

7f0be2aa24
[Model] Deepseek GGUF support (#13167)
2025-02-27 02:08:35 -08:00

edf309ebbe
[VLM] Support multimodal inputs for Florence-2 models (#13320)
2025-02-27 02:06:41 -08:00

788f284b53
Fix test_block_fp8.py test for MoE (#13915)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-27 18:00:00 +08:00

4b1d141f49
[PP] Correct cache size check (#13873)
Signed-off-by: Yang Zheng <zhengy.gator@gmail.com>
2025-02-27 17:47:29 +08:00

10c3b8c1cf
[Misc] fixed 'required' is an invalid argument for positionals (#13948)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-02-27 09:06:49 +00:00

a7f37314b7
[CI/Build] Add examples/ directory to be labelled by mergify (#13944)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-02-27 08:24:11 +00:00

cd711c48b2
[V1][Metrics] Handle preemptions (#13169)
2025-02-26 20:04:59 -08:00

378b3ef6f8
[ROCm][V1] Update reshape_and_cache to properly work with CUDA graph padding (#13922)
2025-02-26 20:04:12 -08:00

c9944acbf9
[misc] Rename Ray ADAG to Compiled Graph (#13928)
2025-02-26 20:03:28 -08:00

ca377cf1b9
Use CUDA 12.4 as default for release and nightly wheels (#12098)
2025-02-26 19:06:37 -08:00

a31614e386
[ROCm][Quantization][Kernel] Use FP8 FNUZ when OCP flag is 0 or undefined (#13851)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-02-27 10:39:10 +08:00

f95903909f
[Kernel] FlashMLA integration (#13747)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-02-27 10:35:08 +08:00

b382a7f28f
[BugFix] Make FP8 Linear compatible with torch.compile (#13918)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-26 13:48:55 -08:00

4cb6fa0a9c
[Bugfix] Backend option to disable xgrammar any_whitespace (#12744)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-26 10:52:34 -08:00

d08b285adf
[Misc] fixed qwen_vl_utils parameter error (#13906)
2025-02-26 08:31:53 -08:00

b27122acc2
[TPU] use torch2.6 with whl package (#13860)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
2025-02-26 08:18:54 -05:00

934bb99c71
[Bugfix] Update expected token counts for Ultravox tests (#13895)
2025-02-26 04:56:50 -08:00

3f808cc044
[Bugfix] Do not crash V0 engine on input errors (#13101)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-26 19:07:29 +08:00

ec8a5e5386
[Misc]: Add support for goodput on guided benchmarking + TPOT calculation refactor (#13736)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-02-26 19:06:47 +08:00

215bf150a6
[Bugfix] Handle None parameters in Mistral function calls. (#13786)
2025-02-26 03:06:21 -08:00

0ecdd98031
Add comments on accessing kv_cache and attn_metadata (#13887)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-26 18:41:02 +08:00

7b700ec8c8
[Bugfix] Add test example for Ultravox v0.5 (#13890)
2025-02-26 02:31:43 -08:00

7ca1da020f
[Misc] Fix input processing for Ultravox (#13871)
2025-02-25 23:56:34 -08:00

5157338ed9
[Misc] Improve LoRA spelling (#13831)
2025-02-25 23:43:01 -08:00

e206b54331
[v0][Core] Use xgrammar shared context to avoid copy overhead for offline engine (#13837)
Signed-off-by: Seth Kimmel <seth.kimmel3@gmail.com>
2025-02-26 14:58:24 +08:00

1d35662e6d
[ROCm] Disable chunked prefill/prefix caching when running MLA on non-cuda platforms (#13844)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-02-26 14:56:58 +08:00

e656f638de
[Doc] fix the incorrect module path of tensorize_vllm_model (#13863)
2025-02-25 22:56:19 -08:00

145944cb94
Improve pipeline partitioning (#13839)
2025-02-25 18:53:56 -08:00

094b7d9496
[Kernel][Build/CI] Bump CUTLASS to 3.8 and add initializers for cutlass epilogues (#13797)
2025-02-25 18:52:03 -08:00

e1fe7591f2
[Misc]Code Cleanup (#13859)
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2025-02-26 10:44:30 +08:00

5629f26df7
[V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729)
2025-02-25 18:14:48 -08:00

9ba28043b5
[misc] Show driver IP info when Ray fails to allocate driver worker (#13858)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-02-26 09:53:43 +08:00

24679788ed
DeepSeek V2/V3/R1 only place lm_head on last pp rank (#13833)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-26 01:24:57 +00:00

07c4353057
[Model] Support Grok1 (#13795)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-26 01:07:12 +00:00

34e3494e70
Fix failing MyGemma2Embedding test (#13820)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-25 12:33:03 -08:00

f75aa72732
[Neuron] Add custom_ops for neuron backend (#13246)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: George Novack <gnovack@amazon.com>
Co-authored-by: Aoyu Zhang <aoyuzhan@amazon.com>
2025-02-25 11:47:49 -08:00

340e39e387
Fix string parsing error (#13825)
2025-02-25 08:20:29 -08:00

f4133ce4e5
[Bugfix] Revert inspection code in #13743 (#13832)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-26 00:18:50 +08:00

6522d55b6f
Fix /v1/audio/transcriptions Bad Request Error (#13811)
2025-02-25 06:03:33 -08:00

6ff518626c
[Bugfix] Fix deepseek-vl2 inference with more than 2 images (#13818)
2025-02-25 06:03:02 -08:00

fa82074167
[Bugfix] Flush TunableOp results before worker processes are destroyed. (#13623)
Signed-off-by: Nichols A. Romero <nick.romero@amd.com>
2025-02-25 11:08:20 +00:00

75e9d49796
[Bugfix] Initialize attention bias on the same device as Query/Key/Value (#13468)
2025-02-25 02:13:09 -08:00

32c3b6bfd1
[Misc]Clarify Error Handling for Non-existent Model Paths and HF Repo IDs (#13724)
Signed-off-by: Chen-0210 <chenjincong11@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-02-25 10:12:19 +00:00

37b6cb4985
[CI/Build] Fix V1 LoRA failure (#13767)
2025-02-25 02:01:15 -08:00

aabeb2688f
[ROCm][Quantization][Kernel] Using HIP FP8 header (#12593)
2025-02-25 00:39:59 -08:00

2f42a4888c
[Feature] Support KV cache offloading and disagg prefill with LMCache connector. (#12953)
2025-02-25 00:38:42 -08:00

3173c3b34e
[misc] Clean up ray compiled graph type hints (#13731)
2025-02-25 00:37:08 -08:00

2d87d7d1ac
[Bugfix] Modify modelscope api usage in transformer_utils (#13807)
2025-02-25 00:36:07 -08:00

aab392774b
[Core] xgrammar: Expand list of unsupported jsonschema keywords (#13783)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-25 08:21:25 +00:00

6724e79164
[Misc] Check that the model can be inspected upon registration (#13743)
2025-02-25 00:18:19 -08:00

03f48b3db6
[Core] LoRA V1 - Add add/pin/list/remove_lora functions (#13705)
2025-02-25 00:18:02 -08:00

4d251ad00e
Fix CompressedTensorsWNA16MoE with grouped scales (#13769)
2025-02-25 00:17:14 -08:00

18e505930d  [Bugfix] Support MLA for CompressedTensorsWNA16 (#13725)  2025-02-25 06:10:31 +00:00
    Signed-off-by: mgoin <mgoin64@gmail.com>
4a8cfc7551  [Bugfix] Fix deepseek-v2 error: "missing 1 required positional argument: 'residual'" (#13802)  2025-02-24 20:33:59 -08:00
bc32bc73aa  [V1][Metrics] Implement vllm:lora_requests_info metric (#13504)  2025-02-24 20:01:33 -08:00
ab1091d5f2  [Misc][Attention][Quantization] init property earlier (#13733)  2025-02-25 03:19:30 +00:00
    Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
1e15aaef56  [Bugfix][Quantization] Fix FP8 + EP (#13784)  2025-02-25 10:54:17 +08:00
    Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
51010a1807  [Misc] set single whitespace between log sentences (#13771)  2025-02-25 10:26:12 +08:00
    Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
7196a3b1db  [Doc] arg_utils.py: fixed a typo (#13785)  2025-02-24 18:23:04 -08:00
cdc1fa12eb  Remove unused kwargs from model definitions (#13555)  2025-02-24 17:13:52 -08:00
f61528d46d  [Misc][Chore] Clean Up AsyncOutputProcessing Logs (#13780)  2025-02-24 16:39:07 -08:00
1f0ae3ed0a  [Misc] Clean Up EngineArgs.create_engine_config (#13734)  2025-02-24 13:52:21 -05:00
    Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
db986c19ea  Fix precommit fail in fused_moe intermediate_cache2 chunking (#13772)  2025-02-24 09:25:47 -08:00
    Signed-off-by: mgoin <mgoin64@gmail.com>
227578480d  Revert "[V1][Core] Fix memory issue with logits & sampling" (#13775)  2025-02-24 09:16:05 -08:00
befc402d34  [V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) (#10980)  2025-02-24 08:29:41 -08:00
    Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
    Co-authored-by: Nick Hill <nhill@redhat.com>
444b0f0f62  [Misc][Docs] Raise error when flashinfer is not installed and VLLM_ATTENTION_BACKEND is set (#12513)  2025-02-24 10:43:21 -05:00
    Signed-off-by: NickLucche <nlucches@redhat.com>
ccc00515fd  [BugFix] Illegal memory access for MoE On H20 (#13693)  2025-02-24 07:37:32 -08:00
781096e385  Expert Parallelism (EP) Support for DeepSeek V2 (#12583)  2025-02-24 07:33:20 -08:00
7940d8a6a7  [CI/Build] add python-json-logger to requirements-common (#12842)  2025-02-24 06:10:33 -08:00
c0e3ecd6d2  [Bugfix] fix(logging): add missing opening square bracket (#13011)  2025-02-24 06:10:25 -08:00
23eca9cf68  [model][refactor] remove cuda hard code in models and layers (#13658)  2025-02-24 06:10:14 -08:00
437b76ff59  [V1][Core] Fix memory issue with logits & sampling (#13721)  2025-02-24 06:10:06 -08:00
f90a375593  [ci] Add logic to change model to S3 path only when S3 CI env var is on (#13727)  2025-02-24 06:32:11 +00:00
    Signed-off-by: <>
    Co-authored-by: EC2 Default User <ec2-user@ip-172-31-63-253.us-west-2.compute.internal>
e7ef74e26e  Fix some issues with benchmark data output (#13641)  2025-02-24 10:23:18 +08:00
    Signed-off-by: Huy Do <huydhn@gmail.com>
cbae7af552  [V1][BugFix] Fix engine core client shutdown hangs (#13298)  2025-02-23 13:07:43 -08:00
    Even though ZMQ context.destroy() is meant to close open sockets before terminating the context, it appears to be necessary to do this explicitly or else it can hang in the context.term() method.
    Close zmq sockets explicitly before terminating context, make shutdown of client resource more robust, shut down engine core process prior to terminating zmq context.
    Signed-off-by: Nick Hill <nhill@redhat.com>
eb24dc4a45  [v1] torchrun compatibility (#13642)  2025-02-23 22:47:24 +08:00
    Signed-off-by: youkaichao <youkaichao@gmail.com>
9bebc9512f  [Misc] Deprecate --dataset from benchmark_serving.py (#13708)  2025-02-23 13:32:20 +00:00
    Signed-off-by: Roger Wang <ywang@roblox.com>
5a2ba16f5c  [Core][Distributed] Use IPC (domain socket) ZMQ socket for local comms (#13688)  2025-02-23 02:54:29 -08:00
ba5106e519  [LMM] Implement merged multimodal processor for whisper (#13278)  2025-02-23 01:46:03 -08:00
d5ca2110f1  [Quant] BaiChuan SupportsQuant (#13710)  2025-02-22 19:21:15 -08:00
2c5e637b57  [ci] Use env var to control whether to use S3 bucket in CI (#13634)  2025-02-22 19:19:45 -08:00
322d2a27d6  [BugFix] Minor: logger import in attention backend (#13706)  2025-02-22 16:51:13 -08:00
    Signed-off-by: Andy Lo <andy@mistral.ai>
82e0d601fc  [CI/Build] Fix pre-commit errors from #13571 (#13709)  2025-02-22 16:50:38 -08:00
    Signed-off-by: Roger Wang <ywang@roblox.com>
78ac0f591d  [CI/Build] fix uv caching in Dockerfile (#13611)  2025-02-22 08:25:20 -08:00
b56155e7f3  [XPU]fix setuptools version for xpu (#13548)  2025-02-22 08:05:35 -08:00
382f66fb08  [Bugfix] Fix boolean conversion for OpenVINO env variable (#13615)  2025-02-22 08:04:12 -08:00
8354f6640c  [Doc] Dockerfile instructions for optional dependencies and dev transformers (#13699)  2025-02-22 06:04:31 -08:00
c904fdddf6  [ROCm] Apply FP8 weights padding to values not divisible by 512 bytes on ROCm (#13231)  2025-02-22 05:54:38 -08:00
558db8083c  [V1][Kernel] Refactor the prefix_prefill kernel so that the caller no longer has to pass in the context lengths (#13095)  2025-02-22 05:25:41 -08:00
e109e598c7  [NVIDIA] Support nvfp4 cutlass gemm (#13571)  2025-02-22 05:24:05 -08:00
8db1b9d0a1  Support SSL Key Rotation in HTTP Server (#13495)  2025-02-22 05:17:44 -08:00
2382ad29d1  [ci] fix linter (#13701)  2025-02-22 20:28:59 +08:00
    Signed-off-by: youkaichao <youkaichao@gmail.com>
3e472d882a  [core] set up data parallel communication (#13591)  2025-02-22 19:28:59 +08:00
    Signed-off-by: youkaichao <youkaichao@gmail.com>
7f6bae561c  [CI/Build] Fix pre-commit errors (#13696)  2025-02-22 00:31:26 -08:00
105b8ce4c0  [Misc] Reduce LoRA-related static variable (#13166)  2025-02-22 00:21:30 -08:00
2cb8c1540e  [Metrics] Add --show-hidden-metrics-for-version CLI arg (#13295)  2025-02-22 00:20:45 -08:00
1cd981da4f  [V1][Metrics] Support vllm:cache_config_info (#13299)  2025-02-22 00:20:00 -08:00
fca20841c2  Correction to TP logic for Mamba Mixer 2 when Num Groups not divisible by TP Size (#13660)  2025-02-22 00:19:10 -08:00
da31b5333e  [Bugfix] V1 Memory Profiling: V0 Sampler Integration without Rejection Sampler (#13594)  2025-02-22 00:08:29 -08:00
    Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
    Co-authored-by: Roger Wang <ywang@roblox.com>
bb78fb318e  [v1] Support allowed_token_ids in v1 Sampler (#13210)  2025-02-22 14:13:05 +08:00
    Signed-off-by: Lu Fang <lufang@fb.com>
8aca27fa11  [Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len (#13691)  2025-02-22 14:10:38 +08:00
    Signed-off-by: WangErXiao <863579016@qq.com>
95c617e04b  [Misc] Bump compressed-tensors (#13619)  2025-02-21 22:09:04 -08:00
9a1f1da5d1  [Bugfix][Model] OLMo 2: split qkv correctly for GQA and MQA (#13687)  2025-02-21 22:07:45 -08:00
68d630a0c7  [ROCM] fix native attention function call (#13650)  2025-02-21 22:07:04 -08:00
68d535ef44  [Misc] Capture and log the time of loading weights (#13666)  2025-02-21 22:06:34 -08:00
c6ed93860f  [Bugfix][API Server] Fix invalid usage of 'ge' and 'le' in port valid… (#13672)  2025-02-21 22:05:28 -08:00
0ffdf8ce0c  [HTTP Server] Make model param optional in request (#13568)  2025-02-21 21:55:50 -08:00
8c0dd3d4df  docs: Add a note on full CI run in contributing guide (#13646)  2025-02-21 21:53:59 -08:00
ada7c780d5  [Misc] Fix yapf linting tools etc not running on pre-commit (#13695)  2025-02-22 13:10:43 +08:00
    Signed-off-by: Isotr0py <2037008807@qq.com>
288cc6c234  [Attention] MLA with chunked prefill (#12639)  2025-02-21 15:30:12 -08:00
    Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
    Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
    Co-authored-by: Patrick Horn <patrick.horn@gmail.com>
    Co-authored-by: simon-mo <xmo@berkeley.edu>
    Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
900edbfa48  fix typo of grafana dashboard, with correct datasource (#13668)  2025-02-21 18:21:05 +00:00
    Signed-off-by: John Zheng <john.zheng@hp.com>
b2c3fc5d65  [Bugfix][CPU] Fix cpu all-reduce using native pytorch implementation (#13586)  2025-02-20 22:24:17 -08:00
839b27c6cc  [Kernel]Add streamK for block-quantized CUTLASS kernels (#12978)  2025-02-20 22:14:24 -08:00
34ad27fe83  [ci] Fix metrics test model path (#13635)  2025-02-20 22:12:10 -08:00
1c3c975766  [FEATURE] Enables /score endpoint for embedding models (#12846)  2025-02-20 22:09:47 -08:00
1cdc88614a  Missing comment explaining VDR variable in GGUF kernels (#13290)  2025-02-20 22:06:54 -08:00
31aa045c11  [V1][Sampler] Avoid an operation during temperature application (#13587)  2025-02-20 22:05:56 -08:00
a30c093502  [Bugfix] Add mm_processor_kwargs to chat-related protocols (#13644)  2025-02-20 22:04:33 -08:00
c7b07a95a6  Use pre-commit to update requirements-test.txt (#13617)  2025-02-20 22:03:27 -08:00
27a09dc52c  [NVIDIA] Fix an issue to use current stream for the nvfp4 quant (#13632)  2025-02-20 22:01:48 -08:00
981f3c831e  [Misc] Adding script to setup ray for multi-node vllm deployments (#12913)  2025-02-20 21:16:40 -08:00
44c33f01f3  Add llmaz as another integration (#13643)  2025-02-21 03:52:40 +00:00
    Signed-off-by: kerthcet <kerthcet@gmail.com>
33170081f1  [Neuron][Kernel] Vectorize KV cache load in FlashPagedAttention to maximize DMA bandwidth (#13245)  2025-02-20 17:45:45 -08:00
    Signed-off-by: Lingfan Yu <lingfany@amazon.com>
71face8540  [Bugfix] Fix max_num_batched_tokens for MLA (#13620)  2025-02-20 17:45:20 -08:00
    Signed-off-by: mgoin <mgoin64@gmail.com>
bfbc0b32c6  [Frontend] Add backend-specific options for guided decoding (#13505)  2025-02-20 15:07:58 -05:00
    Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
6a417b8600  fix neuron performance issue (#13589)  2025-02-20 10:59:36 -08:00
d3ea50113c  [V1][Minor] Print KV cache size in token counts (#13596)  2025-02-20 09:24:31 -08:00
    Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
34aad515c8
Update pre-commit's isort version to remove warnings (#13614)
2025-02-20 08:00:14 -08:00

ed6e9075d3
[Bugfix] Fix deepseekv3 grouped topk error (#13474)
Signed-off-by: Chen-XiaoBing <chenxb002@whu.edu.cn>
2025-02-20 06:47:01 -08:00

992e5c3d34
Merge similar examples in offline_inference into single basic example (#12737)
2025-02-20 04:53:51 -08:00

b69692a2d8
[Kernel] LoRA - Refactor sgmv kernels (#13110)
2025-02-20 07:28:06 -05:00

a64a84433d
[2/n][ci] S3: Use full model path (#13564)
Signed-off-by: <>
2025-02-20 01:20:15 -08:00

aa1e62d0db
[ci] Fix spec decode test (#13600)
2025-02-20 16:56:00 +08:00

497bc83124
[CI/Build] Use uv in the Dockerfile (#13566)
2025-02-19 23:05:44 -08:00

3738e6fa80
[API Server] Add port number range validation (#13506)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-20 15:05:13 +08:00

0023cd2b9d
[ROCm] MI300A compile targets deprecation (#13560)
2025-02-19 23:05:00 -08:00

041e294716
[Misc] add mm_processor_kwargs to extra_body for Qwen2.5-VL (#13533)
2025-02-19 23:04:30 -08:00

9621667874
[Misc] Warn if the vLLM version can't be retrieved (#13501)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
2025-02-20 06:24:48 +00:00

8c755c3b6d
[bugfix] spec decode worker get tp group only when initialized (#13578)
2025-02-20 04:46:28 +00:00

ba81163997
[core] add sleep and wake up endpoint and v1 support (#12987)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: cennn <2523403608@qq.com>
Co-authored-by: cennn <2523403608@qq.com>
2025-02-20 12:41:17 +08:00

0d243f2a54
[ROCm][MoE] mi300 mixtral8x7B perf for specific BS (#13577)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-02-20 04:01:02 +00:00

88f6ba3281
[ci] Add AWS creds for AMD (#13572)
2025-02-20 03:56:06 +00:00

512368e34a
[Misc] Qwen2.5 VL support LoRA (#13261)
2025-02-19 18:37:55 -08:00

473f51cfd9
[3/n][CI] Load Quantization test models with S3 (#13570)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-20 10:12:30 +08:00

a4c402a756
[BugFix] Avoid error traceback in logs when V1 LLM terminates (#13565)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-20 00:49:01 +00:00

550d97eb58
[Misc] Avoid calling unnecessary hf_list_repo_files for local model path (#13348)
Signed-off-by: isotr0py <2037008807@qq.com>
2025-02-19 18:57:48 +00:00

fbbe1fbac6
[MISC] Logging the message about Ray teardown (#13502)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
2025-02-19 09:40:50 -08:00

01c184b8f3
Fix copyright year to auto get current year (#13561)
2025-02-19 16:55:34 +00:00

ad5a35c21b
[doc] clarify multi-node serving doc (#13558)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-19 22:32:17 +08:00

5ae9f26a5a
[Bugfix] Fix device ordinal for multi-node spec decode (#13269)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-02-19 22:13:15 +08:00

377d10bd14
[VLM][Bugfix] Pass processor kwargs properly on init (#13516)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-19 13:13:50 +00:00

52ce14d31f
[doc] clarify profiling is only for developers (#13554)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-19 20:55:58 +08:00

81dabf24a8
[CI/Build] force writing version file (#13544)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
2025-02-19 18:48:03 +08:00

423330263b
[Feature] Pluggable platform-specific scheduler (#13161)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
2025-02-19 17:16:38 +08:00

caf7ff4456
[V1][Core] Generic mechanism for handling engine utility (#13060)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-19 17:09:22 +08:00

f525c0be8b
[Model][Speculative Decoding] DeepSeek MTP spec decode (#12755)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2025-02-19 17:06:23 +08:00

983a40a8bb
[Bugfix] Fix Positive Feature Layers in Llava Models (#13514)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
2025-02-19 08:50:07 +00:00

fdc5df6f54
use device param in load_model method (#13037)
2025-02-19 16:05:02 +08:00

3b05cd4555
[perf-benchmark] Fix ECR path for premerge benchmark (#13512)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-19 07:56:11 +00:00

d5d214ac7f
[1/n][CI] Load models in CI from S3 instead of HF (#13205)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-19 07:34:59 +00:00

fd84857f64
[Doc] Add clarification note regarding paligemma (#13511)
2025-02-18 22:24:03 -08:00

8aada19dfc
[ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503)
2025-02-18 22:23:24 -08:00

9aa95b0e6a
[perf-benchmark] Allow premerge ECR (#13509)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-19 05:13:41 +00:00

d0a7a2769d
[Hardware][Gaudi][Feature] Support Contiguous Cache Fetch (#12139)
Signed-off-by: yuzhou <yuzhou@habana.ai>
Signed-off-by: zhouyu5 <yu.zhou@intel.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2025-02-18 19:40:19 -08:00

00b69c2d27
[Misc] Remove dangling references to --use-v2-block-manager (#13492)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-19 03:37:26 +00:00

4c82229898
[V1][Spec Decode] Optimize N-gram matching with Numba (#13365)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-18 13:19:58 -08:00

c8d70e2437
Pin Ray version to 2.40.0 (#13490)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-18 12:50:31 -08:00

30172b4947
[V1] Optimize handling of sampling metadata and req_ids list (#13244)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-18 12:15:33 -08:00

a4d577b379
[V1][Tests] Adding additional testing for multimodal models to V1 (#13308)
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
2025-02-18 09:53:14 -08:00

7b203b7694
[misc] fix debugging code (#13487)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-18 09:37:11 -08:00

4fb8142a0e
[V1][PP] Enable true PP with Ray executor (#13472)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-18 09:15:32 -08:00

a02c86b4dd
[CI/Build] migrate static project metadata from setup.py to pyproject.toml (#8772)
2025-02-18 08:02:49 -08:00

3809458456
[Bugfix] Fix invalid rotary embedding unit test (#13431)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
2025-02-18 11:52:03 +00:00

d3231cb436
[Bugfix] Handle content type with optional parameters (#13383)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
2025-02-18 11:29:13 +00:00

435b502a6e
[ROCm] Make amdsmi import optional for other platforms (#13460)
2025-02-18 03:15:56 -08:00

29fc5772c4
[Bugfix] Remove noisy error logging during local model loading (#13458)
2025-02-18 03:15:48 -08:00

2358ca527b
[Doc]: Improve feature tables (#13224)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-18 18:52:39 +08:00

8cf97f8661
[Bugfix] Fix failing transformers dynamic module resolving with spawn multiproc method (#13403)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-02-18 10:25:53 +00:00

e2603fefb8
[Bugfix] Ensure LoRA path from the request can be included in err msg (#13450)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-18 16:19:15 +08:00

b53d79983c
Add outlines fallback when JSON schema has enum (#13449)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-18 06:49:41 +00:00

9915912f7f
[V1][PP] Fix & Pin Ray version in requirements-cuda.txt (#13436)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-17 21:58:06 -08:00

d1b649f1ef
[Quant] Aria SupportsQuant (#13416)
2025-02-17 21:51:09 -08:00

ac19b519ed
[core] fix sleep mode in pytorch 2.6 (#13456)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-18 13:48:10 +08:00

a1074b3efe
[Bugfix] Only print out chat template when supplied (#13444)
2025-02-17 21:43:31 -08:00

00294e1bc6
[Quant] Arctic SupportsQuant (#13366)
2025-02-17 21:35:09 -08:00

88787bce1d
[Quant] Molmo SupportsQuant (#13336)
2025-02-17 21:34:47 -08:00

932b51cedd
[v1] fix parallel config rank (#13445)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-18 12:33:45 +08:00

7c7adf81fc
[ROCm] fix get_device_name for rocm (#13438)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-02-18 04:07:12 +00:00

67ef8f666a
[Model] Enable quantization support for transformers backend (#12960)
2025-02-17 19:52:47 -08:00

efbe854448
[Misc] Remove dangling references to SamplingType.BEAM (#13402)
2025-02-17 19:52:35 -08:00

b3942e157e
[Bugfix][CI][V1] Work around V1 + CUDA Graph + torch._scaled_mm fallback issue (#13425)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-18 00:32:48 +00:00

cd4a72a28d
[V1][Spec decode] Move drafter to model runner (#13363)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-17 15:40:12 -08:00

6ac485a953
[V1][PP] Fix intermediate tensor values (#13417)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-02-17 13:37:45 -08:00

4c21ce9eba
[V1] Get input tokens from scheduler (#13339)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-17 11:01:07 -08:00

ce77eb9410
[Bugfix] Fix VLLM_USE_MODELSCOPE issue (#13384)
2025-02-17 14:22:01 +00:00

30513d1cb6
[Bugfix] fix xpu communicator (#13368)
Signed-off-by: yan ma <yan.ma@intel.com>
2025-02-17 20:59:18 +08:00

1f69c4a892
[Model] Support Mamba2 (Codestral Mamba) (#9292)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
2025-02-17 20:17:50 +08:00

7b623fca0b
[VLM] Check required fields before initializing field config in DictEmbeddingItems (#13380)
2025-02-17 01:36:07 -08:00

238dfc8ac3
[MISC] tiny fixes (#13378)
2025-02-17 00:57:13 -08:00

45186834a0
Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068)
Signed-off-by: Huy Do <huydhn@gmail.com>
2025-02-17 08:16:32 +00:00

f857311d13
Fix spelling error in index.md (#13369)
2025-02-17 06:53:20 +00:00

46cdd59577
[Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-02-16 19:32:26 -08:00

2010f04c17
[V1][Misc] Avoid unnecessary log output (#13289)
2025-02-16 19:26:24 -08:00

69e1d23e1e
[V1][BugFix] Clean up rejection sampler & Fix warning msg (#13362)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-16 12:25:29 -08:00

d67cc21b78
[Bugfix][Platform][CPU] Fix cuda platform detection on CPU backend edge case (#13358)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-02-16 18:55:27 +00:00

e18227b04a
[V1][PP] Cache Intermediate Tensors (#13353)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-16 10:02:27 -08:00

7b89386553
[V1][BugFix] Add __init__.py to v1/spec_decode/ (#13359)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-16 09:39:08 -08:00

da833b0aee
[Docs] Change myenv to vllm. Update python_env_setup.inc.md (#13325)
2025-02-16 16:04:21 +00:00

5d2965b7d7
[Bugfix] Fix 2 Node and Spec Decode tests (#13341)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-16 22:20:22 +08:00

a0231b7c25
[platform] add base class for communicators (#13208)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-16 22:14:22 +08:00

124776ebd5
[ci] skip failed tests for flashinfer (#13352)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-16 22:09:15 +08:00

b7d309860e
[V1] Update doc and examples for H2O-VL (#13349)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-02-16 10:35:54 +00:00

dc0f7ccf8b
[BugFix] Enhance test_pos_encoding to support execution on multi-devices (#13187)
Signed-off-by: wchen61 <wchen61@foxmail.com>
2025-02-16 08:59:49 +00:00

d3d547e057
[Bugfix] Pin xgrammar to 0.1.11 (#13338)
2025-02-15 19:42:25 -08:00

12913d17ba
[Quant] Add SupportsQuant to phi3 and clip (#13104)
2025-02-15 19:28:33 -08:00

80f63a3966
[V1][Spec Decode] Ngram Spec Decode (#12193)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2025-02-15 18:05:11 -08:00

367cb8ce8c
[Doc] [2/N] Add Fuyu E2E example for multimodal processor (#13331)
2025-02-15 07:06:23 -08:00

54ed913f34
[ci/build] update flashinfer (#13323)
2025-02-15 05:33:13 -08:00

9206b3d7ec
[V1][PP] Run engine busy loop with batch queue (#13064)
2025-02-15 03:59:01 -08:00

ed0de3e4b8
[AMD] [Model] DeepSeek tunings (#13199)
2025-02-15 03:58:09 -08:00

2ad1bc7afe
[V1][Metrics] Add iteration_tokens_total histogram from V0 (#13288)
2025-02-15 03:56:19 -08:00

7fdaaf48ef
[Bugfix] Fix qwen2.5-vl image processor (#13286)
2025-02-15 03:00:11 -08:00

067fa2255b
[Bugfix]Fix search start_index of stop_checker (#13280)
2025-02-14 21:39:42 -08:00

9076325677
[BugFix] Don't scan entire cache dir when loading model (#13302)
2025-02-14 21:33:31 -08:00

97a3d6d995
[Bugfix] Massage MLA's usage of flash attn for RoCM (#13310)
2025-02-14 21:33:25 -08:00

579d7a63b2
[Bugfix][Docs] Fix offline Whisper (#13274)
2025-02-14 21:32:37 -08:00

c9f9d5b397
[Bugfix][AMD] Update torch_bindings so that scaled_fp4_quant isn't build on ROCm (#13235)
2025-02-14 20:30:42 -08:00

0c73026844
[V1][PP] Fix memory profiling in PP (#13315)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-14 20:17:25 -08:00

6a854c7a2b
[V1][Sampler] Don't apply temp for greedy-only (#13311)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-02-14 18:10:53 -08:00

e7eea5a520
[V1][CI] Fix failed v1-test because of min_p (#13316)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-14 17:29:51 -08:00

a12934d3ec
[V1][Core] min_p sampling support (#13191)
Signed-off-by: Aoyu <aoyuzhan@amazon.com>
Co-authored-by: Aoyu <aoyuzhan@amazon.com>
2025-02-14 15:50:05 -08:00

3bcb8c75da
[Core] Reduce TTFT with concurrent partial prefills (#10235)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2025-02-14 15:36:07 -08:00

5e5c8e091e
[Quant][Perf] Use moe_wna16 kernel by default for MoEs with many experts (#13236)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-14 12:53:42 -08:00

c9e2d644e7
[Hardware][Gaudi][Bugfix] Fix error for guided decoding (#12317)
2025-02-14 04:36:49 -08:00

7734e9a291
[Core] choice-based structured output with xgrammar (#12632)
2025-02-14 04:36:05 -08:00

6224a9f620
Support logit_bias in v1 Sampler (#13079)
2025-02-14 04:34:59 -08:00

085b7b2d6c
[V1] Simplify GPUModelRunner._update_states check (#13265)
2025-02-14 04:33:43 -08:00

4da1f667e9
[VLM] Keep track of whether prompt replacements have been applied (#13215)
2025-02-14 04:20:46 -08:00

556ef7f714
[Misc] Log time consumption of sleep and wake-up (#13115)
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
2025-02-14 20:10:21 +08:00

83481ceb49
[Bugfix] Fix missing parentheses (#13263)
2025-02-14 01:07:10 -08:00

185cc19f92
[Frontend] Optionally remove memory buffer used for uploading to URLs in run_batch (#12927)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2025-02-14 08:22:42 +00:00

45f90bcbba
[WIP] TPU V1 Support Refactored (#13049)
2025-02-14 00:21:53 -08:00

b0ccfc565a
[Bugfix][V1] GPUModelRunner._update_states should return True when there is a finished request in batch (#13126)
2025-02-13 22:39:20 -08:00

ba59b78a9c
[ROCm][V1] Add intial ROCm support to V1 (#12790)
2025-02-13 22:21:50 -08:00

cbc40128eb
[V1] LoRA - Enable Serving Usecase (#12883)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-14 14:21:12 +08:00

f0b2da72a8
Expand MLA to support most types of quantization (#13181)
2025-02-13 22:19:22 -08:00

f2b20fe491
Consolidate Llama model usage in tests (#13094)
2025-02-13 22:18:03 -08:00

40932d7a05
[Misc] Remove redundant statements in scheduler.py (#13229)
2025-02-13 22:07:25 -08:00

84683fa271
[Bugfix] Offline example of disaggregated prefill (#13214)
2025-02-13 20:20:47 -08:00

067678262a
[Bugfix][CI] Inherit codespell settings from pyproject.toml in the pre-commit-config (#13237)
2025-02-13 20:19:43 -08:00

09545c0a94
[Bugfix/CI] Turn test_compressed_tensors_2of4_sparse back on (#13250)
2025-02-13 20:19:25 -08:00

dd5ede4440
[V1] Consolidate MM cache size to vllm.envs (#13239)
2025-02-13 20:19:03 -08:00

8c32b08a86
[Kernel] Fix awq error when n is not divisable by 128 (#13227)
2025-02-13 20:07:05 -08:00

410886950a
[ROCm] Avoid using the default stream on ROCm (#13238)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-02-14 09:29:26 +08:00

e38be640e6
Revert "Add label if pre-commit passes" (#13242)
2025-02-13 16:12:32 -08:00

c1e37bf71b
[Kernel][Bugfix] Refactor and Fix CUTLASS 2:4 Sparse Kernels (#13198)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-14 00:01:14 +00:00

2344192a55
Optimize moe_align_block_size for deepseek_v3 (#12850)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-13 18:43:37 -05:00

bffddd9a05
Add label if pre-commit passes (#12527)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-13 20:51:30 +00:00

d84cef76eb
[Frontend] Add /v1/audio/transcriptions OpenAI API endpoint (#12909)
2025-02-13 07:23:45 -08:00

37dfa60037
[Bugfix] Missing Content Type returns 500 Internal Server Error (#13193)
2025-02-13 06:52:22 -08:00

1bc3b5e71b
[VLM] Separate text-only and vision variants of the same model architecture (#13157)
2025-02-13 06:19:15 -08:00

02ed8a1fbe
[Misc] Qwen2.5-VL Optimization (#13155)
2025-02-13 06:17:57 -08:00

2092a6fa7d
[V1][Core] Add worker_base for v1 worker (#12816)
Signed-off-by: Aoyu <aoyuzhan@amazon.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Aoyu <aoyuzhan@amazon.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-02-13 20:35:18 +08:00

c9d3ecf016
[VLM] Merged multi-modal processor for Molmo (#12966)
2025-02-13 04:34:00 -08:00

fdcf64d3c6
[V1] Clarify input processing and multimodal feature caching logic (#13211)
2025-02-13 03:43:24 -08:00

578087e56c
[Frontend] Pass pre-created socket to uvicorn (#13113)
2025-02-13 00:51:46 -08:00

fa253f1a70
[VLM] Remove input processor from clip and siglip (#13165)
2025-02-13 00:31:37 -08:00

9605c1256e
[V1][core] Implement pipeline parallel on Ray (#12996)
2025-02-13 08:02:46 +00:00

0ccd8769fb
[CI/Build] Allow ruff to auto-fix some issues (#13180)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-13 07:45:38 +00:00

cb944d5818
Allow Unsloth Dynamic 4bit BnB quants to work (#12974)
2025-02-12 23:13:08 -08:00

d46d490c27
[Frontend] Move CLI code into vllm.cmd package (#12971)
2025-02-12 23:12:21 -08:00

04f50ad9d1
[Bugfix] deepseek_r1_reasoning_parser put reason content in wrong field in certain edge case (#13097)
2025-02-12 23:11:26 -08:00

60c68df6d1
[Build] Automatically use the wheel of the base commit with Python-only build (#13178)
2025-02-12 23:10:28 -08:00

009439caeb
Simplify logic of locating CUDART so file path (#13203)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-13 13:52:41 +08:00

bc55d13070
[VLM] Implement merged multimodal processor for Mllama (#11427)
2025-02-12 20:26:21 -08:00

d88c8666a1
[Bugfix][Example] Fix GCed profiling server for TPU (#12792)
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-02-13 11:52:11 +08:00

						4fc5c23bb6 
					 
					
						
						
							
							[NVIDIA] Support nvfp4 quantization ( #12784 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 19:51:51 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9f9704dca6 
					 
					
						
						
							
							[perf-benchmark] cleanup unused Docker images and volumes in H100 benchmark instance ( #12706 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 19:51:33 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8eafe5eaea 
					 
					
						
						
							
							[CI/Build] Ignore ruff warning up007 ( #13182 )  
						
						 
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-02-13 11:48:31 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4c0d93f4b2 
					 
					
						
						
							
							[V1][Bugfix] Copy encoder input ids to fix set iteration issue during VLM abort ( #13173 )  
						
						 
						
						Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
						
						
					 
					
						2025-02-12 12:58:11 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						14b7899d10 
					 
					
						
						
							
							[CI] Fix failing FP8 cpu offload test ( #13170 )  
						
						 
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-02-12 19:16:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						09972e716c 
					 
					
						
						
							
							[Bugfix] Allow fallback to AWQ from AWQMarlin at per-layer granularity ( #13119 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 09:19:53 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						36a08630e8 
					 
					
						
						
							
							[CORE] [QUANT] Support for GPTQModel's dynamic quantization per module override/control ( #7086 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 09:19:43 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2c2b560f48 
					 
					
						
						
							
							[CI/Build] Use mypy matcher for pre-commit CI job ( #13162 )  
						
						 
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-02-12 17:12:22 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						042c3419fa 
					 
					
						
						
							
							Introduce VLLM_CUDART_SO_PATH to allow users specify the .so path ( #12998 )  
						
						 
						
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-02-12 09:06:13 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						82cabf53a3 
					 
					
						
						
							
							[Misc] Delete unused LoRA modules ( #13151 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 08:58:24 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						314cfade02 
					 
					
						
						
							
							[Frontend] Generate valid tool call IDs when using tokenizer-mode=mistral ( #12332 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 08:29:56 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						985b4a2b19 
					 
					
						
						
							
							[Bugfix] Fix num video tokens calculation for Qwen2-VL ( #13148 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-02-12 11:55:23 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f4d97e4fc2 
					 
					
						
						
							
							[Bug] [V1] Try fetching stop_reason from EngineOutput before checking the request ( #13108 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 02:39:16 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						f1042e86f0 
					 
					
						
						
							
							[Misc] AMD Build Improvements ( #12923 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 02:36:10 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7c4033acd4 
					 
					
						
						
							
							Further reduce the HTTP calls to huggingface.co ( #13107 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 02:34:09 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d59def4730 
					 
					
						
						
							
							Bump actions/setup-python from 5.3.0 to 5.4.0 ( #12672 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 16:41:22 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						0c7d9effce 
					 
					
						
						
							
							Bump helm/chart-testing-action from 2.6.1 to 2.7.0 ( #12463 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 16:41:06 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						dd3b4a01f8 
					 
					
						
						
							
							Bump actions/stale from 9.0.0 to 9.1.0 ( #12462 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 00:40:25 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						a0597c6b75 
					 
					
						
						
							
							Bump helm/kind-action from 1.10.0 to 1.12.0 ( #11612 )  
						
						 
						
						
						
						
					 
					
						2025-02-12 00:40:19 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						e92694b6fe 
					 
					
						
						
							
							[Neuron][Kernel] Support Longer Sequences in NKI-based Flash PagedAttention and Improve Efficiency ( #12921 )  
						
						 
						
						Signed-off-by: Lingfan Yu <lingfany@amazon.com>
						
						
					 
					
						2025-02-11 21:12:37 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						842b0fd402 
					 
					
						
						
							
							[ci] Add more source file dependencies for some tests ( #13123 )  
						
						 
						
						Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
						
						
					 
					
						2025-02-11 20:38:10 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						974dfd4971 
					 
					
						
						
							
							[Model] IBM/NASA Prithvi Geospatial model  ( #12830 )  
						
						 
						
						
						
						
					 
					
						2025-02-11 20:34:30 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						3ee696a63d 
					 
					
						
						
							
							[RFC][vllm-API] Support tokenizer registry for customized tokenizer in vLLM ( #12518 )  
						
						 
						
						Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
						
						
					 
					
						2025-02-12 12:25:58 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						72c2b68dc9 
					 
					
						
						
							
							[Misc] Move pre-commit suggestion back to the end ( #13114 )  
						
						 
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-02-11 22:34:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						14ecab5be2 
					 
					
						
						
							
							[Bugfix] Guided decoding falls back to outlines when fails to import xgrammar ( #12976 )  
						
						 
						
						Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
						
						
					 
					
						2025-02-11 18:17:44 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						deb6c1c6b4 
					 
					
						
						
							
							[Doc] Improve OpenVINO installation doc ( #13102 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-02-11 18:02:46 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						565c1efa65 
					 
					
						
						
							
							[CI/Build][Bugfix] Fix CPU backend default threads num ( #13077 )  
						
						 
						
						
						
						
					 
					
						2025-02-11 16:55:56 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2b25b7d2e1 
					 
					
						
						
							
							Fix initializing GGUF weights for ColumnParallelLinear when using tensor parallel > 1 ( #13023 )  
						
						 
						
						
						
						
					 
					
						2025-02-11 08:38:48 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						6c4dbe23eb 
					 
					
						
						
							
							[BugFix] Pop instead of del CUDA_VISIBLE_DEVICES ( #12962 )  
						
						 
						
						Signed-off-by: Hollow Man <hollowman@opensuse.org>
						
						
					 
					
						2025-02-12 00:21:50 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						21f5d50fa5 
					 
					
						
						
							
							[Bugfix] Do not use resource module on Windows ( #12858 ) ( #13029 )  
						
						 
						
						
						
						
					 
					
						2025-02-11 08:21:18 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						bf3e05215c 
					 
					
						
						
							
							[Misc] Fix typo at comments at metrics.py ( #13024 )  
						
						 
						
						
						
						
					 
					
						2025-02-11 08:20:37 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						ad9776353e 
					 
					
						
						
							
							Set torch_dtype in TransformersModel ( #13088 )  
						
						 
						
						Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
						
						
					 
					
						2025-02-11 23:51:19 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						75e6e14516 
					 
					
						
						
							
							[V1][Metrics] Add several request timing histograms ( #12644 )  
						
						 
						
						Signed-off-by: Mark McLoughlin <markmc@redhat.com>
						
						
					 
					
						2025-02-11 10:14:00 -05:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						110f59a33e 
					 
					
						
						
							
							[Bugfix] fix flaky test ( #13089 )  
						
						 
						
						Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
						
						
					 
					
						2025-02-11 14:41:20 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2e3b969ec0 
					 
					
						
						
							
							[Platform] add pre_register_and_update function ( #12432 )  
						
						 
						
						Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
						
						
					 
					
						2025-02-11 22:06:46 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						da317197dd 
					 
					
						
						
							
							[Build] Fix cuda link target of cumem_allocator in CPU env ( #12863 )  
						
						 
						
						Signed-off-by: YuhongGuo <yuhong.gyh@antgroup.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
						
						
					 
					
						2025-02-11 21:55:57 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7539bbc6a6 
					 
					
						
						
							
							[ROCm] Using a more precise memory profiling ( #12624 )  
						
						 
						
						Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
						
						
					 
					
						2025-02-11 21:47:10 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						9cf4759493 
					 
					
						
						
							
							[executor] init local_rank as device index ( #13027 )  
						
						 
						
						Signed-off-by: Mengqing Cao <cmq0113@163.com>
						
						
					 
					
						2025-02-11 21:20:53 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						41c5dd45b9 
					 
					
						
						
							
							[V1][Metrics] Add GPU prefix cache hit rate % gauge ( #12592 )  
						
						 
						
						
						
						
					 
					
						2025-02-11 08:27:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fc6485d277 
					 
					
						
						
							
							[Bugfix]: Reasoning output bug according to the chat template change ( #13025 )  
						
						 
						
						Signed-off-by: Ce Gao <cegao@tensorchord.ai>
						
						
					 
					
						2025-02-11 15:49:03 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						78a141d768 
					 
					
						
						
							
							[Misc] LoRA - Refactor Punica ops tests ( #12970 )  
						
						 
						
						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
						
						
					 
					
						2025-02-11 07:26:03 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c320ca8edd 
					 
					
						
						
							
							[Core] Don't do platform detection at import time ( #12933 )  
						
						 
						
						Signed-off-by: Russell Bryant <rbryant@redhat.com>
						
						
					 
					
						2025-02-11 07:25:25 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						58047c6f04 
					 
					
						
						
							
							[Benchmark] Add BurstGPT to benchmark_serving ( #13063 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
						
						
					 
					
						2025-02-10 21:25:30 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cb080f32e3 
					 
					
						
						
							
							[Bugfix] Support missing tool parameters in mistral tokenizer ( #12884 )  
						
						 
						
						Signed-off-by: Florian Greinacher <florian.greinacher@siemens.com>
						
						
					 
					
						2025-02-11 03:33:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2c0f58203c 
					 
					
						
						
							
							[Docs] Announce Meta Meetup ( #13065 )  
						
						 
						
						Signed-off-by: simon-mo <simon.mo@hey.com>
						
						
					 
					
						2025-02-10 18:24:29 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2ff4857678 
					 
					
						
						
							
							[V1][Minor] Move scheduler outputs to a separate file ( #13062 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-02-11 02:10:06 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						91e876750e 
					 
					
						
						
							
							[misc] Fix setup.py condition to avoid AMD from being mistaken with CPU ( #13022 )  
						
						 
						
						Signed-off-by: kevin <kevin@anyscale.com>
						
						
					 
					
						2025-02-10 18:06:16 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						08b2d845d6 
					 
					
						
						
							
							[Model] Ultravox Model: Support v0.5 Release ( #12912 )  
						
						 
						
						Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
						
						
					 
					
						2025-02-10 22:02:48 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2ae889052c 
					 
					
						
						
							
							Fix seed parameter behavior in vLLM ( #13007 )  
						
						 
						
						Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
						
						
					 
					
						2025-02-10 23:26:50 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						51f0b5f7f6 
					 
					
						
						
							
							[Bugfix] Clean up and fix multi-modal processors ( #13012 )  
						
						 
						
						Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
						
						
					 
					
						2025-02-10 10:45:21 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fde71262e0 
					 
					
						
						
							
							[misc] Add retries with exponential backoff for HF file existence check ( #13008 )  
						
						 
						
						
						
						
					 
					
						2025-02-10 01:15:02 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						243137143c 
					 
					
						
						
							
							[Doc] Add link to tool_choice tracking issue in tool_calling.md ( #13003 )  
						
						 
						
						Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
						
						
					 
					
						2025-02-10 06:09:33 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						b2496bb07f 
					 
					
						
						
							
							[core] fix sleep mode and pytorch checkpoint compatibility ( #13001 )  
						
						 
						
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-02-10 13:03:43 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						44607e07d3 
					 
					
						
						
							
							Check if selected backend is None in get_attn_backend_cls() ( #12975 )  
						
						 
						
						Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
						
						
					 
					
						2025-02-10 11:45:07 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						67c4637ccf 
					 
					
						
						
							
							[V1] Use msgpack for core request serialization ( #12918 )  
						
						 
						
						Signed-off-by: Nick Hill <nhill@redhat.com>
						
						
					 
					
						2025-02-10 11:35:56 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						aa0ca5ebb7 
					 
					
						
						
							
							[core][rlhf] add colocate example for RLHF ( #12984 )  
						
						 
						
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-02-10 10:28:59 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						59fff4a01a 
					 
					
						
						
							
							[core] improve error handling when wake up from sleep mode ( #12981 )  
						
						 
						
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-02-10 09:38:57 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						29f1d47e73 
					 
					
						
						
							
							[MISC] Always import version library first in the vllm package ( #12979 )  
						
						 
						
						Signed-off-by: Lu Fang <lufang@fb.com>
						
						
					 
					
						2025-02-09 18:56:40 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						cf797aa856 
					 
					
						
						
							
							[core] port pynvml into vllm codebase ( #12963 )  
						
						 
						
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-02-09 15:00:00 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						24700c346b 
					 
					
						
						
							
							[V1] Cache uses_mrope in GPUModelRunner ( #12969 )  
						
						 
						
						
						
						
					 
					
						2025-02-08 15:32:32 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						d366ccc4e3 
					 
					
						
						
							
							[RFC] [Mistral] FP8 format ( #10130 )  
						
						 
						
						Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
						
						
					 
					
						2025-02-08 14:12:53 -07:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						870c37481e 
					 
					
						
						
							
							[V1][Minor] Remove outdated comment ( #12968 )  
						
						 
						
						Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
						
						
					 
					
						2025-02-08 12:48:30 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						86222a3dab 
					 
					
						
						
							
							[VLM] Merged multi-modal processor for GLM4V ( #12449 )  
						
						 
						
						Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
						
						
					 
					
						2025-02-08 20:32:16 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						fe743b798d 
					 
					
						
						
							
							[bugfix] fix early import of flash attention ( #12959 )  
						
						 
						
						Signed-off-by: youkaichao <youkaichao@gmail.com>
						
						
					 
					
						2025-02-09 00:06:56 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						913df14da3 
					 
					
						
						
							
							[Bugfix] Remove unused seq_group_metadata_list from ModelInputForGPU ( #12935 )  
						
						 
						
						Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
						
						
					 
					
						2025-02-08 14:46:19 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						8a69e0e20e 
					 
					
						
						
							
							[CI/Build] Auto-fix Markdown files ( #12941 )  
						
						 
						
						
						
						
					 
					
						2025-02-08 04:25:15 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						4c8dd12ef3 
					 
					
						
						
							
							[Misc] Add qwen2.5-vl BNB support ( #12944 )  
						
						 
						
						
						
						
					 
					
						2025-02-08 04:24:47 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						256a2d29dc 
					 
					
						
						
							
							[Doc] Correct HF repository for TeleChat2 models ( #12949 )  
						
						 
						
						
						
						
					 
					
						2025-02-08 01:42:15 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						c45d398e6f 
					 
					
						
						
							
							[CI] Resolve transformers-neuronx version conflict ( #12925 )  
						
						 
						
						
						
						
					 
					
						2025-02-08 01:41:35 -08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						011e612d92 
					 
					
						
						
							
							[Misc] Log time consumption on weight downloading ( #12926 )  
						
						 
						
						
						
						
					 
					
						2025-02-08 09:16:42 +00:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						7e1837676a 
					 
					
						
						
							
							[misc]  Add LoRA to benchmark_serving ( #12898 )  
						
						 
						
						Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
						
						
					 
					
						2025-02-08 17:15:44 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
						2880e21e3d 
					 
					
						
						
							
							[Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel Gaudi ( #12812 )  
						
						 
						
						Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
						
						
					 
					
						2025-02-08 17:15:30 +08:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
					 
					
						
						
							
						
407b5537db [Build] Make pypi install work on CPU platform (#12874)
2025-02-08 01:15:15 -08:00

4ea48fb35c [V1][Minor] Move cascade attn logic outside _prepare_inputs (#12943)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-08 00:39:09 -08:00

e31498bdcb [Misc] Add offline test for disaggregated prefill (#12418)
2025-02-08 08:38:20 +00:00

91dd8f7aa6 [bugfix] respect distributed_executor_backend in world_size=1 (#12934)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-08 16:17:08 +08:00

d01f66b039 [Bugfix] Fix multi-round chat error when mistral tokenizer is used (#12859)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-02-08 07:04:34 +00:00

cc01223f3b [Misc] Fix typo in the example file (#12896)
Signed-off-by: Zhao Ke <yingxiongraomingzk@gmail.com>
2025-02-08 06:56:43 +00:00

306923da82 [Bugfix] Fix Qwen2_5_VLForConditionalGeneration packed_modules_mapping (#12905)
2025-02-07 21:02:53 -08:00

3243158336 [V1] Move KV block hashes from Request to KVCacheManager (#12922)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-07 19:14:10 -08:00

b21f0f9d17 [V1][Minor] Remove outdated comment (#12928)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-07 19:07:37 -08:00

45cbc4991d [Bugfix] Fix disagg hang caused by the prefill and decode communication issues (#12723)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-07 16:39:50 -08:00

932c6b7461 [V1] LM Eval With Streaming Integration Tests (#11590)
2025-02-07 15:07:03 -08:00

eaa92d4437 [ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing (#12501)
2025-02-07 08:13:43 -08:00

0630d4537a [V1] Logprobs and prompt logprobs support (#9880)
This PR is adding support for sample logprobs & prompt logprobs to vLLM v1.
New behavior:
- During model execution, model runner computes sample logprobs (if user-provided logprobs setting is not None) and prompt logprobs (if user-provided prompt_logprobs setting is not None). For both sample and prompt logprobs, the engine core returns 3 vectors: token ids, token logprob values, token ranks. Ranks reflect tokens' 1-indexed positions in the vocabulary vector after sorting the vocabulary by log probability in descending order.
- In scheduler.update_from_output(), sample and prompt logprobs are incorporated into the EngineCoreOutput data structure which is transferred to the engine client. If multiprocessing is enabled, then sample and prompt logprobs will be (de)serialized when the EngineCoreOutput data structure is (de)serialized.
- During output processing, the LogprobsProcessor transforms the triplet of token ids, token logprobs values, and token ranks into the OpenAI-compatible List[Dict[token id,Logprob]] format (for sample and prompt logprobs respectively.)
- Each Logprob instance (whether sample- or prompt-) consists of a token's log-probability, rank, and detokenized string representation. Note that logprob detokenization is handled by the LogprobsProcessor not the detokenizer.
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-02-07 07:26:20 -08:00
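The rank definition in the 0630d4537a commit message (a token's 1-indexed position after sorting the vocabulary by log probability in descending order) and the returned triplet of token ids, logprob values and ranks can be illustrated with a small sketch. This is plain Python over lists, not vLLM's implementation; the helper names `token_rank` and `topk_triplet` are hypothetical, and resolving ties by counting every token whose logprob is at least as large is an assumption of this sketch.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_rank(logprobs: list[float], token_id: int) -> int:
    # 1-indexed rank of token_id in the vocabulary sorted by logprob, descending.
    # Assumption: ties count against the token (every logprob >= target is counted).
    target = logprobs[token_id]
    return sum(1 for lp in logprobs if lp >= target)


def topk_triplet(logprobs: list[float], k: int):
    # Hypothetical helper returning the (token ids, logprob values, ranks) triplet
    # for the k most likely tokens.
    ids = sorted(range(len(logprobs)), key=lambda i: logprobs[i], reverse=True)[:k]
    return ids, [logprobs[i] for i in ids], [token_rank(logprobs, i) for i in ids]


# Hypothetical 4-token vocabulary.
logprobs = log_softmax([2.0, 0.5, 3.0, -1.0])
print([token_rank(logprobs, t) for t in range(4)])  # [2, 3, 1, 4]
```

Because log-softmax is monotonic, ranks computed from logprobs match ranks computed from the raw logits.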
538fab93cd PR #12718 (#12718)
2025-02-07 06:22:37 -08:00

ce26b16268 [Misc] Remove unnecessary detokenization in multimodal processing (#12868)
2025-02-07 06:21:17 -08:00

1918aa1b80 [MISC][EASY] Break check file names into entry and args in the pre-commit hooks (#12880)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-07 13:04:39 +00:00

6e1fc61f0f Prevent unecessary requests to huggingface hub (#12837)
2025-02-06 21:37:41 -08:00

aa375dca9f [Bugfix] Missing quant_config in deepseek embedding layer (#12836)
2025-02-06 21:35:09 -08:00

433c4a4923 Make vllm compatible with verl (#12824)
Co-authored-by: zhangshulai <zhangshulai@bytedance.com>
2025-02-07 11:54:20 +08:00

ef533d25fb [Bugfix] FA2 illegal memory access (#12848)
2025-02-06 19:54:07 -08:00

b260782357 [misc] Revert # 12833 (#12857)
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-06 16:29:12 -08:00

741429a4cd [MISC] Check space in the file names in the pre commit checks (#12804)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-06 15:36:21 -08:00

aff404571b Add Bamba Model (#10909)
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-06 15:22:42 -08:00

467a96a541 [V1] LoRA Support (#10957)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-06 09:32:51 -08:00

8108ac841d [Bugfix] Fix unsupported FA version check for Turing GPU (#12828)
2025-02-06 09:18:22 -08:00

afe74f7a96 [Doc] double quote cmake package in build.inc.md (#12840)
2025-02-06 09:17:55 -08:00

09b95e36ab [torch.compile] PyTorch 2.6 and nightly compatibility (#12393)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-07 01:09:07 +08:00

85ac82d228 [Kernel] Make rotary_embedding ops more flexible with input shape (#12777)
2025-02-06 08:46:13 -08:00

1e57b1ee63 [Misc] Remove unnecessary decode call (#12833)
2025-02-06 08:45:44 -08:00

e152f29502 [misc] Reduce number of config file requests to HuggingFace (#12797)
Signed-off-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-06 14:59:18 +00:00

c786e757fa [Attention] Use FA3 for MLA on Hopper (#12807)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-02-06 11:43:12 +00:00

cefd56ee35 [Docs] Add Google Cloud Slides (#12814)
2025-02-06 01:02:38 -08:00

7ca9934fe7 [Misc] Update w2 scale loading for GPTQMarlinMoE (#12757)
2025-02-06 01:02:14 -08:00

0408efc6d0 [Misc] Improve error message for incorrect pynvml (#12809)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-06 15:23:50 +08:00

449d1bce02 [Misc] Remove duplicated DeepSeek V2/V3 model definition (#12793)
2025-02-05 23:16:20 -08:00

1a6fcad4c9 Improve TransformersModel UX (#12785)
2025-02-05 22:24:57 -08:00

56534cd577 [Bugfix] Fix the test_ultravox.py's license (#12806)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-02-06 13:25:54 +08:00

d88506dda4 [Model] LoRA Support for Ultravox model (#11253)
2025-02-05 19:54:13 -08:00

9cdea30b4f [Misc][Easy] Remove the space from the file name
2025-02-05 19:23:35 -08:00

76abd0c881 [Bugfix] Better FP8 supported defaults
2025-02-05 19:22:19 -08:00

5b19b93082 [ROCm][Kernel] Using the correct warp_size value
2025-02-05 19:15:08 -08:00

75404d041b [VLM] Update compatibility with transformers 4.49
2025-02-05 19:09:45 -08:00

bf3b79efb8 [VLM] Qwen2.5-VL
2025-02-05 13:31:38 -08:00

9a5b1554b4 [Docs] Drop duplicate [source] links
2025-02-05 13:30:50 -08:00

a4ce74c14a [VLM] Use shared field to pass token ids to model
2025-02-05 13:30:46 -08:00

3b2005e1db Add: Support for Sparse24Bitmask Compressed Models
2025-02-05 13:30:43 -08:00

af8486de49 [Hardware][Intel-Gaudi] Enable FusedSDPA support for Intel Gaudi (HPU)
2025-02-05 13:29:45 -08:00

4c3aac51e1 Merging PR #12536
Merged via CLI script
2025-02-05 13:24:26 -08:00

bc1bdecebf [core][distributed] exact ray placement control (#12732)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-06 02:03:19 +08:00

022bcc701a [Bugfix] Fix 'ModuleNotFoundError: No module named 'intel_extension_for_pytorch'' for --tensor-parallel-size more than 1 (#12546)
2025-02-04 23:11:02 -08:00

c53dc466b1 [Doc] Remove performance warning for auto_awq.md (#12743)
2025-02-04 22:43:11 -08:00

3d09e592a8 [V1][Misc] Shorten FinishReason enum and use constant strings (#12760)
2025-02-04 22:43:02 -08:00

fcf2e3d7fc [Bugfix] Fix OpenVINO model runner (#12750)
2025-02-04 22:42:46 -08:00

58b218d7ae [Doc] Update PR Reminder with link to Developer Slack (#12748)
2025-02-04 22:42:09 -08:00

7ff7a638b6 [Model][Quant] Fix GLM, Fix fused module mappings for quantization (#12634)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2025-02-05 05:32:06 +00:00

686006a220 [Misc] Bump the compressed-tensors version (#12736)
2025-02-04 20:44:48 -08:00

98fd089fc9 [VLM] Add MLA with pure RoPE support for deepseek-vl2 models (#12729)
2025-02-04 20:44:26 -08:00

249824c3bf Refactor Linear handling in TransformersModel (#12727)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-05 04:31:12 +00:00

64862d106e [ROCM][AMD][TRITON] Halving warps number for fw_prefill to reduce spilling (#12713)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-02-05 03:58:22 +00:00

b3a0d01e45 [Core] add and implement VLLM_LOGITS_PROCESSOR_THREADS (#12368)
Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>
2025-02-04 18:46:26 -08:00

75e94309e8 [Perf] Mem align KV caches for CUDA devices (MLA perf improvement) (#12676)
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Lucas Wilkinson <lcwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
2025-02-04 18:22:24 -08:00

233df6f5c4 [V1][Metrics] Add request_success_total counter, labelled with finish reason (#12579)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-02-04 19:46:54 -05:00

18016a5e62 [Bugfix] Fix CI failures for InternVL and Mantis models (#12728)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-04 23:54:23 +08:00

649550f27e [Build] update requirements of no-device for plugin usage (#12630)
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
2025-02-04 21:19:12 +08:00

62467a834a Avoid unnecessary multi-modal input data copy when len(batch) == 1 (#12722)
Signed-off-by: imkero <kerorek@outlook.com>
2025-02-04 21:03:19 +08:00

6469038b14 [Bugfix] Fix loading of fine-tuned models based on Phi-3-Small (#12689)
Signed-off-by: Michael Greenbaum <mgreenbaum@microsoft.com>
Co-authored-by: Michael Greenbaum <mgreenbaum@microsoft.com>
2025-02-04 20:58:48 +08:00

815079de8e [VLM] merged multimodal processor and V1 support for idefics3 (#12660)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-02-04 20:00:51 +08:00

18a88fcccc [V1] Remove scheduling constraint on partial requests (#12674)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-04 02:43:58 -08:00

d1ca7df84d [VLM] Merged multi-modal processor for InternVL-based models (#12553)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-02-04 16:44:52 +08:00

96b23621c1 [Misc] Add BNB quantization for Whisper (#12381)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-02-04 16:27:36 +08:00

c36ac98d01
[AMD][ROCm] Enable DeepSeek model on ROCm (#12662)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
2025-02-04 08:24:11 +00:00
						
4896d0c2dd
[Quant] Fix use_mla TypeError and support loading pure-sparsity Compressed Tensors configs (#12711)
2025-02-03 23:27:11 -08:00
						
bb392af434
[Doc] Replace ibm-fms with ibm-ai-platform (#12709)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-02-04 07:05:04 +00:00
						
5d98d56089
Support Pixtral-Large HF by using llava multimodal_projector_bias config (#12710)
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-02-04 11:55:46 +08:00
						
73b35cca7f
[Core] Improve hash collision avoidance in prefix caching (#12621)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-03 16:28:20 -08:00
						
5095e96606
[V1] Revert uncache_blocks and support recaching full blocks (#12415)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-02-03 15:04:53 -08:00
						
cf58b9c4ca
[MISC] Remove model input dumping when exception (#12582)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-02-03 13:34:16 -08:00
						
4797dad3ec
[Model] Add Deepseek V3 fp8_w8a8 configs for B200 (#12707)
2025-02-03 13:30:39 -08:00
						
6dd5e52823
Squelch MLA warning for Compressed-Tensors Models (#12704)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-02-03 13:29:56 -08:00
						
c11de33dad
[Bugfix][Kernel] Fix per-token/per-channel quantization for Hopper scaled mm (#12696)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-03 13:04:59 -08:00
						
33e0602e59
[Misc] Fix improper placement of SPDX header in scripts (#12694)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-03 11:16:59 -08:00
						
a1a2aaadb9
[Model]: Add transformers backend support (#11330)
# Adds support for `transformers` as a backend
Following https://github.com/huggingface/transformers/pull/35235, a bunch of models should already be supported, and we are ramping up support for more models.
Thanks @Isotr0py for the TP support, and @hmellor for his help as well!
This includes:
- `trust_remote_code=True` support: any model on the Hub that implements attention the correct way can be natively supported!
- tensor parallel support
---------
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-02-03 21:30:38 +08:00
						
1298a400e8
[ci/build] fix gh200 test (#12681)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-03 15:59:49 +08:00
						
ad4a9dc817
[cuda] manually import the correct pynvml module (#12679)
Fixes problems like https://github.com/vllm-project/vllm/pull/12635, https://github.com/vllm-project/vllm/pull/12636, and https://github.com/vllm-project/vllm/pull/12565.
---------
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-03 15:58:21 +08:00
						
b9986454fe
Fix for attention layers to remain unquantized during moe_wn16 quant (#12570)
Fix for AWQ quant loading of the new R1 model.
The new optimized MoE kernels for a large number of experts, `moe_wn16`, use AWQ quantization, which requires the attention layers to be in 16-bit.
The current merge has broken this, and `get_quant_method` must return None for it to work correctly again.
---------
Signed-off-by: Srikanth Srinivas <srikanth@astrum.ai>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Beim <beim2015@outlook.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: npanpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Ryan N <ryan.nguyen@centml.ai>
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Shawn Du <shawnd200@outlook.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Beim <805908499@qq.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Nishidha <nishidha.panpaliya@partner.ibm.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Aleksandr Malyshev <164964928+maleksan85@users.noreply.github.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Ryan Nguyen <96593302+xpbowler@users.noreply.github.com>
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Co-authored-by: fade_away <1028552010@qq.com>
Co-authored-by: weilong.yu <weilong.yu@shopee.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Eldar Kurtic <eldarkurtic314@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
Co-authored-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Shawn Du <shawnd200@outlook.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-02-03 13:46:19 +08:00
						
c5932e5dac
Properly check if all fused layers are in the list of targets (#12666)
Thanks @kylesayrs for catching this!
2025-02-03 13:42:18 +08:00
						
20579c0fae
make sure mistral_common not imported for non-mistral models (#12669)
When people use DeepSeek models, they find that they need to resolve a cv2 version conflict; see https://zhuanlan.zhihu.com/p/21064432691.
I added the check and made all imports of `cv2` lazy.
---------
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-03 13:40:25 +08:00
						
95460fc513
[Kernel] port sgl moe_align_block_size kernels (#12574)
sgl_moe_align_block_size is based on:
ded9fcd09a
moe_align_block_size is based on:
ba5112ff69
Signed-off-by: Yang Chen <yangche@fb.com>
2025-02-03 13:09:50 +08:00
						
326fcc8b9f
[Doc] Deprecate Discord (#12668)
2025-02-02 19:19:56 -08:00
						
e64330910b
[doc][misc] clarify VLLM_HOST_IP for multi-node inference (#12667)
As more and more people try DeepSeek models with multi-node inference, https://github.com/vllm-project/vllm/issues/7815 comes up more frequently. Let's give users a clear message.
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-03 09:32:18 +08:00
						
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500
    Add SPDX license headers to python source files
    
    This commit adds SPDX license headers to python source files as
    recommended to the project by the Linux Foundation. These headers
    provide a concise way that is both human and machine readable for
    communicating license information for each source file. It helps
    avoid any ambiguity about the license of the code and can also be
    easily used by tools to help manage license compliance.
    
    The Linux Foundation runs license scans against the codebase to help
    ensure we are in compliance with the licenses of the code we use,
    including dependencies. Having these headers in place helps that tool
    do its job.
    
    More information can be found on the SPDX site:
    
    - https://spdx.dev/learn/handling-license-info/
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500
    Check for SPDX headers using pre-commit
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
						
f256ebe4df
[Hardware][Intel GPU] add XPU bf16 support (#12392)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-02-02 10:17:26 +00:00
						
f8ece6e17f
[Core][v1] Unify allocating slots in prefill and decode in KV cache manager (#12608)
As mentioned in RFC https://github.com/vllm-project/vllm/issues/12254, this PR combines allocate_slots and append_slots.
There should be no functional change, except that decode now also raises an exception when num_tokens is zero (as prefill does), and the unit test case is changed accordingly.
@comaniac @rickyyx @WoosukKwon @youkaichao @heheda12345 @simon-mo
---------
Signed-off-by: Shawn Du <shawnd200@outlook.com>
2025-02-02 16:40:58 +08:00
						
abfcdcdf27
[V1][Minor] Avoid frequently creating ConstantList (#12653)
A small optimization to avoid creating a new `ConstantList` every time `request.kv_block_hashes` is used.
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-01 23:43:20 -08:00
						
e497f33491
[Core] Silence unnecessary deprecation warnings (#12620)
I noticed during testing that I was getting a lot of these deprecation warnings about `lora_local_path`:
```
DeprecationWarning: The 'lora_local_path' attribute is deprecated
     and will be removed in a future version.
     Please use 'lora_path' instead.
```
The check used for emitting this warning was always True, even when the parameter was not actually specified, because the field name is always present in `__struct_fields__`. We should be checking for a non-None value instead.
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 15:35:50 +08:00
						
baaa2b24da
[Bugfix] fix moe_wna16 get_quant_method (#12648)
Fix https://github.com/vllm-project/vllm/issues/12647.
The `get_quant_method` of `moe_wna16` always returns the MoE method, the GPTQ-based linear method, or the AWQ-based linear method, even when the target module is an attention layer.
baeded2569/vllm/attention/layer.py (L86-L92)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-02-02 15:29:56 +08:00
						
b4e5c03306
doc: fixing minor typo in readme.md (#12643)
The word "evolved" was mistyped.
Signed-off-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
---------
Signed-off-by: Vicente Herrera <vicenteherrera@vicenteherrera.com>
2025-02-01 17:17:29 +00:00
						
3194039c0e
Apply torch.compile to fused_moe/grouped_topk (#12637)
2025-02-01 16:16:19 +00:00