|
d3d2aad5a2
|
[Log] Use Debug Once for DeepGEMM E8M0 When not Enabled (#23858)
|
2025-08-28 22:18:10 +00:00 |
|
|
cb293f6a79
|
[V1] Enable prefill optimization for Gemma3n (#22628)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-28 14:54:30 -07:00 |
|
|
7ffbf27239
|
[BugFix][FlashInfer] Fix potential race condition for paged_kv_indptr_cpu (#23737)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-28 14:22:46 -07:00 |
|
|
27e88cee74
|
chore: build release image by default (#23852)
Signed-off-by: Codex <codex@openai.com>
|
2025-08-28 13:17:15 -07:00 |
|
|
16a45b3a28
|
[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
|
2025-08-28 19:36:50 +00:00 |
|
|
57d4ede520
|
[bugfix] [spec-decoding] fix data race in sample_recovered_tokens_kernel (vLLM v1) (#23829)
Signed-off-by: He-Jingkai <he-jingkai@outlook.com>
|
2025-08-28 19:05:20 +00:00 |
|
|
04d1dd7f4a
|
[ROCm][Aiter] Add triton fp8 bmm kernel for mla (#23264)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com>
|
2025-08-28 18:18:08 +00:00 |
|
|
f32a5bc505
|
Migrate Llama4ImagePatchInputs to TensorSchema (#22021)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-28 17:29:37 +00:00 |
|
|
8805ad9fa9
|
Add scale_config.yml file for Meta autoscalers for GH Actions (#23840)
Signed-off-by: Jean Schmidt <contato@jschmidt.me>
|
2025-08-28 09:31:20 -07:00 |
|
|
0583578f42
|
[ci] breaks down V1 Test into 3 groups of approx 30 minutes runtime (#23757)
Signed-off-by: Jean Schmidt <contato@jschmidt.me>
|
2025-08-28 08:59:19 -07:00 |
|
|
db74d60490
|
[Bugfix] Add fake mode around passes (#23349)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-08-28 11:25:56 -04:00 |
|
|
95089607fa
|
[Model][gpt-oss] Support DP+EP for GPT-OSS with FlashInfer trtllm-gen MoE (#23819)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
|
2025-08-28 06:56:20 -07:00 |
|
|
1f096f9b95
|
[CI] Fix linting error on main (#23835)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-28 06:52:01 -07:00 |
|
|
66548f6603
|
[Bugfix] Fix benchmark_moe.py for blockwise fp8. (#23823)
Signed-off-by: crischeng <420985011@qq.com>
Co-authored-by: cris <grace@guisenbindeMacBook-Pro.local>
|
2025-08-28 21:44:09 +08:00 |
|
|
d3da2eea54
|
[Doc]: fix typos in Python scripts (#23828)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-08-28 05:37:38 -07:00 |
|
|
bfab219648
|
[Model] [gpt-oss] fix gpt-oss pp support (#23815)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-08-28 05:36:55 -07:00 |
|
|
a3432f18fd
|
[BugFix][Spec Decode] Use float64 for uniform_probs (#23803)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-28 12:26:45 +00:00 |
|
|
67cee40da0
|
[CI/Build][Bugfix] Fix Qwen VL tests on CPU (#23818)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-08-28 11:57:05 +00:00 |
|
|
d99c3a4f7b
|
[Doc]: fix typos in .md files (including those of #23751) (#23825)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-08-28 04:38:19 -07:00 |
|
|
3462c1c522
|
[FIXBUG] Add return_success parameter to moe_wna16_weight_loader function (#22797)
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-28 09:03:22 +00:00 |
|
|
c5d004aaaf
|
[Model] Add PP support and VLM backbone compatibility for GPT-OSS (#23680)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-28 16:03:28 +08:00 |
|
|
11a7fafaa8
|
[New Model]: Support GteNewModelForSequenceClassification (#23524)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-28 15:36:42 +08:00 |
|
|
186aced5ff
|
[Kernel] cuda kernels for upcoming decode context parallel feature (#23791)
Co-authored-by: hongchao <hongchao@msh.team>
|
2025-08-28 15:29:11 +08:00 |
|
|
daa1273b14
|
[Bugfix] when set offline model running error (#23711)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-08-28 07:27:45 +00:00 |
|
|
c07a73317d
|
[CI] enable idefics3 and fuyu-8b test in multimodal test (#23790)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-08-28 14:51:24 +08:00 |
|
|
22feac8e95
|
[Transform] [Quantization] Add transforms to compressed tensors (#22486)
|
2025-08-28 02:43:48 -04:00 |
|
|
c8851a4723
|
Add deprecation warning for lora_extra_vocab_size (#23635)
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
|
2025-08-27 22:34:29 -07:00 |
|
|
f48a9af892
|
[CI] make all multi-gpu weight loading tests run nightly (#23792)
Signed-off-by: Alex Yun <alexyun04@gmail.com>
|
2025-08-27 21:27:36 -07:00 |
|
|
a11adafdca
|
Gracefully handle edge cases in harmony utils (#23155)
Signed-off-by: Jan Kessler <jakessle@uni-mainz.de>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-27 20:14:00 -07:00 |
|
|
a781e84ec2
|
[Perf] Tune configs for triton block fp8 gemm H100/H200 (#23748)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-28 11:12:53 +08:00 |
|
|
1b7b161a09
|
[Feature] models: pass layer prefix to replace_linear_class for per-layer quantization routing. Addresses #23239 (#23556)
Signed-off-by: Shrey Gupta <shreyg1303@gmail.com>
|
2025-08-27 20:12:44 -07:00 |
|
|
a69693e38f
|
Migrate Qwen inputs to TensorSchema (#23473)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-08-28 10:43:26 +08:00 |
|
|
5da4f5d857
|
[Bugfix] Fix for V1 priority scheduling crashes at preemption (#23713)
Signed-off-by: Hanchenli <lihanc2002@gmail.com>
|
2025-08-28 00:44:52 +00:00 |
|
|
321938e9ac
|
[Feature] Add VLLM_DISABLE_PAD_FOR_CUDAGRAPH to Avoid Hang Issue (#23595)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-27 21:52:24 +00:00 |
|
|
f9ca2b40a0
|
[Bugfix] Fix Marlin NVFP4 for modelopt (#23659)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-27 17:48:16 -04:00 |
|
|
082cc07ef8
|
DP/EP Support for gpt-oss with deepep-ht comm kernel on SM100 (#23608)
|
2025-08-27 17:33:21 -04:00 |
|
|
853c371fc3
|
[V1][Mamba] - Enable V1 by default for Mamba Models (#23650)
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
|
2025-08-27 20:53:30 +00:00 |
|
|
8bf6266a17
|
[Multimodal] Generate mm_hash based on request metadata when caching is turned off (#23690)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-08-27 20:24:31 +00:00 |
|
|
0585a9e73c
|
Disable torch.compile for dynamic rope models in Transformers backend (#23738)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-27 19:03:05 +00:00 |
|
|
3c0ef769ba
|
ci: Add arm64 docker build to release pipeline (#23210)
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Signed-off-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>
|
2025-08-27 10:41:48 -07:00 |
|
|
4e4d017b6f
|
[Docs] Fix warnings in mkdocs build (continued) (#23743)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com>
|
2025-08-27 17:17:29 +00:00 |
|
|
dd58932280
|
[V1] [Hybrid] Enable compile and piecewise CUDA graph for MiniMax-Text models (#22589)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-27 10:05:16 -07:00 |
|
|
52883ed084
|
[Model] Merge SupportsMultiModalWithRawInput with SupportsMultiModal (#23749)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-27 10:01:50 -07:00 |
|
|
4f35be10a9
|
[BugFix] Fix topk_softmax assert (#19764)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
|
2025-08-27 09:47:28 -07:00 |
|
|
2b61d2e22f
|
[Docs] Remove in-tree Gaudi install instructions (#23628)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-27 09:22:21 -07:00 |
|
|
3ce8285d6d
|
[LogitsProcs] Deduplicate built-in LP implementation logic (#23362)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-27 23:11:33 +08:00 |
|
|
83f555f637
|
[Doc]: upgrade version of crate-ci tool for improved typo detection (#23755)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-08-27 07:59:34 -07:00 |
|
|
841490434a
|
[Model] Enable native HF format InternVL support (#23742)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-27 14:45:17 +00:00 |
|
|
3af47c3cc6
|
[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-08-27 14:09:08 +00:00 |
|
|
513c1fe255
|
Only run get_attr_docs if generating help text (#23723)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-27 13:55:12 +00:00 |
|