|
a24e01a257
|
Switch oneDNN to a domestic (China) mirror URL
|
2025-09-01 16:26:59 +08:00 |
|
|
6d8d0a24c0
|
Add think chunk (#21333)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
v0.10.0
v0.10.0rc2
|
2025-07-23 21:51:32 -07:00 |
|
|
11ef7a611e
|
[BugFix] Set CUDA_VISIBLE_DEVICES before spawning the subprocesses (#21211)
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-07-23 21:44:04 -07:00 |
|
|
dc2f159f8a
|
Dump input metadata on crash for async scheduling (#21258)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-23 21:10:30 -07:00 |
|
|
d5b981f8b1
|
[DP] Internal Load Balancing Per Node [one-pod-per-node] (#21238)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-23 20:57:32 -07:00 |
|
|
eec6942014
|
[BugFix] Fix KVConnector TP worker aggregation (#21473)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-23 20:56:49 -07:00 |
|
|
fd48d99ffd
|
[BugFix]: Batch generation from prompt_embeds fails for long prompts (#21390)
Signed-off-by: KazusatoOko <kazusto.oko@sakana.ai>
Co-authored-by: KazusatoOko <kazusto.oko@sakana.ai>
|
2025-07-23 20:43:17 -07:00 |
|
|
f8c15c4efb
|
[Bugfix] Fix example disagg_example_p2p_nccl_xpyd.sh zombie process (#21437)
Signed-off-by: David Chen <530634352@qq.com>
|
2025-07-23 20:42:11 -07:00 |
|
|
aa08a954f9
|
[Bugfix] Fix casing warning (#21468)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-07-23 20:41:23 -07:00 |
|
|
13e4ee1dc3
|
[XPU][UT] increase intel xpu CI test scope (#21492)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
|
2025-07-23 20:24:04 -07:00 |
|
|
772ce5af97
|
[Misc] Add dummy maverick test to CI (#21324)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-23 20:22:42 -07:00 |
|
|
63d92abb7c
|
[Frontend] Set MAX_AUDIO_CLIP_FILESIZE_MB via env var instead of hardcoding (#21374)
Signed-off-by: Deven Labovitch <deven@videa.ai>
|
2025-07-23 20:22:19 -07:00 |
|
|
11599b0e1f
|
feat(gguf_loader): accept HF repo paths & URLs for GGUF (#20793)
Signed-off-by: Hardik <hardikgupta1999@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-23 20:21:02 -07:00 |
|
|
f3137cdd81
|
[Core] Freeze gc during cuda graph capture to speed up init (#21146)
Signed-off-by: Codex <codex@openai.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-23 17:20:14 -07:00 |
|
|
82ec66f514
|
[V0 Deprecation] Remove Prompt Adapters (#20588)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-23 16:36:48 -07:00 |
|
|
78c13e30e1
|
[V1] Fix local chunked attention always disabled (#21419)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-23 15:59:30 -07:00 |
|
|
5c9b807b34
|
[Core] Add reload_weights RPC method (#20096)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-07-23 14:24:52 -07:00 |
|
|
14bf19e39f
|
[TPU][TEST] Fix the downloading issue in TPU v1 test 11. (#21418)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-23 11:29:36 -07:00 |
|
|
4ac7713e32
|
Add test case for compiling multiple graphs (#21044)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-23 11:00:47 -07:00 |
|
|
8560a5b258
|
[Core][Model] PrithviMAE Enablement on vLLM v1 engine (#20577)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
|
2025-07-23 11:00:23 -07:00 |
|
|
316b1bf706
|
[Tests] Add tests for headless internal DP LB (#21450)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-23 07:49:25 -07:00 |
|
|
7c734ee09b
|
[Bugfix][Qwen][DCA] fixes bug in dual-chunk-flash-attn backend for qwen 1m models. (#21364)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
|
2025-07-23 06:34:37 -07:00 |
|
|
f59ec35b7f
|
[V1] Check all pooling tasks during profiling (#21299)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-23 05:53:26 -07:00 |
|
|
2671334d45
|
[Model] add Hunyuan V1 Dense Model support. (#21368)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
|
2025-07-23 03:54:08 -07:00 |
|
|
2cc5016a19
|
[Docs] Clean up v1/metrics.md (#21449)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-07-23 03:37:25 -07:00 |
|
|
6929f8b437
|
[Misc] fixed nvfp4_moe test failures due to invalid kwargs (#21246)
Signed-off-by: Yang Chen <yangche@fb.com>
|
2025-07-23 01:41:43 -07:00 |
|
|
32ec9e2f2a
|
Mamba V2 Test not Asserting Failures. (#21379)
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
|
2025-07-23 01:40:27 -07:00 |
|
|
accac82928
|
[Sampler] Introduce logprobs mode for logging (#21398)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-07-23 01:39:25 -07:00 |
|
|
23637dcdef
|
[Docs] Fix bullets and grammars in tool_calling.md (#21440)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-07-23 01:23:20 -07:00 |
|
|
6364af92f8
|
Fixed typo in profiling logs (#21441)
|
2025-07-23 01:18:54 -07:00 |
|
|
7aaa2bd5a8
|
[Bugfix] ensure tool_choice is popped when tool_choice:null is passed in json payload (#19679)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2025-07-23 00:30:05 -07:00 |
|
|
2f5c14de6a
|
add clear messages for deprecated models (#21424)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-07-23 00:03:16 -07:00 |
|
|
f002e9a870
|
[Cleanup] Only log MoE DP setup warning if DP is enabled (#21315)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-23 00:02:48 -07:00 |
|
|
a1f3610fc6
|
[Core] Add basic unit test for maybe_evict_cached_block (#21400)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-07-23 00:02:02 -07:00 |
|
|
4ecedd1806
|
[Bugfix] Fix nightly transformers CI failure (#21427)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-23 00:01:01 -07:00 |
|
|
107111a859
|
Changing "amdproduction" allocation. (#21409)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-07-22 20:48:31 -07:00 |
|
|
2dec7c1a5d
|
[Bugfix][CUDA] fixes CUDA FP8 kv cache dtype supported (#21420)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-07-22 20:34:50 -07:00 |
|
|
08d2bd78da
|
[BUGFIX] deepseek-v2-lite failed due to fused_qkv_a_proj name update (#21414)
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
|
2025-07-22 20:33:57 -07:00 |
|
|
4f76a05f4f
|
[BugFix] Update python to python3 calls for image; fix prefix & input calculations. (#21391)
Signed-off-by: Eric Hanley <ericehanley@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-22 20:33:00 -07:00 |
|
|
f154bb9ff0
|
Simplify weight loading in Transformers backend (#21382)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-22 20:29:43 -07:00 |
|
|
3ec7170ff1
|
[Bugfix][ROCm][Build] Fix build regression on ROCm (#21393)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-07-22 20:27:41 -07:00 |
|
|
c401c64b4c
|
[CI/Build] Fix model executor tests (#21387)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-22 20:25:37 -07:00 |
|
|
b77c7d327f
|
[BugFix] Fix ray import error mem cleanup bug (#21381)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2025-07-22 16:19:55 -07:00 |
|
|
35bc8bd5fb
|
[Misc] Copy HF_TOKEN env var to Ray workers (#21406)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-07-22 16:18:42 -07:00 |
|
|
4594fc3b28
|
[Model] Add Qwen3CoderToolParser (#21396)
Signed-off-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: simon-mo <xmo@berkeley.edu>
|
2025-07-22 15:05:57 -07:00 |
|
|
ae268b6326
|
Fix Flashinfer Allreduce+Norm enable disable calculation based on fi_allreduce_fusion_max_token_num (#21325)
Signed-off-by: XIn Li <xinli@nvidia.com>
|
2025-07-22 12:42:31 -07:00 |
|
|
35366ae57c
|
[CI/Build] Fix test failure due to updated model repo (#21375)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-22 08:39:35 -07:00 |
|
|
2226d5bd85
|
[Bugfix] Decode Tokenized IDs to Strings for hf_processor in llm.chat() with model_impl=transformers (#21353)
Signed-off-by: ariG23498 <aritra.born2fly@gmail.com>
|
2025-07-22 08:27:28 -07:00 |
|
|
44554a0068
|
Add tokenization_kwargs to encode for embedding model truncation (#21033)
|
2025-07-22 08:24:00 -07:00 |
|
|
226b452a20
|
Revert "[Refactor] Fix Compile Warning #1444-D (#21208)" (#21384)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-22 08:22:10 -07:00 |
|