|
7b0365efef
|
[Doc] Add the DCO to CONTRIBUTING.md (#9803)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-10-30 05:22:23 +00:00 |
|
|
04a3ae0aca
|
[Bugfix] Fix multi nodes TP+PP for XPU (#8884)
Signed-off-by: YiSheng5 <syhm@mail.ustc.edu.cn>
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: YiSheng5 <syhm@mail.ustc.edu.cn>
|
2024-10-29 21:34:45 -07:00 |
|
|
62fac4b9aa
|
[ci/build] Pin CI dependencies version with pip-compile (#9810)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-10-30 03:34:55 +00:00 |
|
|
226688bd61
|
[Bugfix][VLM] Make apply_fp8_linear work with >2D input (#9812)
|
2024-10-29 19:49:44 -07:00 |
|
|
64cb1cdc3f
|
Update README.md (#9819)
|
2024-10-29 17:28:43 -07:00 |
|
|
1ab6f6b4ad
|
[core][distributed] fix custom allreduce in pytorch 2.5 (#9815)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-29 17:06:24 -07:00 |
|
|
bc73e9821c
|
[Bugfix] Fix prefix strings for quantized VLMs (#9772)
|
2024-10-29 16:02:59 -07:00 |
|
|
8d7724104a
|
[Docs] Add notes about Snowflake Meetup (#9814)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2024-10-29 15:19:02 -07:00 |
|
|
882a1ad0de
|
[Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
|
2024-10-29 15:07:37 -07:00 |
|
|
67bdf8e523
|
[Bugfix][Frontend] Guard against bad token ids (#9634)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-10-29 14:13:20 -07:00 |
|
|
0ad216f575
|
[MISC] Set label value to timestamp over 0, to keep track of recent history (#9777)
Signed-off-by: Kunjan Patel <kunjanp@google.com>
|
2024-10-29 19:52:19 +00:00 |
|
|
7585ec996f
|
[CI/Build] mergify: fix rules for ci/build label (#9804)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-10-29 19:24:42 +00:00 |
|
|
ab6f981671
|
[CI][Bugfix] Skip chameleon for transformers 4.46.1 (#9808)
|
2024-10-29 11:12:43 -07:00 |
|
|
ac3d748dba
|
[Model] Add LlamaEmbeddingModel as an embedding Implementation of LlamaModel (#9806)
|
2024-10-29 10:40:35 -07:00 |
|
|
0ce7798f44
|
[Misc]: Typo fix: Renaming classes (casualLM -> causalLM) (#9801)
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
|
2024-10-29 10:39:20 -07:00 |
|
|
0f43387157
|
[Bugfix] Use host argument to bind to interface (#9798)
|
2024-10-29 10:37:59 -07:00 |
|
|
08600ddc68
|
Fix the log to correct guide user to install modelscope (#9793)
Signed-off-by: yuze.zyz <yuze.zyz@alibaba-inc.com>
|
2024-10-29 10:36:59 -07:00 |
|
|
74fc2d77ae
|
[Misc] Add metrics for request queue time, forward time, and execute time (#9659)
|
2024-10-29 10:32:56 -07:00 |
|
|
622b7ab955
|
[Hardware] using current_platform.seed_everything (#9785)
Signed-off-by: wangshuai09 <391746016@qq.com>
|
2024-10-29 14:47:44 +00:00 |
|
|
09500f7dde
|
[Model] Add BNB quantization support for Mllama (#9720)
|
2024-10-29 08:20:02 -04:00 |
|
|
ef7865b4f9
|
[Frontend] re-enable multi-modality input in the new beam search implementation (#9427)
Signed-off-by: Qishuai Ferdinandzhong@gmail.com
|
2024-10-29 11:49:47 +00:00 |
|
|
eae3d48181
|
[Bugfix] Use temporary directory in registry (#9721)
|
2024-10-28 22:08:20 -07:00 |
|
|
e74f2d448c
|
[Doc] Specify async engine args in docs (#9726)
|
2024-10-28 22:07:57 -07:00 |
|
|
7a4df5f200
|
[Model][LoRA]LoRA support added for Qwen (#9622)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-10-29 04:14:07 +00:00 |
|
|
c5d7fb9ddc
|
[Doc] fix third-party model example (#9771)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-10-28 19:39:21 -07:00 |
|
|
76ed5340f0
|
[torch.compile] add deepseek v2 compile (#9775)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-28 14:35:17 -07:00 |
|
|
97b61bfae6
|
[misc] avoid circular import (#9765)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-28 20:51:23 +00:00 |
|
|
aa0addb397
|
Adding "torch compile" annotations to moe models (#9758)
|
2024-10-28 13:49:56 -07:00 |
|
|
5f8d8075f9
|
[Model][VLM] Add multi-video support for LLaVA-Onevision (#8905)
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-28 18:04:10 +00:00 |
|
|
8b0e4f2ad7
|
[CI/Build] Adopt Mergify for auto-labeling PRs (#9259)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-10-28 09:38:09 -07:00 |
|
|
2adb4409e0
|
[Bugfix] Fix ray instance detect issue (#9439)
|
2024-10-28 07:13:03 +00:00 |
|
|
feb92fbe4a
|
Fix beam search eos (#9627)
|
2024-10-28 06:59:37 +00:00 |
|
|
32176fee73
|
[torch.compile] support moe models (#9632)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-27 21:58:04 -07:00 |
|
|
4e2d95e372
|
[Hardware][ROCM] using current_platform.is_rocm (#9642)
Signed-off-by: wangshuai09 <391746016@qq.com>
|
2024-10-28 04:07:00 +00:00 |
|
|
34a9941620
|
[Bugfix] Fix load config when using bools (#9533)
|
2024-10-27 13:46:41 -04:00 |
|
|
e130c40e4e
|
Fix cache management in "Close inactive issues and PRs" actions workflow (#9734)
|
2024-10-27 10:30:03 -07:00 |
|
|
3cb07a36a2
|
[Misc] Upgrade to pytorch 2.5 (#9588)
Signed-off-by: Bill Nell <bill@neuralmagic.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-10-27 09:44:24 +00:00 |
|
|
8549c82660
|
[core] cudagraph output with tensor weak reference (#9724)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-27 00:19:28 -07:00 |
|
|
67a6882da4
|
[Misc] SpecDecodeWorker supports profiling (#9719)
Signed-off-by: Abatom <abatom@163.com>
|
2024-10-27 04:18:03 +00:00 |
|
|
6650e6a930
|
[Model] Add classification Task with Qwen2ForSequenceClassification (#9704)
Signed-off-by: Kevin-Yang <ykcha9@gmail.com>
Co-authored-by: Kevin-Yang <ykcha9@gmail.com>
|
2024-10-26 17:53:35 +00:00 |
|
|
07e981fdf4
|
[Frontend] Bad words sampling parameter (#9717)
Signed-off-by: Vasily Alexeev <alvasian@yandex.ru>
|
2024-10-26 16:29:38 +00:00 |
|
|
55137e8ee3
|
Fix: MI100 Support By Bypassing Custom Paged Attention (#9560)
|
2024-10-26 12:12:57 +00:00 |
|
|
5cbdccd151
|
[Hardware][openvino] is_openvino --> current_platform.is_openvino (#9716)
|
2024-10-26 10:59:06 +00:00 |
|
|
067e77f9a8
|
[Bugfix] Steaming continuous_usage_stats default to False (#9709)
Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>
|
2024-10-26 05:05:47 +00:00 |
|
|
6567e13724
|
[Bugfix] Fix crash with llama 3.2 vision models and guided decoding (#9631)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: pavlo-ruban <pavlo.ruban@servicenow.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-25 15:42:56 -07:00 |
|
|
228cfbd03f
|
[Doc] Improve quickstart documentation (#9256)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
|
2024-10-25 14:32:10 -07:00 |
|
|
ca0d92227e
|
[Bugfix] Fix compressed_tensors_moe bad config.strategy (#9677)
|
2024-10-25 12:40:33 -07:00 |
|
|
9645b9f646
|
[V1] Support sliding window attention (#9679)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-10-24 22:20:37 -07:00 |
|
|
a6f3721861
|
[Model] add a lora module for granite 3.0 MoE models (#9673)
|
2024-10-24 22:00:17 -07:00 |
|
|
9f7b4ba865
|
[ci/Build] Skip Chameleon for transformers 4.46.0 on broadcast test #9675 (#9676)
|
2024-10-24 20:59:00 -07:00 |
|