84cf78acee
[Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-11 09:41:37 -07:00
c49848396d
Refactor sliding window configuration to Transformers best practice (#21927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-09 20:50:48 -07:00
86ae693f20
[Deprecation][2/N] Replace `--task` with `--runner` and `--convert` (#21470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-27 19:42:40 -07:00
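For readers following this deprecation, a hedged sketch of the flag migration: the flag names come from the commit title, but the accepted values shown here (`pooling`, `embed`) are assumptions and may differ from the actual CLI.

```shell
# Old style (deprecated by #21470): one --task flag selected the behaviour
# vllm serve BAAI/bge-base-en-v1.5 --task embed

# New style: --runner picks the runner type and --convert adapts the model
# to a pooling task (values are illustrative assumptions)
vllm serve BAAI/bge-base-en-v1.5 --runner pooling --convert embed
```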
8632e831ba
[Core] Add `update_config` RPC method (#20095)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-14 00:49:18 +00:00
020f58abcd
[Core] Support multiple tasks per model (#20771)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-12 19:40:11 -07:00
6f1229f91d
[Model][2/N] Automatic conversion of CrossEncoding model (#19978)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-03 13:59:23 +00:00
ccbfb1d1c9
[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models (#20322)
Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>
2025-07-02 12:53:36 +00:00
c05596f1a3
[Perf] Validate @config in pre-commit instead of dynamically (#20200)
Signed-off-by: Lionel Villard <villard@us.ibm.com>
2025-07-01 05:10:28 -04:00
b692e9cd07
[Misc] Fix skipped max-model-len validation when deriving max model length from tokenizer config (#19660)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-06-16 06:30:29 +00:00
a2142f0196
Support non-string values in JSON keys from CLI (#19471)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-11 09:34:04 +00:00
da511d54d8
Fix CompilationConfig repr (#19091)
Signed-off-by: rzou <zou3519@gmail.com>
2025-06-06 16:23:35 +08:00
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
109e15a335
Add `pt_load_map_location` to allow loading to cuda (#16869)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-05-01 23:23:42 -07:00
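A hedged sketch of what this option enables: the config field name `pt_load_map_location` is from the commit title, but the hyphenated CLI spelling and the accepted device strings shown here are assumptions.

```shell
# Load a PyTorch checkpoint's tensors directly onto a GPU device
# instead of the default CPU map_location (spelling assumed)
vllm serve ./my-local-model --pt-load-map-location cuda:0
```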
13698db634
Improve configs - `ModelConfig` (#17130)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-30 10:38:22 +08:00
2ef5d106bb
Improve literal dataclass field conversion to argparse argument (#17391)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-29 16:25:08 +00:00
d27ea94034
Improve configs - `TokenizerPoolConfig` + `DeviceConfig` (#16603)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-17 11:19:42 +00:00
47512b3200
Default to `generation_config` from model (#12622)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 14:46:15 +08:00
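Since this change, sampling defaults come from the model's own `generation_config.json` rather than vLLM's built-ins. A hedged sketch of the switch; the `--generation-config` flag and its `auto`/`vllm` values are assumptions about the CLI surface, and the model name is illustrative.

```shell
# Use the defaults shipped in the model's generation_config.json
# (the behaviour this commit made the default)
vllm serve meta-llama/Llama-3.1-8B-Instruct --generation-config auto

# Opt out and fall back to vLLM's built-in sampling defaults
vllm serve meta-llama/Llama-3.1-8B-Instruct --generation-config vllm
```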
76c89fcadd
Use smaller embedding model when not testing model specifically (#13891)
2025-02-28 00:50:43 -08:00
2c5e637b57
[ci] Use env var to control whether to use S3 bucket in CI (#13634)
2025-02-22 19:19:45 -08:00
a64a84433d
[2/n][ci] S3: Use full model path (#13564)
2025-02-20 01:20:15 -08:00
d5d214ac7f
[1/n][CI] Load models in CI from S3 instead of HF (#13205)
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>
2025-02-19 07:34:59 +00:00
f2b20fe491
Consolidate Llama model usage in tests (#13094)
2025-02-13 22:18:03 -08:00
d84cef76eb
[Frontend] Add `/v1/audio/transcriptions` OpenAI API endpoint (#12909)
2025-02-13 07:23:45 -08:00
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500

Add SPDX license headers to python source files

This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way, both human and machine readable, to communicate license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.

The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.

More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/

Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500

Check for SPDX headers using pre-commit

Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
ff7424f491
[Frontend] Support override generation config in args (#12409)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
2025-01-29 01:41:01 -08:00
8f10d5e393
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 01:28:00 -08:00
133707123e
[Model] Replace embedding models with pooling adapter (#10769)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-01 08:02:54 +08:00
2ac6d0e75b
[Misc] Consolidate pooler config overrides (#10351)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-15 06:59:00 +00:00
b41fb9d3b1
[Encoder Decoder] Update Mllama to run with both FlashAttention and XFormers (#9982)
Signed-off-by: Sourashis Roy <sroy@roblox.com>
2024-11-12 10:53:57 -08:00
b09895a618
[Frontend][Core] Override HF `config.json` via CLI (#5836)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 16:19:27 +00:00
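A hedged sketch of the override mechanism this commit introduced: it assumes the flag is spelled `--hf-overrides` and takes a JSON object merged into the model's `config.json`; the model name and the overridden fields are illustrative.

```shell
# Override fields of the Hugging Face config.json at load time,
# without editing the checkpoint on disk (values are examples only)
vllm serve Qwen/Qwen2-VL-7B-Instruct \
  --hf-overrides '{"rope_scaling": {"rope_type": "dynamic", "factor": 2.0}}'
```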
aa9078fa03
Adds method to read the pooling types from model's files (#9506)
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2024-11-07 08:42:40 +00:00
db7db4aab9
[Misc] Consolidate ModelConfig code related to HF config (#10104)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-07 06:00:21 +00:00
051eaf6db3
[Model] Add user-configurable task for models that support both generation and embedding (#9424)
2024-10-18 11:31:58 -07:00
7e7eae338d
[Misc] Standardize RoPE handling for Qwen2-VL (#9250)
2024-10-16 13:56:17 +08:00
421e218b37
[Bugfix] Bump transformers to 4.43.2 (#6752)
2024-07-24 13:22:16 -07:00
1bedf210e3
Bump `transformers` version for Llama 3.1 hotfix and patch Chameleon (#6690)
2024-07-23 13:47:48 -07:00
dcbf4286af
[Frontend] Customizable RoPE theta (#5197)
2024-06-11 10:42:26 -07:00
1102bef219
[Bugfix / Core] Prefix Caching Guards (merged with main) (#4846)
Co-authored-by: rsnm2 <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-05-27 15:18:17 -07:00
9b9a10d6cb
[Frontend] Dynamic RoPE scaling (#4638)
2024-05-22 01:32:35 -04:00
69e1d2fb69
[Core] Refactor model loading code (#4097)
2024-04-16 11:34:39 -07:00
54be8a0be2
Fix assertion failure in Qwen 1.5 with prefix caching enabled (#3373)
Co-authored-by: Cade Daniel <edacih@gmail.com>
2024-03-14 13:56:57 -07:00