Compare commits

...

3986 Commits

Author SHA1 Message Date
ca38d640b7 fix 2025-10-17 22:21:10 +02:00
b8011a3dc5 fix 2025-10-17 21:30:29 +02:00
a118c8e1c4 fix 2025-10-17 20:59:00 +02:00
0109c42409 fix 2025-10-17 20:30:23 +02:00
3c7552f733 fix 2025-10-17 15:40:54 +02:00
4757bf062b fix 2025-10-17 15:12:12 +02:00
aceaa7ce97 fix 2025-10-17 15:05:09 +02:00
c9293376a0 fix 2025-10-17 12:04:50 +02:00
e69d3ca150 check 1 2025-10-17 10:44:24 +02:00
bffad7f4fb check 1 2025-10-17 09:21:09 +02:00
740f952218 check 1 2025-10-17 06:57:10 +02:00
950c4e5303 check 1 2025-10-17 06:28:55 +02:00
89970f4797 check 1 2025-10-17 03:03:25 +02:00
a4a46e62a5 check 1 2025-10-16 21:32:04 +02:00
9b36498d5f 1 2025-10-16 21:16:53 +02:00
eefbf4ac8b 🌐 [i18n-KO] Translated llama4.md to Korean (#40396)
* docs: ko: llama4.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/model_doc/llama4.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/model_doc/llama4.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/model_doc/llama4.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/model_doc/llama4.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

---------

Co-authored-by: TaskerJang <bymyself103@naver.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
2025-10-16 11:28:27 -07:00
50ca781d78 🌐 [i18n-KO] Translated code_llama.md to Korean (#40558)
* docs: ko: code_llama.md

* feat: nmt draft

* fix: manual edits

* Apply suggestions from code review

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

---------

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com>
2025-10-16 11:27:46 -07:00
8739fc05c4 [i18n-KO] Translated big_bird.md to Korean (#40445)
* docs: ko: BigBird.md

* feat: nmt draft

* fix: manual edits
2025-10-16 11:23:56 -07:00
77b5ad65ee 🌐 [i18n-KO] Translated sam_hq.md to Korean (#41340)
* fix: manual edits

* Apply suggestions from code review

Apply suggestions from code review

Co-authored-by: HyunSang Jang <tasker.dev103@gmail.com>

* Apply suggestions from code review

Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: HyunSang Jang <tasker.dev103@gmail.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2025-10-16 11:10:16 -07:00
a9731a725e 🌐 [i18n-KO] Translated chat_extras.md to Korean (#39863)
* docs: ko: chat_extras.md

* feat: nmt draft

* fix: manual edits

* Apply suggestions from code review

* Apply suggestions from code review

* Update docs/source/ko/chat_extras.md
2025-10-16 10:41:03 -07:00
bdbc2d037b [Trainer] [Breaking change] use_cache default to False (#41585)
* use_cache default to `False` when training

* style

* Fix comment

* add checks

* style

* set

* switch
2025-10-16 18:51:36 +02:00
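A minimal sketch of what this breaking change means downstream, assuming (per the title) that the `Trainer` now forces `use_cache=False` while training: re-enable caching explicitly when moving to generation.

```python
from transformers import AutoModelForCausalLM

# Hypothetical fine-tuned checkpoint, for illustration only.
model = AutoModelForCausalLM.from_pretrained("my-org/my-finetuned-model")

# With this change, training runs with use_cache=False by default;
# re-enable the KV cache explicitly before autoregressive generation.
model.config.use_cache = True
```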
fe11cbb808 Erroring when KernelConfig is passed without use_kernels = True (#41657)
* update

* update
2025-10-16 18:08:46 +02:00
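The expected calling pattern, sketched from the title alone (`KernelConfig` and `use_kernels` are named there; treat the exact import path and signature as assumptions).

```python
from transformers import AutoModelForCausalLM

# Assumption based on the commit title: a kernel config is only honored
# together with use_kernels=True, and passing one without it now raises
# instead of being silently ignored.
model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-model",  # hypothetical checkpoint
    use_kernels=True,
    # kernel_config=KernelConfig(...),  # hypothetical config object from the title
)
```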
6344371a91 improve utils/check_bad_commit.py (#41658)
* robust

* robust

* robust

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-16 15:51:19 +00:00
a408384a88 Improve package version check (#41661)
fix
2025-10-16 17:31:58 +02:00
f7c33abab3 Small changes to benchmarking script (#41662) 2025-10-16 17:25:49 +02:00
9839d57a02 Fix serving continuous batching (#41624)
* update-serving-cb

* style

* style

* check none

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-10-16 17:24:21 +02:00
e85d5ab2bb Fix dtype casting with quantization (#41665)
fix dtype casting
2025-10-16 17:19:32 +02:00
1c36d407d5 Add in-out modalities as class attribute per model (#41366)
* update all models

* fix copies

* explanation comment

* better notation in omni model

* style

* fix copies

* output_modalities under generation mixin

* fix copies

* oh, glm4v also needs conversion
2025-10-16 17:11:06 +02:00
0215846d98 Switch to CB if cache_implementation == paged (#41655)
* Add a switch to CB in case of paged cache

* Added paged as a valid cache implem

* Added a fallback on inputs_ids as a name

* Rookie mistake

* Removed paged from cache implems

* Added warning about some beam search args

* Moved up CB warning
2025-10-16 17:00:18 +02:00
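Sketch of the behavior this adds (the `cache_implementation` generation kwarg is established; routing `"paged"` through continuous batching is what this commit introduces).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hf-internal-testing/tiny-random-gpt2"  # tiny test checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello", return_tensors="pt")
# Per this commit, requesting a paged cache now falls back to the
# continuous-batching (CB) path inside generate().
out = model.generate(**inputs, max_new_tokens=8, cache_implementation="paged")
```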
9e99198e5e Use | for Optional and Union typing (#41646)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-16 14:29:54 +00:00
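The style change in a nutshell: PEP 604 unions replace `typing.Optional`/`typing.Union` annotations.

```python
from typing import Optional, Union

# Before: typing-module constructs.
def resize_old(size: Optional[int] = None, scale: Union[int, float] = 1.0) -> None: ...

# After this commit: PEP 604 union syntax (Python 3.10+).
def resize_new(size: int | None = None, scale: int | float = 1.0) -> None: ...
```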
bf815e9b5e [Masks] Fix mask handling in eager for vision models (#41625)
add mask handling in case of models that do use it
2025-10-16 16:27:26 +02:00
4a43e3d57c purge HF_HUB_ENABLE_HF_TRANSFER; promote Xet (#41656) 2025-10-16 16:17:09 +02:00
8725ce10ed [Fix] Deepseek V3 expert bias routing (#41647)
* [Fix] Deepseek V3 expert bias routing

* [Fix] fix-copies

* [Fix] Run make style
2025-10-16 14:04:48 +00:00
1fb3fc4db0 [kernels] refactor function kernel calling (#41577)
* refactor function kernel calling

* nit

* don't pass the mapping

* use _kernels_available

* rm import
2025-10-16 15:43:02 +02:00
9176af574a Double router compute? (#41653)
* weird double router compute?

* flip it
2025-10-16 15:17:21 +02:00
503c933f36 Fix confusing cls assignment (#41642)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-16 13:01:07 +00:00
2aff20aff6 Fix typos in documentation (#41641)
Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-16 12:58:46 +00:00
981370c038 Format MarkDown documentation and tiny fixes (#41638)
* Fix MarkDown syntax

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-16 12:58:06 +00:00
eef9fb2af3 Fix EncoderDecoder cache (#41612)
* Fix EncoderDecoder cache

* Add the option for the ddp data tuples to have 2 elems

* Modifiy the order of the KV and sliding

* Adapted RAG and Whisper to new EncoderDecoderCache

* A single comma

* Remove kwargs in map

* Fixed order in manual injection cache test

* Slight changes to support legacy format

* Removed Nones
2025-10-16 14:55:41 +02:00
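For orientation, the object this touches: `EncoderDecoderCache` pairs a self-attention cache with a cross-attention cache (construction sketch; the ordering and legacy-format handling fixed above are internal details).

```python
from transformers.cache_utils import DynamicCache, EncoderDecoderCache

# An encoder-decoder cache wraps two caches; this PR fixes their ordering
# and the legacy (tuple) format handling around them.
cache = EncoderDecoderCache(
    self_attention_cache=DynamicCache(),
    cross_attention_cache=DynamicCache(),
)
```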
35dc8f0a2e Adjust device logging level and add minor fixes (#41636)
This commit addresses a noisy warning and improves the robustness of the base pipeline implementation.

- The device placement message in the pipeline base class has been changed from a `warning` to a `debug` log. This reduces log noise for users who are aware of their device setup, while still providing the information for debugging purposes.

- Additionally, potential `UnboundLocalError` exceptions in the `_pad` and `check_model_type` functions have been prevented by initializing variables before their conditional assignment.
2025-10-16 12:47:39 +00:00
2935a1be19 Fix fp32_ln for various models (#41605)
* Add is_causal to KosmosTextAttention

* Move get target_dtype to be imported elsewhere

* Fix fp32 flash attention bug in bark

* Fix is_causal in mllama

* Fix fp32 issue on StableLM

* Fix repo-consistency
2025-10-16 14:18:49 +02:00
b9bd8c45a1 [CI] Build translated docs (#41632)
fix
2025-10-16 14:01:33 +02:00
baecdb8a97 [Ernie 4.5 Moe] Fix Moe and offloading (#41385)
fix
2025-10-16 13:59:01 +02:00
44539827d5 [Executorch] Simplify for encoder models (#41627)
* Trigger Build

* revert extra treatment for executorch as we default to no vmapping now
2025-10-16 13:57:52 +02:00
143acfe2ce fix check inputs for text2text pipeline (#41556)
fix check inputs

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-16 11:42:41 +00:00
67fae90519 Fix FP-Quant quantization fallback CPU dispatch. (#41619)
* fp_quant fix

* Update quantizer_fp_quant.py
2025-10-16 11:41:01 +00:00
af2a66ced9 Migrate transformers cli to Typer (#41487)
* Add typer-slim as explicit dependency

* Migrate CLI to Typer

* code quality

* bump release candidate

* adapt test_cli.py

* Remove ./commands + adapt tests

* fix quality

* consistency

* doctested

* do not serve model in chat

* style

* will it fix them?

* fix test

* capitalize classes

* Rebase

* Rebase

* tests + fixup

tests + fixup

* custom error message

* fix ?

* should be good

* fix caplog globally

* inner caplog

* last attempt

* Retry

* Let's try with capsys disabled

---------

Co-authored-by: Lysandre <hi@lysand.re>
2025-10-16 13:29:42 +02:00
a59124e27e Add missing dates to docs (#41576)
add dates
2025-10-16 09:32:28 +00:00
81f97b17d2 Remove randomly added script (#41650)
remove
2025-10-16 11:23:53 +02:00
c0a5cf19ad Fix tokenization test (#41649)
fix
2025-10-16 11:14:20 +02:00
3ef6f2c415 Allow passing tp_plan in from_pretrained directly (#41435)
* start

* allow passing it

* fix plans

* fix

* fix

* style

* style

* fix

* add_test

* oupsi indent

* fix

* fix

* fix for CI without accelerator

* fix import
2025-10-16 11:12:07 +02:00
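What passing `tp_plan` directly might look like (`tp_plan="auto"` was already the documented entry point; handing in a plan yourself is what this PR adds, so the dict form below is illustrative, not the exact schema).

```python
from transformers import AutoModelForCausalLM

# Established: let transformers choose the tensor-parallel plan.
model = AutoModelForCausalLM.from_pretrained("my-org/my-model", tp_plan="auto")

# Per this PR, a plan can now be passed in directly as well; the key/value
# below is illustrative only.
# model = AutoModelForCausalLM.from_pretrained(
#     "my-org/my-model", tp_plan={"model.layers.*.mlp.up_proj": "colwise"}
# )
```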
59efd86da2 Add aux loss for GLM-4.5V (#41564)
* add aux

* update

* update config to text_config

* use qwen data class to avoid repetition

* format

* update

* use 1e-4

* update

* update for remove init

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-10-16 09:04:21 +00:00
7b7d17f9bf 🚨 [v5] Toggle the serialization format in processors (#41474)
* toggle the serialization

* prob this fixes it

* fix tests

* typo

* delete legacy save entirely

* remove extra nesting in if

* revert test and serialize a public attr instead of private
2025-10-16 10:19:22 +02:00
e20df45bf6 Add Backbone API fine-tuning tutorial (#41590)
---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-10-15 18:42:32 +02:00
19df66dcba Update executorch.md (#41582)
* Update executorch.md

* Update executorch.md

* Update executorch.md

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-10-15 09:01:46 -07:00
9f71e3a604 [docs] Duplicate entry (#41591)
fix
2025-10-15 17:02:36 +02:00
bc9900562d Fix quantization base class (#41613)
* fix

* fix

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-15 16:58:17 +02:00
72fd67929b Remove deprecated code (#41616)
remove

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-15 16:57:52 +02:00
da382917aa Remove the head masking block in some vision models (#41620)
* old

* new

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-15 15:51:01 +02:00
313afcc468 [chat template] update when "push_to_hub" (#39815)
* update templates push to hub

* revert jinja suffix and move it to processor file
2025-10-15 13:49:59 +00:00
7bba4d1202 Fix video processing channel format (#41603)
fix
2025-10-15 15:48:01 +02:00
ab92534377 enable sdpa enable gqa logic for Ascend NPU (#41601)
* enable gqa logic for Ascend NPU

* remove redundant comments

* fix comments about Ascend NPU

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-10-15 13:45:28 +00:00
56a727dde5 Add fast path for bidirectional mask creation to fix regression (#41586)
* fixed performance regression

* also fixed the older_torch function

* Update src/transformers/masking_utils.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

* more general

* fix slicing

* fix data dependent

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-10-15 15:30:39 +02:00
dc6fdeb705 Update a dataset repo link (#41618)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-15 14:41:38 +02:00
3953b65440 Reinstate early CUDA init fix (#41617)
* Reinstate early CUDA init fix

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Delay import further

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-15 14:41:10 +02:00
96d245a83d torch 2.9 don't ❤️ torchcodec 💔 (#41610)
pin

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-15 14:34:00 +02:00
bb0c3af995 More markdown file fixes (#41599)
* Format markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-15 12:29:27 +00:00
70e871959c Fix trainer simple tests (#41449)
* fix

* fix ray

* train to tune

* breaking changes wrt generation config

* Fix !

* fix

* fix

* fix deepspeed !

* fix

* fix

* fix

* improve logic

* revert and fix

* revert comment

* oups

* revert change

* fix

* style

* typo in comment

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-15 14:09:00 +02:00
c4210796e0 Import expand_device_map instead of redefining it (#41608)
remove it
2025-10-15 14:00:09 +02:00
fcd1ccdb78 [Docs] Fix changed references (#41614)
* fix

* fix

* other ln
2025-10-15 13:59:13 +02:00
2b2c20f315 Update issue template (#41573)
* update

* fix
2025-10-15 13:54:37 +02:00
e2122c4bcb remove ray_scope and check_quantized_param (#41587)
remove
2025-10-15 13:10:35 +02:00
e89cef6625 fix some case failures caused by "torch.compile recompiled part of th… (#41558)
* fix some case failures caused by "`torch.compile` recompiled part of the forward pass" in xpu

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update comment

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-10-15 10:45:29 +00:00
26b7f66850 Add logits_to_keep to many older CausalLM models (#41335)
* Add logits_to_keep to CausalLM models

* Skip failing test for git model

* Remove unused return_dict from kosmos2 signature

* Revert BlipForQuestionAnswering
2025-10-15 11:56:01 +02:00
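A quick sketch of the kwarg this PR propagates to older models (behavior as in the newer CausalLM models: `logits_to_keep=1` materializes logits for the last position only).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hf-internal-testing/tiny-random-gpt2"  # tiny test checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    # Only materialize logits for the final position: shape (batch, 1, vocab).
    out = model(**inputs, logits_to_keep=1)
print(out.logits.shape)
```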
5db730786d [device_map] Accelerate loading by computing device_map much faster (#41548)
* start

* add the important fix

* continue

* big cleanup

* type hints

* add method

* fix typehints

* typehints

* fix

* oupsi

* remove space

* improve function

* CI
2025-10-15 11:18:57 +02:00
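The public surface is unchanged; only the `device_map` computation behind this call gets faster.

```python
from transformers import AutoModelForCausalLM

# Same API as before; this PR only speeds up how the "auto" device map
# is computed before weights are dispatched.
model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-large-model",  # hypothetical checkpoint
    device_map="auto",
)
```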
13a35a5057 Enable non-streaming mode in transformers serve (#41446)
* Enable non-streaming in transformers serve

Remove typos

Remove typos

Remove typos

* Fix tests

* Arthur review
2025-10-15 09:37:26 +02:00
94df0e6560 Benchmark overhaul (#41408)
* Big refactor, still classes to move around and script to re-complexify

* Move to streamer, isolate benches, propagate num tokens

* Some refacto

* Added compile mode to name

* Re-order

* Move to dt_tokens

* Better format

* Fix and disable use_cache by default

* Fixed compile and SDPA backend default

* Refactor results format

* Added default compile mode

* Always use cache

* Fixed cache and added flex

* Plan for missing modules

* Experiments: no cg and shuffle

* Disable compile for FA

* Remove wall time, add sweep mode, get git commit

* Review compliance, start

* Apply suggestions from code review

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Update benchmark_v2/framework/benchmark_runner.py

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Disable workflow

* Pretty print

* Added some pretty names to have pretty logs

* Review n2 compliance (end?)

* Style and end of PR

---------

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
2025-10-14 21:41:43 +02:00
9e4199ede3 Gemma3 fixes (#41572)
* Multiple device error fix

* FA2 equivalence fix

* Move the train fwd in cfg test

* Style

* Added comment

* Made the comment more clear
2025-10-14 18:33:27 +02:00
4c8d293599 Fix typesetting and content of llm_tutorial_optimization.md (#41172)
* Fix typesetting of llm_tutorial_optimization

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix errors

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-14 08:40:26 -07:00
a99b1be3c7 Revert some breaking changes bnb (#41581)
fix
2025-10-14 16:28:16 +02:00
82cae9eb52 Add __iter__ to DynamicCache (#41569)
* Add __iter__ to DynamicCache

* Fix tests that use ddp init
2025-10-14 16:16:32 +02:00
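A sketch of the new surface (the PR pairs `__iter__` with DDP-init fixes, so the assumption here is that iteration yields one entry per cached layer, in the spirit of the legacy tuple format).

```python
from transformers.cache_utils import DynamicCache

cache = DynamicCache()
# Assumption: iterating a DynamicCache yields per-layer entries, which is
# what lets DDP's tuple-style handling of model outputs work again.
for layer in cache:
    print(layer)
```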
4fad35ee4a [VisionEncoderDecoderModel] Update loss function (#40863)
Update loss function
2025-10-14 16:03:00 +02:00
ae6f6cc3e0 Revert "add rmsnorm kernels support for Intel XPU" (#41579)
Revert "add rmsnorm kernels support for Intel XPU (#41563)"

This reverts commit fd787c5f6d667d3e00def70f588972af4437f631.
2025-10-14 15:49:33 +02:00
fd787c5f6d add rmsnorm kernels support for Intel XPU (#41563)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-10-14 13:26:09 +00:00
4e4f2af586 Add conditional checks to _check_and_adjust_attn_implementation() (#41542) 2025-10-14 13:00:07 +00:00
3648fde486 Add DINOv3Backbone for ConvNext variant (#40651)
---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-10-14 14:57:04 +02:00
abf5b57a68 delete some tokenizer tests using pickle (#41514)
* hate pickle

* hate pickle

* hate pickle

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-14 14:50:51 +02:00
8fe4db5399 [kernels] rm mra kernels (#41507)
* fix modeling

* remove kernel

* fix style
2025-10-14 13:34:04 +02:00
c620c38bb0 [Qwen3VLMoe] Fixed: Expected self.dtype to be equal to src.dtype - routing_weights casting (#41420)
* Fixed Expected self.dtype to be equal to src.dtype on eval

* Fixed Expected self.dtype to be equal to src.dtype on eval

* Fixed Expected self.dtype to be equal to src.dtype on eval

* generated modeling_qwen3_vl_moe.py file

* Fixed Ernie_4_5_MoE router casting

* Fixed routing_weights dtype casting (ernie4_5_moe, hunyuan_v1_moe, qwen2_moe, qwen3_moe, qwen3_next,qwen3_omni_moe)

* rollback hunyuan_v1_moe changes

---------

Co-authored-by: Daniel Oliveira <daniel-oliveira-11@hotmail.com>
Co-authored-by: Daniel Oliveira <36623265+daniel3303@users.noreply.github.com>
2025-10-14 13:14:49 +02:00
0798797ec9 Fix an import error with PreTrainModel (#41571) 2025-10-14 13:13:37 +02:00
0566b6f5bd Patch MistralCommonTokenizer (#41439)
* Fix token_to_id and add add_generation_prompt

* Fix spm download

* Refactor spm

* Try another possibly non-gated spm

* Improve get_vocab

* lint

* Improve get_vocab

* Add warn to piece_to_id

* Improve from_pretrained raise and revert model spm

* Revert fast
2025-10-14 11:13:19 +00:00
b3e3c3dc93 [Qwen3VL] fix device mismatch error for FSDP2 training (#41536)
For FSDP2, parameters might be on a meta device, and the weight.device attribute may
not accurately reflect where the actual computation will happen during forward passes.

```log
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 776, in forward
    pos_embeds = self.fast_pos_embed_interpolate(grid_thw)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 745, in fast_pos_embed_interpolate
    pos_embeds = self.pos_embed(idx_tensor) * weight_tensor[:, :, None]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1879, in _call_impl
    return inner()
           ^^^^^^^
  File "torch/nn/modules/module.py", line 1827, in inner
    result = forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/sparse.py", line 192, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "torch/nn/functional.py", line 2546, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__index_select)
```
https://github.com/volcengine/verl/pull/3686#issuecomment-3380981817

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-14 10:28:25 +00:00
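The shape of the fix, as a hypothetical sketch (not the actual diff): derive the index tensor's device from a tensor flowing through the forward pass instead of from `weight.device`, which FSDP2 may still report as `meta`.

```python
import torch
from torch import nn

def build_pos_index(grid_thw: torch.Tensor, pos_embed: nn.Embedding) -> torch.Tensor:
    # Hypothetical helper: anchor the device on an activation tensor
    # (grid_thw) rather than pos_embed.weight.device, which FSDP2 can
    # leave on "meta" before parameters are materialized.
    return torch.arange(grid_thw.shape[0], device=grid_thw.device)
```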
b84c0b31c6 Remove references to AutoModelForVision2Seq (#41513)
* Since Vision2Seq is deprecated, remove it from pipelines and docstrings

* Catch some more references
2025-10-13 17:00:07 +01:00
1ee3b288a6 [from_pretrained] Small refactor from_pretrained: move around unrelated stuff (#41445)
* drafts

* up

* simplify modeling utils

* more simplifications

* type kwargs

* up

* move more accelerate related stuff

* safeguarding?

* nits

* remove func when func is NOPE

* more

* nits

* styling

* yups

* up

* ups

* revert

* protect trainer utils import

* fix doc

* Update src/transformers/integrations/peft.py

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* review

* update

* ?

* fixx

* update

* super small update

* ups

* style

* this is stupid

* 🤦 well this was the issue

* small nit

* fix

* nit

* damn the missing return

* one last stupid fix

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-10-13 16:33:32 +02:00
cad74496ca [model] Add VideoLLaMA3 implementation (#40499)
* Add VideoLLaMA3 implementation

* Run style fix

* Switch to modular

* Fix config and smart_resize

* Fix

* Fix

* Fix style

* Fix

* Ruff fix

* Rename

* Rename

* Fix

* Clean

* Fix consistency

* Add doc

* Fix

* Fix

* Fix doc

* Update generated code

* remove test_initialization

* fix tests

* simplify

* tests

* Add VideoLlama3IntegrationTest

* replace asserts

* fix tests

---------

Co-authored-by: steven-ccq <55176896+steven-ccq@users.noreply.github.com>
Co-authored-by: steven-ccq <1456320989@qq.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-13 15:54:34 +02:00
3813a8e3a1 Add VideoMAE video processor (#41534)
* Add video processor for VideoMAE

* Document VideoMAE video processor

* Add regression tests for VideoMAE video processor

* refactor: Use direct batch key access for pixel_values_videos

* test: add parity test for VideoMAEVideoProcessor vs VideoMAEImageProcessor

* docs(videomae): update model docstring example to demonstrate VideoMAEVideoProcessor (TorchCodec-based decoding and sampling)
2025-10-13 15:42:27 +02:00
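Usage sketch based on the bullets above (the class name and the `pixel_values_videos` key come from the commit; the checkpoint is the standard VideoMAE base).

```python
import numpy as np
from transformers import VideoMAEVideoProcessor  # class added by this PR

processor = VideoMAEVideoProcessor.from_pretrained("MCG-NJU/videomae-base")

# One 16-frame 224x224 RGB video as a (frames, height, width, channels) array.
video = np.random.randint(0, 256, (16, 224, 224, 3), dtype=np.uint8)
batch = processor(videos=video, return_tensors="pt")
print(batch["pixel_values_videos"].shape)  # output key per the bullets above
```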
66d8d7a077 Fixed typos and formatting (#34215)
#hacktoberfest
2025-10-13 13:38:06 +00:00
d621be8286 🚨 [v5] generate delegates default cache initialization to the model (#41505) 2025-10-13 13:20:48 +01:00
d7c9fbdb64 Enable modular files from other libraries (#41372)
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-13 13:48:32 +02:00
41e763decd Add AMD developer cloud support (#41126)
* Add AMD developer cloud support

* Add AMD remote svg link.

* Update notebooks/README.md

Co-authored-by: pagezyhf <165770107+pagezyhf@users.noreply.github.com>

---------

Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: pagezyhf <165770107+pagezyhf@users.noreply.github.com>
2025-10-13 12:17:24 +02:00
cf1e9834ec Restore cuda graphs to continuous batching (#41421)
* Type hints and small fixes

* Remove unused params

* Made slice inputs the default

* ruffed

* Updated some var name and moved index slicing

* Logging arg in example

* Added some padding debug var and reformat out cg

* First working CG, fixe size

* Working flexible CG

* CG are compatible with all implementations

* Fixed CG API

* Update example

* Documentation

* Fix padding tokens in FA

* Review compliance

* Better doc around weird bug

* Style

* Fix for sliding with CG
2025-10-13 11:57:56 +02:00
6c901bdc0e [SAM] Fix typing hints (#41506)
fix
2025-10-13 11:52:00 +02:00
58f9e13313 Fixed Type-hints in function defintions (#41525)
* Explicitly annotate default None parameters as Optional

* make style.

* make style.

* Fixed check_copies.

* fix consistency.
2025-10-13 11:48:37 +02:00
eb28242251 Add MLlama fast image processor (#41391)
* Merge conflict

* add fast processor

* add fast processor

* make style

* add new convert rgb

* use nested group by shape in mllama fast, add support for multiple inputs in group by shape

* refactor after review

---------

Co-authored-by: Vincent <phamvinh257@gmail.com>
2025-10-13 09:16:05 +00:00
65cb8fac6d [Qwen3VL] fix: hidden_states in place modification error (#41535)
```
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 941, in forward
    hidden_states = self._deepstack_process(
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 960, in _deepstack_process
    hidden_states[visual_pos_masks, :] = local_this
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Output 0 of SliceBackward0 is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.
```

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-13 10:50:14 +02:00
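The generic remedy the error message itself points at, sketched with stand-in tensors: clone the view before the masked in-place write.

```python
import torch

# Stand-in tensors mirroring the failing shapes in the traceback above.
hidden_states = torch.randn(4, 8, requires_grad=True).tanh()
visual_pos_masks = torch.tensor([True, False, True, False])
local_this = torch.randn(2, 8)

# Cloning detaches the storage from the autograd view, so the masked
# in-place write no longer clobbers a custom Function's output.
hidden_states = hidden_states.clone()
hidden_states[visual_pos_masks, :] = local_this
```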
3927ffed31 [testing] reduce runtime of HunYuanMoEV1IntegrationTest:test_model_generation (#41373)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-10 22:27:01 +02:00
7164924a7e Fix LaTeX typesetting in documentation (#41177)
Fix LaTeX typesetting in documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-10 08:54:27 -07:00
26a5368c44 Allow optuna's catch kwargs passthrough (#41496)
* allow optuna's catch kwargs passthrough

* apply ruff formatting

---------

Co-authored-by: nicha <nicha.api@nectec.or.th>
2025-10-10 13:58:07 +00:00
feca4f3de7 remove tpu_num_cores (#41383)
* remove-tpu-num-cores

* fix

* let's remove it

* style

* Update examples/legacy/seq2seq/finetune_tpu.sh

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-10 15:53:28 +02:00
c6042a4169 Remove outdated flags (#41512)
remove flags
2025-10-10 14:34:47 +02:00
dfd4121cd4 add Trainer import to .md in appropriate cell block for training.ipynb transformers_doc (#41484)
add Trainer import to .md in appropriate cell block for docs
2025-10-10 12:04:07 +00:00
60f6ec438a Fix detectron2 import (#41510)
* fix

* fix

* typo
2025-10-10 13:33:47 +02:00
f9f8bf5a10 Revert local_rank deletion and some cleaning (#41504)
* forgot those

* clean

* Fix

* merge

* fix

* fix
2025-10-10 12:23:04 +02:00
b4067472ae Bump to hfh 1.0.0.rc5 to fix test (#41508) 2025-10-10 12:12:08 +02:00
bc529a3368 More trainer cleaning (#41489)
clean
2025-10-10 11:55:43 +02:00
b92fc0c6e1 [QoL] modular conversion shows LoC saved (#41500)
smol qol conversion
2025-10-10 11:55:23 +02:00
2eae7c7452 Set truncation to False in Qwen3Omni to avoid default truncation (#41473)
* Set `truncation` to `False` in Qwen3Omni to avoid default truncation

* move `padding` and `truncation` to audio default args

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-10-10 09:55:18 +00:00
c5094a4f97 [voxtral] language detection + skipping lang:xx (#41225)
* proc + doc update

* improve doc

* add lang:xx in decode

* update voxtral test

* nit

* nit

* update test value

* use regex
2025-10-10 09:18:30 +00:00
f4487ec521 fix gemma3n case failure (#41426)
* fix gemma3n case failure

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update dependency_versions_table.py

* change the case argument passing way to make the case PASS,
generation_config way need re-visit

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-10 09:15:27 +00:00
e8194fe84f Fix some tests (#41503)
* fix

* fix

* doc
2025-10-10 11:05:09 +02:00
9556b36b2f [causallm tester] automate pipeline mappings + bloom tests (#41318) 2025-10-10 10:02:00 +01:00
5aca530b34 [Parakeet] unnecessary warning & auto mapping (#41412)
* add parakeet to CONFIG_MAPPING_NAMES

* TOKENIZER_MAPPING_NAMES update

* fix auto tokenizer

* update

* fix
2025-10-10 11:00:15 +02:00
4f323369db Fixed tiny incorrect imports in glm4v (#41483)
Fixed tiny import issue in glm4v
2025-10-10 08:57:01 +00:00
f5f3457278 Try to remove pickle - BloomTokenizerFast (#41466)
* pickle 1

* pickle 1

* pickle 1

* pickle 1

* pickle 1

* pickle 1

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-10 10:52:51 +02:00
3585737746 [kernels] rm yoso kernel (#41495)
* disable kernel mapping

* rm kernel

* delete files

* style

* typo
2025-10-10 10:50:12 +02:00
b543679d0e [kernels] Remove RWKV kernel finally ! (#41493)
* rm kernel

* fix style
2025-10-10 10:32:05 +02:00
ac7777be16 fix bnb model loading (#41499) 2025-10-10 08:27:29 +00:00
17c31a98ac Streaming should be handled at the request-level rather than at the instance level (#41444)
* Streaming should be handled at the request-level rather than at the instance level

* Add tests

* Require torch GPU
2025-10-10 10:24:55 +02:00
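What "request-level streaming" means against an OpenAI-compatible `transformers serve` endpoint (the endpoint path and port below are assumptions based on the OpenAI convention).

```python
import requests

# Each request opts in or out of streaming via its own "stream" field,
# instead of the server instance fixing the behavior globally.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed serve endpoint
    json={
        "model": "my-org/my-model",  # hypothetical model id
        "messages": [{"role": "user", "content": "Hi"}],
        "stream": False,             # per-request toggle enabled by this PR
    },
)
print(resp.json())
```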
b28902c86b Remove DISABLE_KERNEL_MAPPING flag (#41475)
rm disable
2025-10-10 10:19:25 +02:00
d0271be18f Update philosophy (#41438)
* update philosophy

* Update docs/source/en/philosophy.md

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/philosophy.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* emphasis

---------

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-10-10 06:52:18 +00:00
0419ff881d Remove local_rank arg from TrainingArguments (#41382) 2025-10-09 18:54:12 +02:00
081391b20e deprecate jit_mode_eval (#41376) 2025-10-09 18:50:45 +02:00
1ddbbdef48 [Trainer] deprecate ray scope (#41403) 2025-10-09 18:50:00 +02:00
c20849bad1 [CI] Fix copies on main (#41486)
fix copies
2025-10-09 18:38:14 +02:00
776eea8612 deprecate overwrite_output_dir (#41323)
* dep

* style

* rm

* wut

* style
2025-10-09 18:36:19 +02:00
3839d51013 report_to default changed to "none" + cleaning deprecated env var (#41375)
* reporting

* fix

* fix
2025-10-09 18:28:48 +02:00
78f79ba5af Update GLM-4.6 doc (#41471)
Update glm4_moe.md
2025-10-09 09:18:05 -07:00
11c597b1b8 Remove deprecated args in Trainer for v5 (#41404)
remove deprecated code
2025-10-09 18:10:14 +02:00
b450d55a91 Remove past_index (#41384)
* remove-tpu-num-cores

* fix

* rm past index

* Revert "fix"

This reverts commit 7608a6c059210957d3a77812e66178c8b79a9313.

* Revert "remove-tpu-num-cores"

This reverts commit ef08a51d71389849851518d67d8ad6c9ea8f04fc.
2025-10-09 18:06:46 +02:00
1a3a5f5289 Remove SigOpt (#41479)
* remove sigopt

* style
2025-10-09 18:05:55 +02:00
823fab4860 Fix bnb fsdp loading for pre-quantized checkpoint (#41415)
* fix

* fix

* get_param_name

* fix device name
2025-10-09 18:05:35 +02:00
42d4e13a0b RT-Detr correct 2d positional embeddings for non-square images (#41380)
* Correct 2d positional embeddings for non-square images

* Simplify bug fix propagate changes to other models

---------

Co-authored-by: Konstantinos Pitas <kostasp210@gmail.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-10-09 17:58:22 +02:00
0eae41ad36 Add Code World Model (CWM) (#41199)
* [wip][cwm] Code World Model stubs and setup in HF Transformers

* [wip] Get other things working

* [wip] Working

* Tokenizer pad

* fix: cwm window attn

* temp remove test

* temp remove test

* Fixes

* Temporarily add auto config remapping option until VLLM 0.11 is out

* Fix model type and add layer validation

* Lint, remove CwmForSequenceClassification

* Lint, tests

* Remove CwmForSequenceClassification

* Lint

* Remove intermediary layer exports/doc errors, fix tests

* Lint

* run python utils/sort_auto_mappings.py --check_only

* Remove Cwm processor mapping, get check_repo passing

* Remove CwmTextConfig from test

* Add docstring for CwmConfig

* remove global_window and window_pattern params from config

* Fix docstrings

* Revert change to auto docstring util

* lint

* Fixes minus test improvements

* Alter tests to simply check logits

* lint

* Have slow tests use repo, make CwmPretrainedModel passthrough

* Remove decoder layer implementation, use Llama3Decoder + CwmAttention

* Use linear w/o bias for CwmAttention, add token-level integration test

* Don't ignore config attention bias

* Remove attention bias parameter entirely from config

---------

Co-authored-by: galco <galco@meta.com>
2025-10-09 17:57:45 +02:00
589fc29c9d enhance patched_tearDown to support python 3.11+ (#41429)
* enhance to support python 3.11+

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-09 21:19:29 +05:30
26b5b52676 [Fix] Fix test file error (#40973)
Fix test file error
2025-10-09 15:30:53 +00:00
34b861abd1 🚨 [Attention Masks] Bidirectional masks for encoder and encoder-decoder models (#41265)
* new masks

* fixes

* adjust comments

* fix unnecessary mask creation on sdpa

* simplify masks more

* propagate to other models

* style + repo consistency

* copies

* no comment

* fix attempt

* finally fix grounding dinos

* fix distilbert

* fix executorch

* move to own module

* address first few comments WIP

* revert device comments, simplify executorch further

* fix typo

* add a test for cuda graphs

* move cleanup...

* fix conflict with new main

* fix esm and evolla
2025-10-09 16:56:11 +02:00
b44d91570f [v5] remove load_in_4bit and load_in_8bit (#41287)
* [v5] remove load_in_4bit and load_in_8bit

* fix

* revert

* fix

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-09 16:34:04 +02:00
d99069195b Cleaning hub kernels (#41477)
* disable kernel mapping

* cleaning

* revert

* fix style
2025-10-09 16:32:18 +02:00
bf38b2d11d Change RT-Detr docs to reflect fixed 640x640 input size (#41364)
* Update rt_detr docs to mention 640x640 input size

The authors of RT-Detr mention that the model was trained on 640x640 images and was meant to be used for inference on 640x640 images.
Also, the current implementation has certain quirks that make training/inferring on images of different sizes problematic. For example,
the pixel masks used for batches of varying image sizes are discarded. I've added a few lines in the docs to notify the user about these issues.

* Batching not possible with variable image sizes

* Remove reference to batching

---------

Co-authored-by: Konstantinos Pitas <kostasp210@gmail.com>
2025-10-09 14:29:16 +00:00
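For reference, pinning the processor to the 640x640 size the model was trained on (standard image-processor `size` dict; the checkpoint name is the common RT-DETR release).

```python
from transformers import RTDetrImageProcessor

# RT-DETR was trained for 640x640 inference; fix the size explicitly so
# the variable-size batching quirks mentioned above don't come into play.
processor = RTDetrImageProcessor.from_pretrained(
    "PekingU/rtdetr_r50vd",
    size={"height": 640, "width": 640},
)
```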
72a3fc275c Remove infer_device (#41088)
* Remove infer_device

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix docs using accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-09 14:05:39 +00:00
9ef804472b Pickle - part 2 (#41476)
* pickle 2

* pickle 2

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-09 13:46:53 +00:00
2b5e4c0d13 Import Callable from collections.abc (#41130)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-09 12:12:43 +00:00
add4df62ba Fix tests fsdp (#41422)
* Fix tests

* fix !

* fix
2025-10-09 14:09:52 +02:00
3e87072666 Fix auto model configuration for encoder of perceptionlm (#41464)
* fix auto model configuration for encoder of perceptionlm

* delete perception_encoder auto registrations
2025-10-09 14:08:03 +02:00
f0544d7e7c Remove KERAS_NLP_IMPORT_ERROR (#41468)
Remove unused variables of error messages

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-09 11:58:30 +00:00
d1c6310d6a 🚨 [v5] Rendundant code in nested configs (#41314)
* batch update models

* delete even more

* fix modular super init location

* fix

* fix copies

* fix again, these have force-set values in configs

* fix copies
2025-10-09 13:47:44 +02:00
927aa8bef2 [kernels] Cleanup deta kernel (#41470)
* cleanup deta kernel

* fix modeling
2025-10-09 13:17:42 +02:00
1951f3be8e Update GLM-4.1V MMRope implementation (#41182)
* update for 4D mask

* update

* Update modular_glm4v.py

* 1

* Revert "1"

This reverts commit d13a763e876fa049c5fb70a8b3447b335dbb6098.

* update as glm4v logtic

* update

* 1

* update

* Create convert_glm4v_moe_mgt_weights_to_hf.py

* update

* update
2025-10-09 12:15:47 +02:00
f50fd7fb6b [v5] rm utils/tf_ops/ (#41402)
rm utils/tf_ops/
2025-10-09 10:27:47 +01:00
be3fa93b29 Subconfig is a class attribute (#41308)
* delete

* fix this test

* fix copies

* oke, more tests to fix

* fix last tests on DPT

* deleted accidentally
2025-10-09 10:46:44 +02:00
8137dbdbbd 🚨 [v5] Rename left traces of past_key_value in BERT-like models (#41448)
rename everything
2025-10-09 10:44:44 +02:00
7aa888b7fa Fix doc (#41457)
* dummy

* remove
2025-10-08 20:13:21 +02:00
bfe2b623ef Fix generate outputs and simplify cache tests (#41440)
* start refactoring

* simplify

* tests

* tests

* fix

* zamba

* final fix

* fix
2025-10-08 19:04:18 +02:00
b9be8a8775 enable some falcon-mamba uts on xpu (#41428)
* enable some falcon-mamba uts on xpu

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-08 18:48:04 +02:00
bef73bf8d7 Update hqq.md (#41452)
mistake in loading model
2025-10-08 07:44:56 -07:00
89a4115a6b Validate processing kwargs with @strict from huggingface_hub (#40793)
* initial design draft

* delete

* fix a few tests

* fix

* fix the rest of tests

* common-kwargs

* why the runner complains about typing with "|"?

* revert

* forgot to delete

* update

* fix last issues

* add more detalis in docs

* pin the latest hub release

* fix tests for new models

* also fast image processor

* fix copies

* image processing ast validated

* fix more tests

* typo.and fix copies

* bump

* style

* fix some tests

* fix copies

* pin rc4 and mark all TypedDict as non-total

* delete typed dict adaptor

* address comments

* delete optionals
2025-10-08 16:14:09 +02:00
82ffeb28ad Add Top-H decoding (entropy-bounded truncation) as a LogitsWarper for text generation (#40837)
* init

* added TopH

* Update TopH logits_process.py

* Update logits_process.py

* Update test_logits_process.py

* Update test_logits_process.py

* added test No. 4

* Resolving __init__.py issues

* Resolving configuration_utils.py Issues

* Resolving logits_process.py Issues

* Resolving utils.py Issues

* Resolving test_logits_process.py Issues

* Resolving __init__.py issues

* Resolving logits_process.py Issues

* Resolving __init__.py issues

* Updated Docs

* Updated Docstring

* style: autoformat with make fixup

* Fixing Docstring

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Using torch.distributions.Categorical
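
The bullets above describe the Top-H warper only in outline; here is a hedged sketch of the idea (the `cumulative_entropy` name and the `torch.distributions.Categorical` usage come from the bullets; the exact keep/drop bound in the merged code may differ).

```python
import torch

def top_h_filter(logits: torch.Tensor, alpha: float = 0.4) -> torch.Tensor:
    # Hedged sketch of entropy-bounded truncation, not the merged warper:
    # keep the smallest high-probability prefix whose running entropy stays
    # within alpha * H(p), masking the remaining tokens to -inf.
    probs = torch.softmax(logits, dim=-1)
    total_entropy = torch.distributions.Categorical(probs=probs).entropy()  # H(p)

    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    token_entropy = -sorted_probs * torch.log(sorted_probs.clamp_min(1e-12))
    cumulative_entropy = token_entropy.cumsum(dim=-1)  # name per the bullets

    drop = cumulative_entropy > alpha * total_entropy.unsqueeze(-1)
    drop[..., 0] = False  # always keep the top-1 token
    mask = torch.zeros_like(drop).scatter(-1, sorted_idx, drop)
    return logits.masked_fill(mask, float("-inf"))
```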

* Improve torch_dtype checks (#40808)

* Improve torch_dtype checks

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Apply suggestions from code review

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add VideoProcessors to auto-backend requirements (#40843)

* add it

* fix existing ones

* add perception to auto_mapping...

* Adds Causal Conv 1D kernel for mamba models (#40765)

* add kernel

* make style

* keep causal-conv1d

* small fix

* small fix

* fix modular converter

* modular fix + lazy loading

* revert changes modular

* nit

* hub kernels update

* update

* small nit

* Update no split modules in T5Gemma model (#40810)

* Update no split modules in T5Gemma model

* Update no_split_modules also for T5Gemma modular

* Remove model_split_percents from test cases

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Replace image classification loss functions to `self.loss_function` (#40764)

* Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)

* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

* strictly align l2norm in Qwen3-Next with FLA implementation.

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Fixes for continuous batching (#40828)

* Fix for CB attn mask and refactor

* Tests for CB (not all passing)

* Passing tests and a logger fix

* Fixed the KV metrics that were broken when we moved to hybrid alloc

* Fix circular import and style

* Added tests for FA

* Unfolded test to have device expectations

* Fixes for H100

* more fixes for h100

* H100 are good

* Style

* Adding some comments from #40831

* Rename test

* Avoid 1 letter variables

* Dictionary is only removed during kwargs

* Test for supported sample

* Fix an involuntary slice

* Fixes for non-sliced inputs and small example improvements

* Slice inputs is more understandable

* Style

* [tests] re-enable aria fast tests (#40846)

* rise from the dead

* test

* [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)

* Fix inconsistencies with box input inference with original repo

* remove print

* always pad

* fix modular

* [Sam2Video] Fix video inference with batched boxes and add test (#40797)

fix video inference with batched boxes and add test

* add: differential privacy research model (#40851)

* VaultGemma

* Removing Sequence and Token classification models. Removing integration tests for now

* Remove pass-only modular code. style fixes

* Update vaultgemma.md

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Add links to model doc

* Correct model doc usage examples

* Updating model doc to describe differences from Gemma 2

* Update model_doc links

* Adding integration tests

* style fixes

* repo consistency

* attribute exception

---------

Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)

* ouput_attentions in typed kwargs

* correct typing in GenericForTokenClassification

* improve

* [tests] move generative tests away from `test_modeling_common.py` (#40854)

move tests

* [generate] Always use decoder config to init cache (#40772)

* mega derp

* fix

* always use the decoder

* Use checkpoint in auto_class_docstring (#40844)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)

Fix ParallelismConfig type for accelerate < 1.10.1

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Redirect MI355 CI results to dummy dataset (#40862)

* [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)

Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>

* [docstrings / type hints] Update outdated annotations for `past_key_values`  (#40803)

* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes

* fix florence kwargs (#40826)

* fix: XIELU act parameters not being casted to correct dtype (#40812)

* Update model tags and integration references in bug report (#40881)

* [Qwen3 Next] Use numerically stable `rsqrt` (#40848)

use numerically stable inverse

* Adding Support for Qwen3-VL Series (#40795)

* add qwen3vl series

* make fixup

* fix import

* re-protect import

* fix it finally (need to merge main into the branch)

* skip processor test (need the checkpoint)

* oups typo

* simplify modular

* remove unnecessary attr

* fix layer

* remove unused rope_deltas args

* reuse image def

* remove unnecessary imports

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* [`VaultGemma`] Update expectations in integration tests (#40855)

* fix tests

* style

* Fix modular consistency (#40883)

* reapply modular

* add missing one

* 🔴 Move variable output controls to `_prepare_generation_config ` (#40715)

* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review

* Move variable output controls to `prepare_inputs_for_generation`

* fix a bunch of models

* back to basics

* final touches

* Clarify passing is_causal in sdpa_attention_paged_forward (#40838)

* Correctly pass is_causal in sdpa_attention_paged_forward

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add comment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.expm1 and torch.log1p for better numerical results (#40860)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Fast PromptDepthAnything Processor (#40602)

* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove benchmark script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refacto

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processoer

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

* Fix deta loading & dataclass (#40878)

* fix

* fix 2

* Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)

Remove dict branch of attention_mask

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)

* fix: manual edits

* Apply suggestions from code review

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557)

* feat: manual translation

* docs: fix ko/_toctree.yml

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/image_processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [generate] remove docs of a feature that no longer exists (#40895)

* Make debugging failing tests (check and update expect output values) easier 🔥  (#40727)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fixing the call to kernelize (#40628)

* fix

* style

* overload train and eval

* add getter and setter

* Fix getter  regression (#40824)

* test things

* style

* move tests to a sane place

* Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [cache] Merge static sliding and static chunked layer (#40893)

* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle

* Harmonize CacheLayer names (#40892)

* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert

* [cache] Only use scalars in `get_mask_sizes` (#40907)

* remove tensor ops

* style

* style

* Set seed for `Glm4vIntegrationTest` (#40905)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add Olmo3 model (#40778)

* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test

* remove dummy EncodingFast (#40864)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve module name handling for local custom code (#40809)

* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Remove `runner_map` (#40880)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* disable `test_fast_is_faster_than_slow` (#40909)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [gemma3] `Gemma3ForConditionalGeneration` compatible with assisted generation (#40791)

* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase

* [generate] misc fixes (#40906)

misc fixes

* 🔴Make `center_crop` fast equivalent to slow (#40856)

make center_crop fast equivalent to slow

* Fix dtype in Paligemma (#40912)

* fix dtypes

* fix copies

* delete unused attr

* [Docs] Adding documentation of MXFP4 Quantization (#40885)

* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Processor load with multi-processing (#40786)

push

* [Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer` (#40832)

* Remove unused arg

* deprecate

* revert one change

* get set go

* version correction

* fix

* make style

* comment

* Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)

* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* [torchao safetensors] renaming get_state_dict function (#40774)

renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Adding activation kernels (#40890)

* first commit

* add mode

* revert modeling

* add compile

* rm print

* Minor fix for #40727 (#40929)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add support for Florence-2 training (#40914)

* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add LongCat-Flash (#40730)

* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism

* [DOC] Add missing dates in model cards (#40922)

add missing dates

* [models] remove unused `import torch.utils.checkpoint`  (#40934)

* Intel CPU dockerfile (#40806)

* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update label name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941)

* Fix trainer tests (#40823)

* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* Fix `Glm4vMoeIntegrationTest` (#40930)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Raise error instead of warning when using meta device in from_pretrained (#40942)

* raise instead of warning

* add timm

* remove

* Consistent naming for images kwargs (#40834)

* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fix copies

* another fix

* fix some tests

* fix more tests

* fix last tests

* fix copies

* better docstring

* delete print

* Remove nested import logic for torchvision (#40940)

* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessary protected import in modular (and modeling)

* fix wrongly removed protected imports

* Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update expected values for some `test_speculative_generation` (#40949)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Standardize audio embedding function name for audio multimodal models (#40919)

* Standardize audio embedding function name for audio multimodal models

* PR review

* Add FlexOlmo model (#40921)

* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`

* Don't list dropout in eager_paged_attention_forward (#40924)

Remove dropout argument

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update expected values for one more `test_speculative_generation` after #40949 (#40967)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)

* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Add new model LFM2-VL (#40624)

* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unsuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>

* Fix outdated version checks of accelerator (#40969)

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)

use skip_predictor in vjepa2 `get_vision_features`

* [Trainer] Fix DP loss (#40799)

* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)

* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling

* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)

* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>

* [tests] Really use small models in all fast tests (#40945)

* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency

* Add captured actual outputs to CI artifacts (#40965)

* fix

* fix

* Remove `# TODO: ???` as it makes me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Revert change in `compile_friendly_resize` (#40645)

fix

* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Using torch.distributions.Categorical

* Remove `set_model_tester_for_less_flaky_tests` (#40982)

remove

* Benchmarking v2 GH workflows (#40716)

* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from worfkflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description

* 🔴[`Attention`] Bert-based Models Attention Refactor (#38301)

* clean start to bert refactor

* some test fixes

* style

* fix last tests

* be strict on positional embeddings, fix up the corresponding tests

* cache support

* more cache fixes, new causal API

* simplify masks, fix tests for gen

* flex attn, static cache support, round of fixes

* ?

* this time

* style

* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)

* roberta

* fixup sdpa remains

* attention split, simplify args and kwargs, better typing

* fix encoder decoder

* fix test

* modular roberta

* albert

* data2vectext, making it modular tomorrow

* modular data2vec text

* tmp disable

* xmod + cache position fixes

* whoops

* electra + markuplm, small fixes

* remove wrong copy

* xlm_roberta + some embedding fixes

* roberta prelayernorm

* RemBert: remove copy, maybe doing it later

* ernie

* fix roberta offloading

* camembert

* copy fixes

* bert generation + fixes on eager

* xlm roberta xl

* bridgetower (text) + seamlessv2 copy fixes

* rocbert + small fixes

* whoops

* small round of fixups

* NOTE: kernels didn't load with an earlier version, some fixup (needs another look because of cross deps)

* the end of the tunnel?

* fixup nllbmoe + style

* we don't need this anymore

* megatron bert is barely used, low prio skip for now

* Modernize bert (template for others)

NOTE: trying to push this through, might be overdue if not in time possible

* check inputs for all others (if checkmarked)

* fix bridgetower

* style

* fix encoder decoder (partially, but the cause was found and fixed; it just needs to be done for everything else)

* proper fix for bert to force intermediate dict outputs

* propagate to others

* style

* xlm roberta xl investigation, it's the layernorm...

* mobile bert

* revert this, might cause issues with composed models

* review

* style

* Remove [[autodoc]] refs to TF/Flax objects (#40996)

* remove refs

* more

* ENH: Enable readline support for transformers chat (#40911)

ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
  ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n  ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, macOS, and WSL; I'm not sure about
Windows, though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline3 (https://pypi.org/project/pyreadline3/).
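
A minimal sketch of the guarded import described above (the toy loop is
illustrative, not the actual chat command):

```python
try:
    import readline  # noqa: F401 -- imported only for its side effect on input()
except ImportError:
    # readline is unavailable on some platforms (e.g. stock Windows)
    pass

while True:
    line = input("> ")  # now has line editing, history, and ctrl+r search
    if line in {"exit", "quit"}:
        break
    print(line)
```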

* [testing] test `num_hidden_layers` being small in model tester (#40992)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* blt wip (#38579)

* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoint with from_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* separate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamaTextMLP

* clean up some args

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, separated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although it won't be equal to input_ids since ids are needed for the patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_causal check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attention_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>

* [docs] rm stray tf/flax autodocs references (#40999)

rm tf references

* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happening lol

* vaultgemma is new i forgot

* remove init check

* Make `EfficientLoFTRModelTest` faster (#41000)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typoes in src and tests (#40845)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)

* Fix model cards and modalities in toctree

* fix new models

* RUFF fix on CI scripts (#40805)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix dict like init for ModelOutput (#41002)

* fix dict like init

* style

* 🚨 [v5] remove generate output retrocompatibility aliases (#40998)

remove old type aliases

* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)

* update test (and overwrites)

* better test comment

* 0 as a default for

* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* 🚨 [v5] remove deprecated entry point (#40997)

* remove old entry point

* update references to transformers-cli

* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)

* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix `PhimoeIntegrationTest` (#41007)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix Glm4v test (#41011)

fix

* Update after #41007 (#41014)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix benchmark runner argument name (#41012)

* Adding support for Qwen3Omni (#41025)

* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality as best I can

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>

* Making compute_loss_func always take priority in Trainer (#40632)

* logger warn, if-else logic improved

* redundant if condition fix
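
As a usage note, a sketch of what "priority" means here: when
`compute_loss_func` is passed, the Trainer uses it instead of the model's
built-in loss (signature per the Trainer docs; `model`, `args`, and `ds`
are assumed to be defined elsewhere):

```python
import torch.nn.functional as F
from transformers import Trainer

def my_loss(outputs, labels, num_items_in_batch=None):
    logits = outputs.logits
    loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), reduction="sum"
    )
    # normalize by the true token count across accumulated batches
    return loss / num_items_in_batch if num_items_in_batch else loss

trainer = Trainer(
    model=model, args=args, train_dataset=ds, compute_loss_func=my_loss
)
```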

* Modify Qwen3Omni parameter name since VL changed it (#41045)

Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>

* Fix Qwen video tests (#41049)

fix test

* [testing] Fix `qwen2_audio` (#41018)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typing of tuples (#41028)

* Fix tuple typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove optax (#41030)

Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in English/Chinese documentation (#41031)

* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.autocast (#40975)

* Use torch.autocast

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* docs: improved RoPE function Docstrings (#41004)

* docs: improved RoPE function docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
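
For reference, the rotation those docstrings describe, in the standard
`rotate_half` formulation used across the library (shapes simplified; some
models' helpers also take an `unsqueeze_dim` argument):

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # split the last dim into two halves and rotate: (x1, x2) -> (-x2, x1)
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    # q, k: (batch, heads, seq, head_dim); cos/sin broadcast over heads
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```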

* Fix condition for emitting warning when generation exceeds max model length (#40775)

correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

* Fix outdated torch version check (#40925)

Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove doc of tf and flax (#41029)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)

* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking
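
A hedged usage sketch: the exact flag name is defined by this PR, so
`whole_word_mask=True` below is an assumption for illustration only:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,
    whole_word_mask=True,  # assumed flag: mask all sub-tokens of a word together
)

batch = collator([tokenizer("Transformers handles tokenization")])
print(batch["labels"])  # -100 everywhere except the masked word positions
```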

* [testing] Fix `seed_oss` (#41052)

* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Remove repeated import (#40937)

* Remove repeated import

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify unnecessary Optional typing (#40839)

Remove Optional

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add write token for uploading benchmark results to the Hub (#41047)

* Separate write token for Hub upload

* Address review comments

* Address review comments

* Ci utils (#40978)

* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License

* Remove <frameworkcontent> and <pt> tags from documentation (#41055)

* Remove <frameworkcontent> and <pt> tags

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/madlad-400.md

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix CI jobs being all red 🔴 (false positive) (#41059)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update quantization CI (#41068)

* fix

* new everything

* fix

* [i18n-bn] Add Bengali language README file (#40935)

* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions

* Improve documentation and errors in Mamba2-based models (#41063)

* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files

* Update team member list for some CI workflows (#41094)

* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix crash when using chat to send 2+ request to gptoss (#40536)

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* Minor addition, no split modules for VideoMAEE (#41051)

* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>

* Switch to `python:3.10-slim` for CircleCI docker images (#41067)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix argument name in benchmarking script (#41086)

* Fix argument name in benchmarking script

* Adjust vars

* Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)

Remove mention of TensorFlow from English documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in documentation (#41087)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing (#40788)

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove unused arguments (#40916)

* Fix unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove tf and flax from Chinese documentation (#41057)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix wrong height and width when reading video using torchvision (#41091)

* docs: Fix Tool Use links and remove dead RAG links (#41104)

docs: Fix tool use links. Remove dead RAG links. Fix style

* 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)

* tmp

* fix modular inheritance

* nit

* paligemma 1 doesn't have swa

* use same pattern as in models with hybrid layers

* PR comments

* helium also needs layer_typed (bc it relies on gemma)

* paligemma/gemma3: same mask creation fn in fwd and generate

* propagate changes to helium (gemma-based)

* tmp commit

* slow paligemma tests passing, let's see what breaks

* fix test_left_padding_compatibility

* tmp commit

* tmp commit

* rebase error

* docs

* reduce diff

* like this?

* t5gemma

* better comment

* shorter diff

* exception

* ffs type

* optional

* shorter modular_gemma.py

* helium model actually needs no changes -- the tester is the issue

* t5gemma modular config

* a few more modular; paligemma BC

* fix processor issues?

* rm config exception

* lift warning in gemma

* [tests] gpt2 + `CausalLMModelTester` (#41003)

* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder

* Fix `_get_test_info` for inherited tests (#41106)

* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Remove bad test skips (#41109)

* remove bad skips

* remove more

* fix inits

* Format empty lines and white space in markdown files. (#41100)

* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)

Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* 🚨 [V5] Remove deprecated training arguments  (#41017)

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Support loading LFM2 GGUF (#41111)

* add gguf config mapping for lfm2

* add lfm2 tensor processing to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [torchao safetensors] integrate torchao safetensors support with transformers  (#40735)

* enable torchao safetensors

* enable torchao safetensors support

* add more version checking

* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)

* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Fix the error where a keyword argument appearing before *args (#41099)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix broken `` expressions in markdown files (#41113)

Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove self-assignment (#41062)

* Remove self-assignment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)

* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning

* docs(text2text_generation): Update parameter comments to reflect modern generation practice

Updated the max_length parameter comments to max_new_tokens, in line with the modern standard of specifying the number of newly generated tokens

* refactor(text2text_generation): Remove outdated input validation logic

* docs(text2text_generation): Revert incorrectly modified comment

* docs(text2text_generation): Revert incorrectly modified comment
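
The practical upshot for pipeline users, as a short sketch (model id is just
an example):

```python
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-small")

# Prefer max_new_tokens (count of generated tokens) over max_length
# (prompt + generated tokens), which is what used to trigger the warning.
result = generator("Translate to German: Hello, how are you?", max_new_tokens=32)
print(result[0]["generated_text"])
```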

* Fixed MXFP4 model storage issue (#41118)

* Fixed loading LongT5 from legacy checkpoints (#40724)

* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head

* dummy commit (#41133)

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix loading logic flaw with regards to unexpected and missing keys (#40850)

* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Using torch.distributions.Categorical

* Resolving logits_process.py Issues

* style: autoformat with make fixup

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Resolving format error

* Correction of the loop variables in logit processor

* Vectorized the loop in logits_process

* formatted  logits_process

* paper reference and stopping rule comment logits_process

* Trigger CI rerun

* Update logits_process.py

* added test_TopH_example_integration

* added test_TopH_example_integration

* Update README.md

* Restore CI config to match main (remove accidental changes)

* Restore CI config to match upstream main (no diffs)
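
For context, the building block named in these messages: a vectorized,
per-row entropy via `torch.distributions.Categorical` (the actual cumulative
stopping rule lives in `logits_process.py` and follows the referenced paper):

```python
import torch

logits = torch.randn(2, 50)  # (batch, vocab) dummy next-token scores
dist = torch.distributions.Categorical(logits=logits)
entropy = dist.entropy()     # shape (batch,), no Python loop needed
print(entropy)
```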

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: ArminAzizi98 <147081650+ArminAzizi98@users.noreply.github.com>
Co-authored-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Yuchao Zhang <418121364@qq.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Bo Zheng <368586905@qq.com>
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Ákos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: 艾力可 <178652170+thalahors@users.noreply.github.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
Co-authored-by: Samuel Barry <127697809+SamuelBarryCS@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Shane A <shanea@allenai.org>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Yaswanth Gali <82788246+yaswanth19@users.noreply.github.com>
Co-authored-by: Akshay Babbar <priv.akshay@outlook.com>
Co-authored-by: liangel-02 <liangel@meta.com>
Co-authored-by: Duc-Viet Hoang <vietyb00@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: lilin-1 <256404019@qq.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Jack <32371937+jackzhxng@users.noreply.github.com>
Co-authored-by: Rangehow <88258534+rangehow@users.noreply.github.com>
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com>
Co-authored-by: Harshal Janjani <75426551+harshaljanjani@users.noreply.github.com>
Co-authored-by: Branden <brandenkmurray@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: Ayush <ayushtanwar1729@gmail.com>
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Ralph Gleaton <70818603+rjgleaton@users.noreply.github.com>
Co-authored-by: Saidur Rahman Pulok <59414463+saidurpulok@users.noreply.github.com>
Co-authored-by: Nick Doiron <ndoiron@mapmeld.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Duygu Altinok <duygu.altinok12@gmail.com>
Co-authored-by: Jinde.Song <juude.song@gmail.com>
Co-authored-by: hbenoit <60629420+HaroldBenoit@users.noreply.github.com>
Co-authored-by: nnul <107971634+notkisk@users.noreply.github.com>
Co-authored-by: YangKai0616 <kai.yang@intel.com>
Co-authored-by: Karol Szustakowski <61427290+Szustarol@users.noreply.github.com>
Co-authored-by: souvikku <107592858+souvikku@users.noreply.github.com>
2025-10-08 13:37:51 +00:00
e064dc05c2 [testing] Fix JetMoeIntegrationTest (#41377)
* fix

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-08 13:11:53 +00:00
20282f13fa [JetMoe] Fix KV head repetition and padding free (#41423)
fix jetmoe
2025-10-08 14:27:22 +02:00
c528f50663 Remove Python 3.9 classifier (#41410)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-08 12:20:36 +00:00
8dfc8e8cfc 🤦 CB nit! (#41413)
* 🤦

* updates

* update cb simple

* merge

* up

* update

* fix

* up

* nit

* grumble, this is annoying

* update

* update

* up

* fix

* ....

* cleanup a bit

* nit

* typo

* typing and typo

* nit

* updates

* up

* final fix!

* update

* fix more import issues

* nuke is_paged

* up
2025-10-08 13:36:27 +02:00
2166e26cb1 [torchao] Add regex support for ModuleFqnToConfig (#41242)
* Add regex support for ModuleFqnToConfig

Summary:
Similar to https://github.com/pytorch/ao/pull/3084, we added regex support
in transformers so people can use regexes to quantize models.

See https://github.com/pytorch/ao/pull/3084 for docs and precedence of different
configurations

Uploaded model: https://huggingface.co/torchao-testing/opt-125m-ModuleFqnToConfig-v1-regex-0.14.0.dev

Test Plan:
pytest tests/quantization/torchao_integration/test_torchao.py -k test_module_fqn_to_config_regex

Reviewers:

Subscribers:

Tasks:

Tags:
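
A hedged sketch of what the regex keys enable; the `re:` prefix and the
`_default` key follow the linked torchao PR, and the FQN pattern/model id
below are illustrative only:

```python
from torchao.quantization import Int8WeightOnlyConfig, ModuleFqnToConfig
from transformers import AutoModelForCausalLM, TorchAoConfig

quant_config = ModuleFqnToConfig({
    # one regex instead of listing every decoder layer's fc2 by hand
    r"re:model\.decoder\.layers\..*\.fc2": Int8WeightOnlyConfig(),
    "_default": None,  # leave all other modules unquantized
})

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=TorchAoConfig(quant_type=quant_config),
)
```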

* Apply style fixes

* add assert for

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-08 11:05:15 +00:00
b13ee63b5a enable new model uts to xpu and fix some failures on xpu (#41386)
* enable new model uts to xpu and fix some failures on xpu

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* add more

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update test_modeling_internvl.py

* Update test_modeling_llava.py

* Update test_modeling_qwen2_5_omni.py

* Update test_modeling_llava_next_video.py

* Update test_modeling_qwen3.py

* Update test_modeling_whisper.py

* Update test_modeling_whisper.py

* Update test_modeling_llava.py

* Update test_modeling_llava.py

* Update test_modeling_qwen2_5_omni.py

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-08 10:14:50 +00:00
1c5ac899e8 Use accelerator API to free device memory (#41195)
* Use accelerator API to free device memory

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use clear_device_cache

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Cleanup

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Cleanup

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-08 12:11:18 +02:00
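
A minimal sketch of the device-agnostic helper being adopted here (it covers
CUDA, XPU, MPS, and friends behind one call):

```python
from accelerate.utils.memory import clear_device_cache

# optionally run gc.collect() before emptying the accelerator cache
clear_device_cache(garbage_collection=True)
```
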
957b1f3696 Fixing comments in __init__ file (#41414)
nit
2025-10-08 12:07:26 +02:00
13791d8f48 [v5] Bump min version of bitsandbytes to 0.46.1 (#41283)
* bump bitsandbytes to 0.46.1

* huge cleanup

* style

* fix

* req

* fix

* importerror

* fix
2025-10-08 12:04:26 +02:00
7e475552be 🚨 [v5] Prune prune_heads (#41417)
* remove _prune_heads

* remove prune_heads

* finalize the purge

* remove other patterns
2025-10-08 10:25:13 +01:00
46db0edf3b 🚨🚨 Remove all traces of legacy cache format (#41378)
* remove

* more

* add back

* tests

* revert classes

* tests

* add exceptions

* reapply modular

* rename

* oupsi

* start with whisper

* fix tests

* fix

* fix

* fix

* typing
2025-10-08 11:14:44 +02:00
ee5488440b Tiny Cleanup - Removed duplicate class field definitions (#41293)
* Removed duplicate-class-field-definitions using RUFF PIE794

* Removed duplicate-class-field-definitions using RUFF PIE794

* Ruff format.

* Removed duplicate-class-field-definition

* Added New ruff rule to detect duplicate class field defs

* remove comment

* order

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-08 10:49:34 +02:00
34dcd73b57 v5 dev version (#41436) 2025-10-08 10:45:33 +02:00
3553f0bc23 Fix overriding common_kwargs defaults in processor calls (#41381)
* set common_kwargs defaults before updating with kwargs

* change order to override defaults common_kwargs
2025-10-07 23:13:56 -04:00
242eb9cbdc Remove deprecation warning (#41425)
* remove

* fix space
2025-10-07 19:21:14 +02:00
50090c3fc8 [v5] Delete left traces of feature extractor (#41321)
delete the left traces
2025-10-07 18:24:08 +02:00
ccbaa1670a Fix incorrect assignment in update_device_map for GPTQ quantizer (#41328)
Fix incorrect assignment in update_device_map for GPTQ quantizer

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-07 17:28:55 +02:00
c562c5d801 [v5] Bump accelerate to 1.1.0 (#41234)
* bump to 1.1.0 !

* bump accelerate

* fix

* None

* fixed !

* style
2025-10-07 17:18:32 +02:00
88e946e062 Fix early CUDA initialisation (#41409)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-07 14:37:17 +01:00
93464a0279 Prefer raising TypeError exception for invalid type (#41346)
* Fixed raising of TypeError exception for invalid type

* Fixed failing tests.
2025-10-07 13:11:42 +00:00
0c9a72e457 [Model] Lfm2Moe (#41401)
* [new-models] LFM2-MoE

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [docs] add in template lfm2_moe doc files

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [configuration] update configuration class

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modular][lfm] minor: fix rotary_emb typo

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modeling] modular/modeling files for Lfm2Moe

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modeling][lfm2_moe] fix Lfm2Moe modular/modeling

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [configuration][lfm2_moe] update configuration keys with latest config changes

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [misc] make fixup

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modular][lfm2_moe] address comments: dtype, mlp, buffers

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [configuration][lfm2_moe] add initializer_range

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modular][lfm2_moe] include init_weights to pass test_initialization

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [tests][causal_lm] include pos_emb as possible rope attribute

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modeling][lfm2_moe] remove load_balancing_loss_func due to lack of support for hooking expert biases

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [misc] make style

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modeling][lfm2_moe] MoE refactor PR update in LFM2Moe

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [tests] lfm2_moe: unit tests

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [misc] update LFM2-8B-A1B repo id

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [tests] lfm2: update ModelTests for lfm2

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* Update LFM2 documentation

Updated the LFM2 documentation to reflect the addition of a new model size and clarified architectural details.

* Add Lfm2Moe documentation

Add Lfm2Moe model documentation with overview and example usage.

* [misc] fix ci

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [docs] remove trust_remote_code

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [misc] ci: fix modular

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* reapply modular

* simplify

* remove static address and inplace op

* simplify

* simplify a bit more the modular

* imports

---------

Signed-off-by: Paul Pak <paulpak58@gmail.com>
Co-authored-by: Maxime Labonne <81252890+mlabonne@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-07 15:09:58 +02:00
b4428d545f Fix test for model with dotted name and relative imports (#41343) 2025-10-07 13:55:54 +01:00
0464d9eb37 [Cache] lfm2 cache: allocate empty kv layers during init (#41396)
* [Cache] lfm2 cache: allocate empty kv layers during init

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [Cache] lfm2_cache: update modular file

Signed-off-by: Paul Pak <paulpak58@gmail.com>

---------

Signed-off-by: Paul Pak <paulpak58@gmail.com>
2025-10-07 14:01:31 +02:00
da7b8ce11f [kernels] Kernel Config (#41232)
* first config

* add kernel_config

* add import logic

* fixing style

* compare class name

* add comments

* rm import

* adding kernel md files

* add to toctree

* adding to main_classes

* simplify required config

* add to doc

* style

* store the mapping

* remove nested func

* add hub mixin

* fix

* imports

* fix
2025-10-07 13:58:20 +02:00
4763b8c5b8 Correct numerical regression in vision embeddings (#41374)
created modeling file
2025-10-07 13:43:24 +02:00
caa14e7dab fix resample in asr pipeline (#41298) 2025-10-06 17:31:10 +00:00
73f8c4b8ad fix asr ut failures (#41332)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-06 17:12:19 +00:00
57e82745f9 [v5] Sync Bert and Bart eager attention (#41248)
* remove from modeling files

* remaining changes

* style / copies

* revert deprecated models and fixup some models

* oops

* sync attn impl

* fix style/copies

* fix distilbert

* remove dim check
2025-10-06 18:49:01 +02:00
505387c05b Update from pretrained error when loading (#33380)
* init commit

* style

* take comments into account

* merge with main and simplify

* nits

* final

* small fixes

* fix

* super small update!

* add another test

* up up

* update

* fixes

* sort them by default
2025-10-06 16:10:19 +00:00
e00f46f16e serve: add non-streaming mode to /v1/responses; stream event parity; remove placeholder logprobs (#41353) 2025-10-06 16:04:17 +00:00
0395ed52ae [CB] Refactors the way we access paged (#41370)
* up

* refactor the way we handle paged attention

* affect serve as well

* update

* fix

* cup
2025-10-06 17:55:31 +02:00
39b0c9491b Remove unused function parameters (#41358)
Remove unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-06 15:38:17 +00:00
11e4b5e5ee make some ut cases pass on xpu w/ latest torch (#41337)
* make some ut cases pass on xpu w/ latest torch

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update test_modeling_llava_onevision.py

* Apply style fixes

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-06 15:38:00 +00:00
fa36c973fc Remove unnecessary list comprehension (#41305)
Remove unnecessary comprehension

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-06 14:49:02 +00:00
7a1aeec36e Fixes in check_model_inputs, GPTBigCodeModel and ImageGPTModel (#40811)
* misc fixes

* fix

* Update src/transformers/models/imagegpt/modeling_imagegpt.py

* Apply suggestion from @IlyasMoutawwakil

* pickup use_cache from args input as well

* fix
2025-10-06 16:34:24 +02:00
297a41a6cf Use canonical get_size_with_aspect_ratio (with max_size) from transformers.image_transforms to fix #37939 (#41284)
* Use canonical get_size_with_aspect_ratio (with max_size) from transformers.image_transforms to fix #37939

* Fix import sorting/style

* Fix import order

* Refactor: use canonical get_size_with_aspect_ratio across image processors (except YOLOS)

This commit updates image processing utilities in multiple model processors to use the shared
transformers.image_transforms.get_size_with_aspect_ratio for consistent resizing logic and
aspect ratio handling.

YOLOS processors are intentionally left unchanged in this commit to preserve their current
behavior and avoid breaking model-specific padding/resizing assumptions. YOLOS will be updated
in a dedicated follow-up PR once compatibility is fully verified.

* ruff fixes

* Fix check_copies.py references for get_size_with_aspect_ratio to use canonical transformers.image_transforms version

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-10-06 10:15:56 -04:00
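
A short sketch of the canonical helper (it scales so the short edge hits
`size` while capping the long edge at `max_size`, preserving aspect ratio):

```python
from transformers.image_transforms import get_size_with_aspect_ratio

# (height, width) of the input image; target short edge 800, long edge cap 1333
new_height, new_width = get_size_with_aspect_ratio((480, 640), 800, max_size=1333)
print(new_height, new_width)  # short side scaled to 800, long side ~1067
```
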
ae60c77689 Fix flash_attention.py: wrong argument passing for attn_implementation (#41347)
* Fix flash_attention.py: wrong argument passing for attn_implementation

The name of the attn type argument for `_flash_attention_forward()` should be `implementation`, instead of `attn_implementation`, which is currently used in the function call. This results in a wrong type specification.

* modify the kwargs inside _flash_attention_forward

* fix the doc

* fix typo

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-06 15:36:40 +02:00
6bf6e36d3b [testing] update test_longcat_generation_cpu (#41368)
* fix

* Update tests/models/longcat_flash/test_modeling_longcat_flash.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2025-10-06 13:21:29 +00:00
4903cd4087 🚨 Remove BetterTransformer (#41367)
remove
2025-10-06 15:18:12 +02:00
a5700c497e Better typehints for apply_chat_template (#41355) 2025-10-06 13:14:03 +00:00
089d573aca Fix typo in model proposal template (#41352) 2025-10-06 13:06:50 +00:00
c27b67f0cd 🚨 [v5] Remove relative position embeddings (for bert like models) (#41170)
* remove from modeling files

* remaining changes

* style / copies

* revert deprecated models and fixup some models

* oops
2025-10-06 14:21:41 +02:00
a89bdcf5f1 Fixing a typo for BLT model (#41325) 2025-10-06 12:16:45 +00:00
0452f28544 [ModularChecker] QOL for the modular checker (#41361)
* update

* fancy table fancy prints

* download to cache folder, never need it ever again

* style

* update based on review
2025-10-06 12:52:10 +02:00
9db58abd6e Check model inputs - hidden states (#40994)
* update all models

* fix copies

* skip aria tests

* update other models

* skip should be in test, not tester

* i think this is more descriptive as a name

* find and replace for new models
2025-10-06 11:48:52 +02:00
db711210d2 Fix trainer for py3.9 (#41359)
fix
2025-10-06 11:36:05 +02:00
163601c619 Standardize PretrainedConfig to PreTrainedConfig (#41300)
* replace

* add metaclass for full BC

* doc

* consistency

* update deprecation message

* revert
2025-10-06 11:34:02 +02:00
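
A generic sketch of the metaclass-based backward-compat pattern the bullets
describe; this is not the exact transformers implementation, just the shape
of the technique:

```python
import warnings

class PreTrainedConfig:
    """New canonical name."""

class _DeprecatedAliasMeta(type):
    def __instancecheck__(cls, obj):
        # old isinstance(x, PretrainedConfig) checks keep passing
        return isinstance(obj, PreTrainedConfig)

class PretrainedConfig(PreTrainedConfig, metaclass=_DeprecatedAliasMeta):
    def __init__(self, **kwargs):
        warnings.warn(
            "PretrainedConfig is deprecated, use PreTrainedConfig instead",
            FutureWarning,
        )
        super().__init__()

cfg = PreTrainedConfig()
assert isinstance(cfg, PretrainedConfig)  # full BC for type checks
```
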
55b172b8eb 🚨 Bump to Python 3.10 and rework how we check 3rd-party libraries existence (#41268)
* cleanup

* add check

* fix

* remove all global variables

* fix

* add lru caches everywhere

* fix

* fix

* style

* improve

* reorder all functions

* fix order

* improve

* fix

* fix

* fix
2025-10-06 11:04:19 +02:00
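
The pattern those bullets point at, as a sketch (helper name is illustrative):

```python
import importlib.metadata
import importlib.util
from functools import lru_cache

@lru_cache
def is_package_available(pkg_name: str) -> bool:
    # no module-level booleans: compute lazily, cache forever
    if importlib.util.find_spec(pkg_name) is None:
        return False
    try:
        importlib.metadata.version(pkg_name)  # confirm an installed distribution
        return True
    except importlib.metadata.PackageNotFoundError:
        return False

print(is_package_available("torch"))  # first call probes, later calls are cached
```
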
1ec0b54414 Rope for Qwen2--5-vl (#41173)
qwen2--5-vl
2025-10-06 10:56:29 +02:00
0947b9042c Fixed tiny incorrect import in gemma3 (#41354)
Fixed tiny import issue in gemma3
2025-10-06 10:55:42 +02:00
e11a00a16f JetMoe Fix jetmoe after #40132 (#41324)
* update

* up
2025-10-04 11:02:13 +02:00
1bc75db9bd Fix lr_scheduler_parsing (#41322)
* fix

* fix
2025-10-03 17:51:17 +02:00
c2b3cc3e64 Fix jamba (#41309)
* reactivate tests

* first pass

* fix

* fix bias

* fix and simplify

* finally fix this stupid bug

* add skips

* remove bad stuff

* fix copies

* simplify
2025-10-03 16:54:19 +02:00
5abfa43f02 Security/fuyu (#41320)
remove reference to compromised repo
2025-10-03 14:13:41 +00:00
217ff1e4ef AutoAWQ tests (#41295)
* initial commit

* fix

* fix multi gpu

* fix expected output

* fix

* latest

* add comment

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-03 15:17:10 +02:00
5339f72b9b 🚨 [unbloating] unify TypedDict usage in processing (#40931)
* just squash commits into one

* fix style
2025-10-03 14:17:59 +02:00
42bcc81ba2 Minor security fix for ssh-runner.yml (#41317)
security issue

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-03 14:14:34 +02:00
cd4422922e Add modular detector (#41289)
* doc

* doc

* no remote code

* safe-ize the release + remove remote

* fixes

* add some documentation as well
2025-10-03 14:11:10 +02:00
59eba49237 download and use HF Hub Cache (#41181)
use hub cache

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-03 11:11:37 +02:00
de3ee737cf Fix README.md error when installing from source (#41303) 2025-10-02 16:08:27 -07:00
b914445f77 Italian translation for README.md (#41269)
chore: add Italian translation for README.md
2025-10-02 15:59:28 -07:00
41e5abac5c FIX: Bug in PEFT integration delete_adapter method (#41252)
The main content of this PR is to fix a bug in the delete_adapter method
of the PeftAdapterMixin. Previously, it did not take into account
auxiliary modules from PEFT, e.g. those added by modules_to_save. This
PR fixes this oversight.

Note that the PR uses a new functionality from PEFT that exposes
integration functions like delete_adapter. Those will be contained in
the next PEFT release, 0.18.0 (yet unreleased). Therefore, the bug is
only fixed when users have a PEFT version fulfilling this requirement.
I ensured that with old PEFT versions, the integration still works the
same as previously. The newly added test for this is skipped if the PEFT
version is too low.

(Note: I tested locally that the test passes with PEFT 0.18.0)

While working on this, I also cleaned up the following:

- The active_adapter property has been deprecated for more than 2 years
  (#26407). It is safe to remove it now.
- There were numerous small errors or outdated pieces of information in
  the docstrings, which have been addressed.

When PEFT < 0.18.0 is used, although we cannot delete modules_to_save,
we can still detect them and warn about it.
2025-10-02 18:36:57 +02:00
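A sketch of the version gate this entry describes, assuming the new PEFT integration helpers ship in 0.18.0 as stated; the helper name below is illustrative, not PEFT's actual API.

```python
import importlib.metadata
from packaging import version

def peft_has_integration_helpers() -> bool:
    # The full fix only applies when PEFT exposes the new integration
    # functions (0.18.0+); older versions keep the previous behavior and
    # merely warn that modules_to_save cannot be deleted.
    try:
        peft_version = importlib.metadata.version("peft")
    except importlib.metadata.PackageNotFoundError:
        return False
    return version.parse(peft_version) >= version.parse("0.18.0")
```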
da3c7d1d36 🚨 [DistilBert] Refactor Attention (#41163)
* refactor

* allow pos ids for flattened sequences
2025-10-02 17:50:48 +02:00
e54defcfc2 [Flex Attn] Fix lse x attention sinks logic (#41249)
fix
2025-10-02 17:49:39 +02:00
b3bd815786 Fix mxfp4 dequantization (#41292)
fix
2025-10-02 16:47:42 +02:00
e4930d6bde 🚨 [V5] Remove deprecated resume_download (#41122)
Remove deprecated `resume_download`

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-02 16:44:34 +02:00
7adb43e60a Build doc in 2 jobs: en and other languages (#41290)
* separate

* separate

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-02 14:33:57 +00:00
e1f1d32af0 Remove some previous team members from allow list of triggering Github Actions (#41263)
* delete

* delete

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-02 16:32:28 +02:00
1d7ebff398 Fix - remove deprecated args checking in deepspeed integrations (#41282)
Remove deprecated args checking in deepspeed integrations

Signed-off-by: nguyen599 <pnvmanh2123@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-02 13:59:50 +00:00
9d02602f0f Remove test_initialization (#41261)
remove it
2025-10-02 15:23:43 +02:00
248e7ef8bc [docs] remove references to recently deleted classes in non-en docs (onnx, feature processors) (#41286)
remove references to old classes
2025-10-02 12:59:28 +00:00
bc33fd3fc2 Add processor and intergration test for qwen3vl (#41277)
* support aux loss in qwen3vlmoe

* update qwen3vl processor test!

* add integration tests for qwen3vl-30a3

* remove duplicated decorator

* code clean

* fix consistency

* do not inherit from nn.Linear for better quantization

* pass check
2025-10-02 14:59:04 +02:00
639ad8ccd9 feat: use aws-highcpu-32-priv for amd docker img build (#41285)
* feat: use `aws-highcpu-32-priv` for amd docker img build

* feat: add `workflow_dispatch` event to docker build CI
2025-10-02 12:53:14 +00:00
894a2bdd8c Fix pylint generator warnings (#41258)
Fix pylint generator warnings

Signed-off-by: cyy <cyyever@outlook.com>
2025-10-02 12:35:42 +00:00
1cc9069551 Fix unnecessary single-item container checks (#41279)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 12:35:11 +00:00
4f286fbbf8 Biogptlogits (#41270)
added logits slicing to BioGpt for seq classifier

Signed-off-by: Aviral <aviralkamaljain@gmail.com>
2025-10-02 12:33:48 +00:00
1d91a8a454 Use max/min (#41280)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 12:15:27 +00:00
f1b64c5b06 Unify is_torchvision_v2_available with is_torchvision_available (#41259)
Fix is_torchvision_v2_available

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 11:56:37 +00:00
2f3e266692 fix async client for transformers chat (#41255)
* fix-client

* fix
2025-10-02 13:23:37 +02:00
313504bcdd 🚨 [v5] remove deprecated generate classes (constraints and beam scorers) (#41223)
rm
2025-10-02 12:11:11 +01:00
8f14300663 Allow private Space id for Trackio (#40948)
* allow private Space id for trackio

* complete docstring
2025-10-02 12:38:25 +02:00
734732140a Deprecate Trackio environment variables and deploy to Spaces by default (#40950)
* allow private Space id for trackio

* complete docstring

* Deprecate environment variables for Trackio integration; use TrainingArguments instead and deploy by default

* style

* Enhance documentation for Trackio Space ID in TrainingArguments
2025-10-02 12:37:55 +02:00
7938e91faa MoE + vllm = 😻 (#40132)
* update modeling mixtral

* oups

* fix

* better naming?

* compute softmax and top_k inside the experts

* update minimax as well

* models that will need an update

* more models that need a fix

* stash

* fix mixtral

* update olmoe

* update

* update

* current changes

* nits

* molmoe is now fixed

* olmoe is good to go!

* refactor qwen2_moe

* fixes

* fixed moe

* fix qwen2 modular

* nit

* qwen2_moe test script works

* tricky rope !

* fix qwen3

* DeepSeek v3 MoE Standardization (#40538)

* DeepSeek-v3

Shared

Shared

* Dependents of DS3

* Standardize GLM4V MoE (#40539)

* up

* Standardize VitPose's MoE (#40549)

* VitPose

* outside

* outside

* outside

* fix

* update dbrx

* dbrx... the magix

* Refactor Ernie 4.5's MoE (#40547)

* Isolate Ernie fixes

* fix moe

---------

Co-authored-by: Vasqu <antonprogamer@gmail.com>

* fix style

* style

* fix copies

* style

* latest changes

* fixes

* had to stage

* current updaters

* up

* another modular

* modular graniteMoe

* some update

* draft another modular moe

* updaters

* up

* fix nit

* q3 nit

* fix phi moe

* we're going up up up up its our mooooment

* fix switch transformers this time around

* up

* gptsan japanese is deprecated forget about it

* fix mixtral to not be a linear (gives us more freedom)

* update

* fix copies gone wrong try catch nothing

* fix mixtral

* new refactor again

* update aria as well

* up dbrx and deepseekv3

* nit

* fix phimoe?

* fix deepseek v3

* nits

* don't bother with this one please

* up olmoe

* ??

* fix olmoe

* yups

* fixup

* ish

* hot patch

* new qwen3

* updates

* up

* nit

* fix copies

* fix

* nits

* we're going up up up

* nits

* switch_transformers edge case

* lol modular gptsan?

* fix deepseek

* finally all modeling match modular

* update

* up

* up

* dang

* up

* up aria

* fix dbrx

* nits here and there

* finish fixing dbrx

* fix deepseek

* upd

* up

* fix flex olmo

* updated

* update jamba

* JAMBA is stil a bit todo

* forward forward

* fix dots11

* update

* fix hunyuan

* fix some other

* update phimoe

* phimoe is finally submitted

* submit granitemoe as well

* try to fix some other models, reduces some of the failures

* fix olmoe and qwem2moe

* up

* up

* fix qwen2_moe

* update modular make it again, simpler

* nits

* up

* up

* fix

* some switch reductions

* up

* fix qwen3vl

* some fixes to jetmo

* these should be shipped to the modular to fix jetmoe

* fix most of the nllb failures

* more nllb fixes

* fix the modular

* remove nllb modular as it sucks for now

* ?

* fix granitemoe

* granitemoehybrid don't have rope

* use rope when rope, no rope when no rope

* updates

* finish fixing granitemoe

* fix most of minimax

* fix

* update modular

* ?

* up

* up jetmoe still broken

* up

* fix, now align the moe

* fix jetmoe

* fix styling and qwen3 repo consistency

* update

* up up

* update ruff?

* nits

* modeling is good now for switch

* fix

* more fixes to switch!

* fix some switch tests

* ?

* ?

* up

* fix switch modular!

* nit?

* up

* subtest

* can't believe I wasted so much time on this...

* fix

* updates

* nits

* nit: jamba is really annoying

* ?

* fix?

* oups

* good good

* styling

* up

* make sure qwen2 sliding works!

* fix dbrx small

* lol

* nits

* fix one test

* fix load balancing loss issue

* fix jamba

* fix nllbmoe

* fix jamba consistency and doc?

* up

* these are correct

* up

* up

* up

* some of the final cleanup

* update

* up

* fix some revert in granitemoe

* bring back attention multipliers for the granite family we'll see later on if they need removal

* small jamba fix docstring and typing

* fix phimoe

* yup

* fix unknown return_dict in granitemoes

* up

* fix qwen config

* fix phimoe check quality

* nits

* update based on caught non relative imports!

* fix dbrx

* Apply suggestions from code review

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* fix copies

* fixup

* fix dot1 regression!

* fix phimoe issue

* fix phi moe

* fix float() for some models

* fix jamba regression

* up

* more dtype issues

* fix deepseek2 and 3?

* proper update

* fix modular deepseek!

* jamba jambaaaaaa

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-10-02 12:12:44 +02:00
e6a8e7debe Fix binding of video frames to video placeholder in InternVL model (#41237)
* Fix binding video frames to video placeholder in prompt

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Add test on binding video frames to prompt

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Fix code style issues

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Fix broken tests on `InternVLProcessor`

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Add `return_tensors` to video processor defaults

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

---------

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>
2025-10-02 09:43:35 +00:00
30b79effb5 Remove SageMakerTrainer (#41267)
* Remove SageMakerTrainer

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More removal

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 09:16:32 +00:00
aabf0a03cb Fix multi-video timestamp bug in Qwen-3-VL and GLM4V (#41229)
* fix multi-video timestamp bug in qwen3vl,glm4v

* run make fix-copies to sync modular files

* run make fix-copies to sync modular files

---------

Co-authored-by: UBT <daqin.luo@ubtrobot.com>
2025-10-02 11:15:57 +02:00
bcdd5532bf Use regex detailed flags (#41264)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 08:34:09 +00:00
55d63e86ea fix asr pipeline ut failures (#41275)
* fix asr pipeline ut failures

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* make style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-02 10:32:03 +02:00
522b79a346 add more activation kernels, follow up (#40944)
* add more activation kernels

* fixing style

* fix version
2025-10-02 08:45:05 +02:00
9f2d5666f8 docs: update bitsandbytes platform support (#41266) 2025-10-01 14:27:19 -04:00
9d8f693c7e add peft team members to issue/pr template (#41262)
* add

* Update .github/PULL_REQUEST_TEMPLATE.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-10-01 17:26:59 +00:00
94bbf8e199 Resolve remote custom module path warnings (#41243) 2025-10-01 15:55:42 +00:00
c4b505d0f7 Don't convert to safetensors on the fly if the call is from testing (#41194)
* don't convert

* disable

* Update src/transformers/modeling_utils.py

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* fix

* disable

* disable

* disable

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-10-01 17:46:21 +02:00
01c9e1ba68 [t5gemma] fix get_text_config and related fixes (#40939)
* tmp commit

* t5gemma fixes
2025-10-01 15:55:26 +01:00
025531981c [FA3] Fix masking and loading logic in same process (#41217)
fix loading and fa3 masking
2025-10-01 16:36:12 +02:00
3256773974 FP-Quant NVFP4 and Python 3.9 support (#39876)
* quartet

* quartet qat -> quartet

* format

* bf16 backward

* interfaces

* forward_method

* quartet -> fp_quant

* style

* List -> list

* list typing

* fixed format and annotations

* test_fp_quant

* docstrings and default dtypes

* better docstring and removed noop checks

* docs

* pseudoquantization support to test on non-blackwell

* pseudoquant

* Pseudoquant docs

* Update docs/source/en/quantization/fp_quant.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/quantization/fp_quant.md

* Update docs/source/en/quantization/fp_quant.md

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* small test fixes

* dockerfile update

* spec link

* removed `_process_model_after_weight_loading`

* toctree

* nvfp4

* nvfp4 tests

* FP-Quant version bumped

* nvfp4 default and docs update

* trainable

* cpu if pseudoquant

* proper group size selection

* gsr

* qutlass requirement version bumo

* Upstream docker copy

* docs update

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-01 13:58:22 +00:00
d848a3953a Remove all instances of is_safetensors_available (#41233)
* safetensors is a core dep

* fix

* ok

* simplify branching

* keep it for now

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-01 13:57:28 +00:00
e4913bdf50 🚨 [v5] Remove SinkCache (#41107)
Remove SinkCache

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-01 13:46:55 +00:00
1c8f206ecc Fix pylint warnings (#41222)
* Remove unused variables

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove reimported packages

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix pylint warnings

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-01 13:16:22 +00:00
3016717f0d Use removeprefix and removesuffix (#41240)
* Use removeprefix and removesuffix

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-01 13:13:04 +00:00
ca975f1cb8 [V5] Remove deprecated transformers.onnx (#41214)
* Remove deprecated transformers.onnx

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove onnx docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-10-01 12:17:04 +00:00
1d1ac07893 [repo utils] Update models_to_deprecate.py (#41231)
* update models_to_deprecate

* exclude this file

* handle typos and aliases

* don't commit files

* PR suggestions; make fixup
2025-10-01 12:01:52 +00:00
bcec3e2175 fix TrainerIntegrationDeepSpeed UT failures (#41236)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-01 13:55:01 +02:00
ae879f67f8 🚨 [v5] Delete feature extractors used for vision (#41174)
* bye bye

* remove from docs

* do not use feature extractor here

* fix docs

* do not delete it

* forgot these
2025-10-01 13:20:58 +02:00
1c4d9982d3 Use math.log2 (#41241)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-01 09:52:31 +00:00
db1cc65c06 Video processor accepts single frames on cuda (#41218)
* fix

* why was it np if the input is in torch
2025-10-01 10:55:11 +02:00
f22cb1e868 fix qwen text config (#41158)
* fix qwen text config

* fix tests

* fix one more test

* address comments
2025-09-30 17:23:44 +00:00
374ded5ea4 Fix white space in documentation (#41157)
* Fix white space

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix autodoc

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 09:41:03 -07:00
16a141765c [docs] Fix tp_plan (#41205)
remove manual
2025-09-30 09:27:50 -07:00
5d1e853032 [Trainer] deprecate num_train_tokens (#41165)
* dep

* fix

* fix
2025-09-30 15:53:16 +00:00
cecd92849e [v5] Remove train kwargs (#41127)
* rm train kwargs

* fix
2025-09-30 17:43:25 +02:00
103fa6d235 [v5] Remove deprecated prediction loop (#41123)
* rem deprecated

* more

* rm all instances of legacy arg
2025-09-30 17:43:01 +02:00
aa3e8798ba [v5] Remove tokenizer from Trainer (#41128)
* tokenizer deprecated

* style

* forgot this

* style
2025-09-30 17:42:10 +02:00
e99dee6470 Remove old sagemaker api support (#41161)
* fix

* fix
2025-09-30 17:41:52 +02:00
dded9fd112 [v5] More Training Args cleaning (#41131)
clean
2025-09-30 17:38:07 +02:00
6fb6117abe Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults" (#41124)
* Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults (#3…"

This reverts commit df67cd35f0ca1a1cbf7147b2576db31b16200cf4.

* fix
2025-09-30 17:37:42 +02:00
5bdb70450d Fix sliding window attn mask (#41228)
* Fix sliding window attn mask

* Clearer test

* Apply style fixes

* If Picasso made ascii drawings he would have made this

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-30 17:22:53 +02:00
a61fc6a0b9 Fix typing of train_args (#41142)
* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix fsdp typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 14:28:02 +00:00
919a4845fb Unify is_torchvision_v2_available with is_torchvision_available (#41227)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 15:21:49 +01:00
8e7b0655f1 update code owners (#41221)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-30 16:21:19 +02:00
2dd175e6bb Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore (#41143)
Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore.

Co-authored-by: frozenleaves <frozen@Mac.local>
2025-09-30 14:19:57 +00:00
cf0887f62c Remove old Python code (#41226)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 14:15:59 +00:00
52f5eca7c9 🚨 [v5] Remove headmasking (#41076)
* first attempt at removing

* copies

* last bits in core

* quick fixes

* tests purge

* docs and examples

* some fixes

* more

* another round of cleanups

* fix

* fix a bunch of models

* fix dummy bert

* fix

* fix new model

* fix signature change

* fix

* fix style/copies

* new models

* fix copies didnt find that damn

* test

* this shouldnt have happened during model addition
2025-09-30 16:04:57 +02:00
a80f05dfcb [generate] cache missing custom generate file (#41216)
* cache missing custom generate file

* make fixup
2025-09-30 13:32:24 +00:00
1f1e93e095 Align pull request template to bug report template (#41220)
The only difference is that I don't direct users to https://discuss.huggingface.co/ for hub issues.
2025-09-30 14:25:41 +02:00
2a596f5b2f [ESM] add accepts_loss_kwargs=False to EsmPreTrainedModel (#41006)
add accepts_loss_kwargs=False to EsmPreTrainedModel

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-30 12:06:47 +00:00
3edd8048b0 Trainer: Pass num_items_in_batch to compute_loss in prediction_step (#41183)
* Add num_items_in_batch computation to predict_step.

* address comments.

* Fix test cases.

* fixup

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-30 09:45:17 +00:00
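A sketch of what passing `num_items_in_batch` through `prediction_step` amounts to, assuming the usual `-100` ignore-index convention for labels; names and shapes are illustrative.

```python
import torch

def count_items_in_batch(labels: torch.Tensor, ignore_index: int = -100) -> torch.Tensor:
    # Only label positions that contribute to the loss are counted, so the
    # eval loss is normalized the same way as the training loss.
    return (labels != ignore_index).sum()

labels = torch.tensor([[1, 2, -100], [3, -100, -100]])
num_items_in_batch = count_items_in_batch(labels)  # tensor(3)
# ...then inside prediction_step (illustrative):
# loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
```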
59035fd0e1 Avoid assumption that model has config attribute in deepspeed (#41207)
Avoid assumption that model has config in deepspeed
2025-09-30 11:42:50 +02:00
d97397787e Wait for main process in _save_checkpoint to ensure best checkpoint exists (#40923)
* Update trainer.py

* fix

* fix format

* move barrier, delete redundant
2025-09-30 11:41:03 +02:00
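A self-contained sketch of the barrier placement this entry describes: only the main process writes the checkpoint, and every rank waits before anything reads the best checkpoint back. The classes and callables here are simplified stand-ins for the real Trainer/accelerate objects.

```python
class _Args:
    should_save = True          # True only on the main process

class _Accelerator:
    def wait_for_everyone(self):
        # In the real accelerate API this is a cross-process barrier;
        # in this single-process sketch it is a no-op.
        pass

def save_checkpoint(args, accelerator, write_fn, load_best_fn):
    if args.should_save:
        write_fn()                       # main process writes the checkpoint
    accelerator.wait_for_everyone()      # all ranks wait for the write
    return load_best_fn()                # now safe on every rank

save_checkpoint(_Args(), _Accelerator(), lambda: print("write"), lambda: print("load best"))
```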
06c04e0851 Deprecate half_precision_backend (#41134)
* deprecate

* remove

* rm apex

* fix

* fix

* fix doc
2025-09-30 11:36:44 +02:00
0e5a975608 Fix Qwen3-Omni audio_token_id serialization issue (#41192)
Fix Qwen3-Omni audio_token_id serialization by overriding parent's attribute_map

- Override attribute_map in Qwen3OmniMoeThinkerConfig to prevent inheritance of incorrect mapping
- Parent class maps audio_token_id -> audio_token_index, but implementation uses audio_token_id directly
- Fixes issue where custom audio_token_id values were not preserved during save_pretrained/from_pretrained cycles

Fixes #41191
2025-09-30 11:15:56 +02:00
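A self-contained sketch of the mechanism behind this fix. In transformers configs, `attribute_map` redirects attribute access, so an inherited `{"audio_token_id": "audio_token_index"}` entry silently rewrites the attribute; overriding the map in the subclass (as the commit does for `Qwen3OmniMoeThinkerConfig`) keeps the value under its own name across save/load round-trips. The classes below are simplified stand-ins, not the real ones.

```python
class ParentConfig:
    # transformers-style attribute_map: attribute writes on the key are
    # redirected to the value attribute.
    attribute_map = {"audio_token_id": "audio_token_index"}

    def __setattr__(self, name, value):
        name = self.attribute_map.get(name, name)
        super().__setattr__(name, value)

class ThinkerConfig(ParentConfig):
    # The fix: override the inherited map so `audio_token_id` is stored
    # under its own name and survives serialization round-trips.
    attribute_map = {}

parent = ParentConfig()
parent.audio_token_id = 151646
print(vars(parent))   # {'audio_token_index': 151646} -- remapped by parent

thinker = ThinkerConfig()
thinker.audio_token_id = 151646
print(vars(thinker))  # {'audio_token_id': 151646} -- kept as-is
```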
42c682514b docs/examples(speech): pin CTC commands to Hub datasets; add Windows notes (#41027)
* examples(speech): load Common Voice from Hub; remove deprecated dataset-script references (Windows-friendly notes)

* docs/examples(speech): pin CTC streaming & other CTC commands to Hub datasets; add Windows notes

* make style

* examples(speech): align DataTrainingArguments help with datasets docs; minor wording fixes

* docs/examples(speech): address review: remove Hub subsection & Whisper tip; align dataset help text

* style: apply ruff/black/usort/codespell on examples/speech-recognition

* Apply style fixes

* Update examples/pytorch/speech-recognition/README.md

* update doc to match load_dataset

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-30 08:38:31 +00:00
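The Hub-pinned pattern this entry refers to looks roughly like the following; the dataset id, config, and split are the usual Common Voice ones and are shown for illustration (gated datasets may additionally require logging in and accepting the terms).

```python
from datasets import load_dataset

# Load Common Voice directly from the Hub instead of via a deprecated
# local dataset script.
common_voice = load_dataset(
    "mozilla-foundation/common_voice_11_0",
    "tr",                     # language configuration
    split="train+validation",
)
print(common_voice[0]["sentence"])
```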
aaf1269d83 Remove unnecessary Optional typing (#41198)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 08:38:05 +00:00
4a02bc7004 [docs] Fix links (#41110)
fix
2025-09-30 08:53:07 +02:00
def4a37e19 Embed interactive timeline in docs (#41015)
* embed timeline in docs (test web componentand Iframe)

* test scaling

* test multiple scales

* compensate scale in width

* set correct syle and scale

* remove bottom space created by scale

* add timeline as a separate page

* reformulate docs after review
2025-09-30 01:36:08 +00:00
3e975acc8b Fix docker quantization (#41201)
* launch docker

* remove gptq for now

* run tests

* Revert "run tests"

This reverts commit f85718ce3a21d5937bf7405b8925c125c67d1a3e.

* revert
2025-09-29 16:36:30 +00:00
8635d8e796 Fix 8bit bnb loading (#41200)
* Fix 8bit

* oups forgot the case where it is not prequantized
2025-09-29 18:34:46 +02:00
1f0e9a4778 Fix EXAONE-4.0 dummy id (#41089)
* Fix EXAONE-4.0 dummy id

* Fix exaone4 dummy (#1)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-29 16:30:55 +00:00
bd37c45354 Add EdgeTAM (#39800)
* initial comment

* test

* initial conversion for outline

* intermediate commit for configuration

* chore:init files for sam2

* adding arbitrary undefined config

* check

* add vision

* make style

* init sam2 base model

* Fix imports

* Linting

* chore:sam to sam2 classes

* Linting

* Add sam2 to models.__init__

* chore:match prompt encoder with sam2 code

* chore:prepare kwargs for mask decoder

* Add image/video predictors

* Add CUDA kernel

* Add output classes

* linting

* Add logging info

* tmp commit

* docs for sam2

* enable image processing

* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize

* enable promptencoder of sam2

* fix promptencoder

* Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference)

* Confirmed that ImageEncoder is exactly same (Be aware the linting of init)

* Confirmed that MaskDecoder is exactly same (TO DO: lint variable name)

* SamModel is now available (Need more chore for name)

* make fix-copies

* make style

* make CI happy

* Refactor VisionEncoder and PositionEmbedding

* TO DO : fix the image_embeddings and sparse_embeddings part

* pure image inference done

* reusable features fix and make style

* styling

* refactor memoryattention

* tmp

* tmp

* refactor memoryencoder
TO DO: convert and run inference on the video pipeline

* TO DO : fix the image_encoder shape

* conversion finish
TO DO: need to check video inference

* make style

* remove video model

* lint

* change

* python utils/check_docstrings.py --check_all

* python utils/check_config_attributes.py

* remove copies for sam2promptencoder due to configuration

* change __init__.py

* remove tensorflow version

* fix that to not use direct comparison

* make style

* add missing import

* fix image_embedding_size

* refactor Sam2 Attention

* add fully working video inference (refactoring todo)

* clarify _prepare_memory_conditioned_features

* simplify modeling code, remove unused paths

* use one model

* use auto_docstring

* refactor rope embeddings

* nit

* not using multimask when several points given

* add all sam2.1

* add video tmp

* add Sam2VideoSessionState + fast image proc + video proc

* remove init_states from model

* fix batch inference

* add image integration tests

* uniformize modeling code with other sam models and use modular

* pass vision tests an most model tests

* All tests passing

* add offloading inference state and video to cpu

* fix inference from image embedding and existing mask

* fix multi_boxes mask inference

* Fix batch images + batch boxes inference

* improve processing for image inference

* add support for mask generation pipeline

* add support for get_connected_components post processing in mask generation

* add fast image processor sam, image processor tests and use modular for sam2 image processor

* fix mistake in sam after #39120

* fix init weights

* refactor convert

* add integration tests for video + other improvements

* add needed missing docstrings

* Improve docstrings and

* improve inference speed by avoiding cuda sync

* add test

* skip test for vision_model

* minor fix for vision_model

* fix vision_model by adding sam2model and change the torch dependencies

* remove patch_size

* remove image_embedding_size

* fix patch_size

* fix test

* make style

* Separate hieradet and vision encoder in sam2

* fixup

* review changes part 1

* remove MemoryEncoderConfig and MemoryAttentionConfig

* pass q_stride instead of q_pool module

* add inference on streamed videos

* explicitely process streamed frames

* nit

* Improve docstrings in Sam2Model

* update sam2 modeling with better handling of inference state and cache, and separate Sam2Model and Sam2VideoModel

* improve video inference api

* change inference_state to inference_session

* use modular for Sam2Model

* fix convert sam2 hf

* modular

* Update src/transformers/models/sam2/video_processing_sam2.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix minor config

* fix attention loading error

* update modeling tests to use hub checkpoints

* Use CI A10 runner for integration tests values + higher tolerance for video integration tests

* PR review part 1

* fix doc

* nit improvements

* enforce one input format for points, labels and boxes

* nit

* last few nits from PR review

* fix style

* fix the input type

* fix docs

* add sam2 model as conversion script

* improve sam2 doc

* add rough necessary changes

* first working edgetam

* fix issue with object pointers

* Use modular as much as possible

* nit fixes + optimization

* refactor spatial perceiver

* cleanup after merge

* add working edgetam

* improve perceiver resampler code

* simplify/unify rope attention logic

* Improve comments in apply_rotary_pos_emb_2d

* add working tests

* fix test timmwrapper

* add docs

* make fixup

* nits

* fix modular

* fix modular

* PR review part 1

* split apply_rotary_pos_emb_2d

* add granularity to _prepare_memory_conditioned_features

* add dates to doc

* add separate mlp for memory attention

* Fix memory on wrong device

* store processed frames in dict

* update checkpoints in tests

* update dates

---------

Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-09-29 11:54:54 -04:00
c1db38686a [Kernels Attention] Change fallback logic to error out on explicit kernels request and include FA3 (#41010)
* fix

* be more strict

* change logic to include fa3

* fix the case where nothing is requested

* modify old tests + add kernels related tests

* style
2025-09-29 17:10:59 +02:00
5426edecab Make quantizers good citizens loading-wise (#41138)
* fix param_needs_quantization

* rewrite most hqq

* clean

* fix

* comment

* remove it from exception of safetensors

* start on bnb 4bits

* post-rebase fix

* make bnb4 bit a good citizen

* remove forgotten print

* make bnb 8bits a good citizen

* better hqq

* fix

* clean

* remove state dict from signature

* switch method

* make torchao a good citizen

* fixes

* fix torchao

* add check

* typo
2025-09-29 17:04:45 +02:00
399c589dfa Separate docker images for Nvidia and AMD in benchmarking (#41119)
Separate docker images for Nvidia and AMD
2025-09-29 17:03:27 +02:00
52cbc7c868 Fix attention sink implementation in flex attention (#41083)
* Fix attention sink implementation in flex attention

* fix dim

* fix

* Remove print

* raisae error when return_lse is False yet s_aux is providewd

* Clean test files for merge

* Update src/transformers/integrations/flex_attention.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* force return lse

* Add to doc

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-09-29 14:33:03 +00:00
de9a75f5b0 fix(trainer): Avoid moving model with device_map (#41032)
* fix(trainer): Avoid moving model with device_map

When a model is loaded with `device_map="auto"` and is too large to fit on a single GPU, `accelerate` will offload some layers to the CPU or disk. The `Trainer` would previously attempt to move the entire model to the specified device, causing a `RuntimeError` because a model dispatched with `accelerate` hooks cannot be moved.

This commit fixes the issue by adding a check in `_move_model_to_device` to see if the model has an `hf_device_map` attribute. If it does, the device placement is assumed to be handled by `accelerate`, and the `model.to(device)` call is skipped.

A regression test is added to ensure the `Trainer` can be initialized with a model that has a `hf_device_map` that simulates offloading without raising an error.

* Added the logger warning for the move model

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
2025-09-29 14:31:42 +00:00
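A minimal sketch of the guard described above; `hf_device_map` mirrors the attribute named in the commit message, while the surrounding function is simplified from the Trainer method.

```python
import logging

logger = logging.getLogger(__name__)

def move_model_to_device(model, device):
    # A model dispatched by accelerate (device_map="auto") carries an
    # hf_device_map; calling .to(device) on it raises, so skip the move
    # and let accelerate keep handling placement.
    if getattr(model, "hf_device_map", None) is not None:
        logger.warning("Model is dispatched via a device_map; skipping model.to(device).")
        return model
    return model.to(device)
```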
bcc0dae77c enable flex attention ut cases on XPU (#40989)
* enable flex attention ut cases on XPU

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-29 14:30:49 +00:00
fcd483f0ff Bump hfh prerelease version (#41175) 2025-09-29 16:28:36 +02:00
a3fa1d3993 Fix inaccurate train_tokens_per_second when resuming from checkpoint (#41156)
* fix(trainer): Fix the issue of inaccurate token count in training sessions

During the training process, the initial token count was not saved, leading to inaccurate speed calculation. Now, the initial token count is saved and the increment during the session is calculated, ensuring that the speed metric accurately reflects the performance of the current training session.

* fix error

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-29 16:22:35 +02:00
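A sketch of the accounting fix, assuming the token counter lives in `state.num_input_tokens_seen` as in `TrainerState`; the point is that only this session's increment enters the numerator.

```python
import time

class TrainerState:
    num_input_tokens_seen = 0

state = TrainerState()
state.num_input_tokens_seen = 1_000_000   # restored from a checkpoint

start_time = time.time()
initial_tokens = state.num_input_tokens_seen   # the fix: snapshot at resume

state.num_input_tokens_seen += 50_000     # tokens processed this session
elapsed = max(time.time() - start_time, 1e-8)

# Before the fix the numerator was all 1,050,000 tokens ever seen, which
# inflated the metric after resuming; now only the session's delta counts.
train_tokens_per_second = (state.num_input_tokens_seen - initial_tokens) / elapsed
```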
ad74fba085 [v5] Remove model_parallel deprecated feature (#41166)
* fix

* remove model parallel

* style

* removed a bit too much

* rm comments

* fix
2025-09-29 16:14:03 +02:00
38a08b6e8a More typing fixes (#41102)
* Fix noqa

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* remove noqa

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix chars

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-29 13:11:53 +00:00
4fade1148f [tests] CausalLMTester automatically infers other test classes from base_model_class 🐛 🔫 (#41066)
* halfway through the models

* update test checks

* refactor all

* another one

* use tuples

* more deletions

* solve bad inheritance patterns

* type

* PR ready?

* automatic model class inference from the base class

* vaultgemma

* make fixup

* make fixup

* rebase with gpt2

* make fixup :'(

* gpt2 is special
2025-09-29 15:05:08 +02:00
cdba28c344 [XPU] Add MXFP4 support for XPU (#41117)
* XPU supports gpt-oss MXFP4

* Complete MXFP4 UT file and comment information

* Complete MXFP4 UT file and comment information

* Fix code style

* Fix code style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-29 12:10:41 +02:00
2dcb20dcec CI Runners - move amd runners mi355 and 325 to runner group (#41193)
* Update CI workflows to use devmi355 branch

* Add workflow trigger for AMD scheduled CI caller

* Remove unnecessary blank line in workflow YAML

* Add trigger for workflow_run on main branch

* Update workflow references from devmi355 to main

* Change runner_scale_set to runner_group in CI config
2025-09-29 11:14:19 +02:00
d0d574b1e4 Modernbert fix (#41056)
* Add FA to docker

* Fixed padding for mdernbert

* Fixed logits and hidden states extraction in ModernBertForMultipleChoice

* Added a test for ModernBertForMultipleChoice

* fixes

* More fixes and GREEN CI

* consistency

* moar consistency
2025-09-29 10:52:44 +02:00
071eb5334f handle flash slow tests (#41072)
* handle flash slow tests

* update patch mask to 1/0 for flash

* don't skip flash

* flash

* raise tols

* rm flash support :(

* nits

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-7.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-230.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-214.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-147.ec2.internal>
2025-09-26 16:24:31 +00:00
50d2448a1a Enable fa in amd docker (#41069)
* Add FA to docker

* Use caching mechanism for qwen2_5

* Fix a typo in important models list

* Partial fixes for gemma3

* Added a commit ID for FA repo

* Detailed the expectation storage format

* Rebase fix

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-26 13:57:58 +02:00
10f6891fc5 Remove data from examples (#41168)
Remove telemetry
2025-09-26 13:52:45 +02:00
97ca0b4712 Fix flash-attn for paged_attention when no kernels (#41078)
* Fix non-kernels flash attention paged implementation

* Cover all cases

* Style

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Apply style fixes

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-26 10:41:21 +02:00
53838edde7 Improve add_dates script (#41167)
* utils/add_dates.py

* put lfm2-vl in correct category
2025-09-25 16:00:05 -04:00
449533af73 Add language specifiers to code blocks of markdown files (#41114)
* Add language specifiers to code blocks of markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/qwen3_omni_moe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update nemotron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update phimoe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix syntax error

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-25 10:29:57 -07:00
e691f84412 Force new vision models addition to include a fast image processor (#40802)
* add test

* fix test and change cutoff date

* Add documentation to test
2025-09-25 15:58:18 +00:00
e54bb62a73 Simplify and improve model loading logic (#41103)
* remove unexpected keys from inputs (they have nothing to do there)

* remove input

* simplify a lot init

* fix

* fix check for non-persistent buffer

* revert because too many old and bad models...

* remove comment

* type hint

* make it a real test

* remove model_to_load -> always use the same model

* typo

* remove legacy offload_folder (we never waste that memory anymore)

* do not change prefix anymore

* change very bad function name

* create adjust method

* remove useless method

* restrict

* BC

* remove unused method

* CI

* remove unused args

* small fix

* fix

* CI

* CI

* avoid too many loops

* fix regex

* cleaner

* typo

* fix

* fix
2025-09-25 17:28:27 +02:00
6dc9ed87a0 Fix format of compressed_tensors.md (#41155)
* Fix table format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-25 14:50:15 +00:00
a579de7f5e Add Parakeet (#39062)
* first commit

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update to handle masking for bs>1

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Add tests and docs

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update model ids

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update docs and improve style

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update librosa location

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* import guard torch too

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* ruff code checks fix

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* ruff format check

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* updated to parakeet names

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update script

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Add tokenizer decoding

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Remove other model dependency

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* clean tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* linting

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix ruff lint warnings

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* move to separate folders

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet ctc model code

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* simplify encoder structure

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update documentation

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet to toctree

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet doc

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Address comments

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Update featurizer to compute lens directly

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix ruff tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix encoding format

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix minor ctc decoding

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* revert modular_model_converter.py changes

* revert check_config_attributes.py changes

* refactor: fastconformer & parakeet_ctc -> parakeet

* modeling update

* test update

* propagate feature extractor updates

* propagate doc changes

* propagate doc changes

* propagate tokenization changes

* propagate conversion changes

* remove fastconformer tests

* remove modular

* update processor

* update processor

* test update

* diverse fixes

* 100% matching greedy batched

* Update conversion script.

* Refactor docs.

* Refactor auto loading.

* Refactor and fix tokenization and processing.

* Update integration test.

* Modeling fixes:
- ensure correct attention mask shape
- ensure layer drop returns valid output
- correct blank token ID when computing CTC loss

* Format and repo consistency.

* Update model doc.

* Fix feature extraction tests.

* Fix (most) tokenizer tests.

* Add pipeline example.

* Fixes

* Use eager_attention_forward from Llama.

* Small tweaks.

* Replace Sequential with ModuleList

* Add check if not all layers copied

* Clean tokenizer.

* Standardize FastSpeech2ConformerConvolutionModule for Parakeet.

* Switch to modular for modeling and processing.

* Add processor tests.

* Fix modeling tests.

* Formating and docstrings.

* Add `return_attention_mask` like other feature extractors.

* clean up after merging main.

* nits on modeling

* configuration update

* nit

* simplification: use PreTrainedTokenizerFast, simplify processor

* add dtype arg to mel_filter_bank

* feature extraction: simplify!

* modeling update

* change to ParakeetTokenizerFast

* correct attention mask handling

* auto update

* proc update

* test update

* feature extraction fixes

* modeling update

* conversion script update

* update feature integration tests

* update tokenization and tests

* processor tests

* revert audio_utils

* config docstring update

* blank_token -> pad_token

* modeling update

* doc update

* fix tests

* fix test

* fix tests

* address review comments

* add comment

* add comment

* explicitly not support flash

* attention straightforward masking

* fix

* tokenizer update: skipping blank tokens by default

* doc update

* fix max_positions_embeddings handling

* nits

* change atol in feature extraction integration tests

* doc update + fix loss

* doc update

* nit

* update integration test for A10

* repo id name

* nit

---------

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eric B <ebezzam@gmail.com>
2025-09-25 13:52:24 +00:00
1dd22a234c extend gemma3n integration ut cases on XPU (#41071)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-09-25 13:46:37 +00:00
05fb90c969 Fix single quotes in markdown (#41154)
Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-25 13:03:26 +00:00
44682e7131 Adapt and test huggingface_hub v1.0.0 (#40889)
* Adapt and test huggingface_hub v1.0.0.rc0

* forgot to bump hfh

* bump

* code quality

* code quality

* relax dependency table

* fix has_file

* install hfh 1.0.0.rc0 in circle ci jobs

* repository

* push to hub now returns a commit url

* catch HfHubHTTPError

* check commit on branch

* add it back

* fix ?

* remove deprecated test

* uncomment another test

* trigger

* no proxies

* many more small changes

* fix load PIL Image from httpx

* require 1.0.0.rc0

* fix mocked tests

* fix others

* unchange

* unchange

* args

* Update .circleci/config.yml

* Bump to 1.0.0.rc1

* bump kernels version

* fix deps
2025-09-25 11:13:50 +00:00
750dd2a401 Fix: align Qwen2.5-VL inference rope index with training by passing s… (#41153)
Fix: align Qwen2.5-VL inference rope index with training by passing second_per_grid_ts
2025-09-25 10:33:46 +00:00
7258ea44bc Fix loading logic flaw with regards to unexpected and missing keys (#40850)
* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-24 16:44:42 +02:00
2c4caa19e7 dummy commit (#41133)
* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-24 16:31:46 +02:00
6d1875924c Fixed loading LongT5 from legacy checkpoints (#40724)
* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head
2025-09-24 13:13:18 +01:00
3ca43d34b1 Fixed MXFP4 model storage issue (#41118) 2025-09-24 12:11:51 +00:00
b33cb70097 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)
* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning

* docs(text2text_generation): update parameter comments to reflect modern generation practice

Update the max_length parameter comment to max_new_tokens, in line with the modern standard practice of specifying the number of newly generated tokens

* refactor(text2text_generation): Remove outdated input validation logic

* docs(text2text_generation): Revert incorrectly modified comment

* docs(text2text_generation): Revert incorrectly modified comment
2025-09-24 11:54:55 +00:00
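The modern call pattern this refactor standardizes on; the model id below is just an example.

```python
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-small")

# max_new_tokens bounds only the newly generated tokens, independent of the
# prompt length, and avoids the legacy max_length warning.
out = generator("Translate to German: Hello, how are you?", max_new_tokens=32)
print(out[0]["generated_text"])
```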
b0c7034d58 Remove self-assignment (#41062)
* Remove self-assignment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-09-24 12:43:17 +01:00
04a0bb569c Fix broken `` expressions in markdown files (#41113)
Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-24 11:34:12 +00:00
071c7b1423 Fix the error where a keyword argument appearing before *args (#41099)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-24 11:27:37 +00:00
80f20e0ff8 [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)
* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-24 11:18:27 +00:00
1d81247b0c [torchao safetensors] integrate torchao safetensors support with transformers (#40735)
* enable torchao safetensors

* enable torchao safetensors support

* add more version checking
2025-09-24 12:32:47 +02:00
b533cec74d Support loading LFM2 GGUF (#41111)
* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-24 10:17:41 +00:00
65dcd66cc8 🚨 [V5] Remove deprecated training arguments (#41017)
* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-24 12:01:27 +02:00
43a613c8da Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)
Update ruff to 0.13.1, target Python 3.10, and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-09-24 06:37:21 +00:00
f64354e89a Format empty lines and white space in markdown files. (#41100)
* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 16:20:01 -07:00
99b0995138 Remove bad test skips (#41109)
* remove bad skips

* remove more

* fix inits
2025-09-23 20:39:28 +02:00
00f3d90720 Fix _get_test_info for inherited tests (#41106)
* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 19:35:24 +02:00
cfa022e719 [tests] gpt2 + CausalLMModelTester (#41003)
* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder
2025-09-23 18:07:06 +01:00
869735d37d 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)
* tmp

* fix modular inheritance

* nit

* paligemma 1 doesn't have swa

* use same pattern as in models with hybrid layers

* PR comments

* helium also needs layer_types (bc it relies on gemma)

* paligemma/gemma3: same mask creation fn in fwd and generate

* propagate changes to helium (gemma-based)

* tmp commit

* slow paligemma tests passing, let's see what breaks

* fix test_left_padding_compatibility

* tmp commit

* tmp commit

* rebase error

* docs

* reduce diff

* like this?

* t5gemma

* better comment

* shorter diff

* exception

* ffs type

* optional

* shorter modular_gemma.py

* helium model actually needs no changes -- the tester is the issue

* t5gemma modular config

* a few more modular; paligemma BC

* fix processor issues?

* rm config exception

* lift warning in gemma
2025-09-23 16:20:00 +00:00
71717ce91c docs: Fix Tool Use links and remove dead RAG links (#41104)
docs: Fix tool use links. Remove dead RAG links. Fix style
2025-09-23 09:18:49 -07:00
946e5f95ea fix wrong height and width when reading video using torchvision (#41091) 2025-09-23 12:35:44 +00:00
870add3daf Remove tf and flax from Chinese documentation (#41057)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:43:17 +00:00
ae60692821 Remove unused arguments (#40916)
* Fix unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:40:51 +00:00
f682797866 Fix typing (#40788)
* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:36:02 +00:00
f4a6c65951 Fix typos in documentation (#41087)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:27:04 +00:00
89e0f472f4 Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)
Remove mention of TensorFlow from English documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:14:11 +00:00
62ce6fcb60 Fix argument name in benchmarking script (#41086)
* Fix argument name in benchmarking script

* Adjust vars
2025-09-23 13:05:27 +02:00
257fe5eea8 Switch to python:3.10-slim for CircleCI docker images (#41067)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 12:48:48 +02:00
0ec0325781 Minor addition, no split modules for VideoMAEE (#41051)
* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-09-23 11:53:51 +02:00
577fa6f167 fix crash when using chat to send 2+ requests to gpt-oss (#40536)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2025-09-23 09:50:23 +00:00
03c92884b5 Update team member list for some CI workflows (#41094)
* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 09:48:40 +00:00
cbb290ec23 Improve documentation and errors in Mamba2-based models (#41063)
* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files
2025-09-22 10:36:20 -07:00
8048c614bf [i18n-bn] Add Bengali language README file (#40935)
* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions
2025-09-22 09:51:39 -07:00
aa30e0642e Update quantization CI (#41068)
* fix

* new everything

* fix
2025-09-22 18:10:16 +02:00
1bb69cce82 Fix CI jobs being all red 🔴 (false positive) (#41059)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-22 16:51:00 +02:00
f15258dec2 Remove <frameworkcontent> and <pt> tags from documentation (#41055)
* Remove <frameworkcontent> and <pt> tags

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/madlad-400.md

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-22 14:29:50 +00:00
2ec37649e2 Ci utils (#40978)
* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License
2025-09-22 16:16:19 +02:00
b9d337b6f3 Add write token for uploading benchmark results to the Hub (#41047)
* Separate write token for Hub upload

* Address review comments

* Address review comments
2025-09-22 14:13:46 +00:00
646ff51d1a Simplify unnecessary Optional typing (#40839)
Remove Optional

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:57:50 +00:00
c9939b3ab6 Remove repeated import (#40937)
* Remove repeated import

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:57:13 +00:00
4f36011545 [testing] Fix seed_oss (#41052)
* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-22 14:54:30 +02:00
2b8a7e82b5 Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)
* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking
2025-09-22 13:42:34 +01:00
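A minimal usage sketch for the collator change above, assuming the new option is exposed as a `whole_word_mask` flag (the exact parameter name should be checked against the merged signature):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Assumption: `whole_word_mask=True` is the flag added by this PR; when
# enabled, all sub-word pieces of a selected word are masked together.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,
    whole_word_mask=True,
)
batch = collator([tokenizer("Tokenization splits words into pieces.")])
```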
226667ec2f Remove doc of tf and flax (#41029)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 13:42:26 +01:00
6eff44bb8d Fix outdated torch version check (#40925)
Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:38:07 +00:00
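For reference, a standalone sketch of the kind of minimum-version gate this bumps to 2.2 (transformers uses its own helpers in `transformers.utils`; the `packaging` form below is an illustrative equivalent):

```python
import torch
from packaging import version

# Illustrative minimum-version gate, now requiring torch >= 2.2.
if version.parse(torch.__version__) < version.parse("2.2"):
    raise ImportError(f"torch >= 2.2 is required, found {torch.__version__}")
```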
9ff47a71e4 Fix condition for emitting warning when generation exceeds max model length (#40775)
correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-09-22 12:21:38 +00:00
ae9ef2e151 docs: improved RoPE function Docstrings (#41004)
* docs: improved RoPE function docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-22 13:21:15 +01:00
f3c481ed87 Use torch.autocast (#40975)
* Use torch.autocast

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:18:24 +00:00
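The change above moves to the device-agnostic context manager; a minimal sketch of the pattern:

```python
import torch

model = torch.nn.Linear(16, 16)
x = torch.randn(2, 16)

# Old, CUDA-only spelling: with torch.cuda.amp.autocast(dtype=torch.float16)
# New, device-agnostic spelling ("cuda", "cpu", "mps", ...):
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)
```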
37152f8446 Fix typos in English/Chinese documentation (#41031)
* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:31:46 +00:00
8a52288dba Remove optax (#41030)
Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:30:39 +00:00
5f891b36cd Fix typing of tuples (#41028)
* Fix tuple typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:29:07 +00:00
c05f9d2f0e [testing] Fix qwen2_audio (#41018)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-22 10:45:31 +00:00
55a1eaf6f0 Fix Qwen video tests (#41049)
fix test
2025-09-22 12:28:11 +02:00
db802aafa4 Modify Qwen3Omni parameter name since VL changed it (#41045)
Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-09-22 10:06:59 +00:00
8a2f24a321 Making compute_loss_func always take priority in Trainer (#40632)
* logger warn, if-else logic improved

* redundant if condition fix
2025-09-22 09:47:34 +00:00
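A hedged sketch of the new behavior: a user-supplied `compute_loss_func` now wins even when the model computes its own loss (the checkpoint name is only an example):

```python
import torch
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

def my_loss(outputs, labels, num_items_in_batch=None):
    # After this change, the custom function always takes priority,
    # even when the model already returned `outputs.loss` internally.
    return torch.nn.functional.cross_entropy(outputs.logits, labels)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    compute_loss_func=my_loss,
)
```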
ebbcf00ad1 Adding support for Qwen3Omni (#41025)
* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-09-21 23:46:27 +02:00
67097bf340 Fix benchmark runner argument name (#41012) 2025-09-20 10:53:56 +02:00
8076e755e5 Update after #41007 (#41014)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 21:55:46 +02:00
022c882e14 Fix Glm4v test (#41011)
fix
2025-09-19 18:54:26 +02:00
966b3dbcbe Fix PhimoeIntegrationTest (#41007)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 16:43:46 +00:00
04bf4112f2 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)
* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-09-19 16:41:22 +01:00
dfc230389c 🚨 [v5] remove deprecated entry point (#40997)
* remove old entry point

* update references to transformers-cli
2025-09-19 14:40:27 +00:00
8010f5d1d9 Patch more unittest.case.TestCase.assertXXX methods (#41008)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 16:38:12 +02:00
5bf633b32a [tests] update test_left_padding_compatibility (and minimize overwrites) (#40980)
* update test (and overwrites)

* better test comment

* 0 as a default for
2025-09-19 15:36:26 +01:00
df12617914 🚨 [v5] remove generate output retrocompatibility aliases (#40998)
remove old type aliases
2025-09-19 14:36:12 +00:00
2a538b2ed4 fix dict like init for ModelOutput (#41002)
* fix dict like init

* style
2025-09-19 16:14:44 +02:00
96a3e898cd RUFF fix on CI scripts (#40805)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-19 13:50:26 +00:00
98c8523434 Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)
* Fix model cards and modalities in toctree

* fix new models
2025-09-19 09:47:28 -04:00
767f8a4c75 Fix typos in src and tests (#40845)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-19 13:18:38 +00:00
9d9c4d24c5 Make EfficientLoFTRModelTest faster (#41000)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 12:51:05 +00:00
b4ba4e1da0 [RMSNorm] Fix rms norm init for models that center around 1 (#40796)
* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happening lol

* vaultgemma is new i forgot

* remove init check
2025-09-19 12:15:36 +00:00
fce746512b [docs] rm stray tf/flax autodocs references (#40999)
rm tf references
2025-09-19 12:04:12 +01:00
ddfa3d4402 blt wip (#38579)
* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoint with from_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* separate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamaTextMLP

* clean up some args

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, separated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although it won't be equal to input_ids since ids are needed for the patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_causal check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attention_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
2025-09-19 11:55:55 +02:00
46ea7e613d [testing] test num_hidden_layers being small in model tester (#40992)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 11:45:07 +02:00
ebdc17b8e5 ENH: Enable readline support for transformers chat (#40911)
ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a, ctrl + e, alt + b, alt + f,
  ctrl + k, alt + d, etc.
- navigate and search history: arrow up/down, ctrl + p, ctrl + n, ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, macOS, and with WSL; I'm not sure about
Windows, though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).
2025-09-19 10:39:21 +01:00
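A minimal sketch of the guarded import described above:

```python
# Importing readline is enough to give input() history and line editing;
# the import is guarded because readline is missing on some platforms.
try:
    import readline  # noqa: F401
except ImportError:
    pass  # fall back to plain input() without history or editing

while (line := input(">>> ")) != "exit":
    print(f"you said: {line}")
```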
e2dbde280f Remove [[autodoc]] refs to TF/Flax objects (#40996)
* remove refs

* more
2025-09-19 11:28:34 +02:00
155f7e2e62 🔴[Attention] Bert-based Models Attention Refactor (#38301)
* clean start to bert refactor

* some test fixes

* style

* fix last tests

* be strict on positional embeddings, fixup according tests

* cache support

* more cache fixes, new causal API

* simplify masks, fix tests for gen

* flex attn, static cache support, round of fixes

* ?

* this time

* style

* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)

* roberta

* fixup sdpa remains

* attention split, simplify args and kwargs, better typing

* fix encoder decoder

* fix test

* modular roberta

* albert

* data2vectext, making it modular tomorrow

* modular data2vec text

* tmp disable

* xmod + cache position fixes

* whoops

* electra + markuplm, small fixes

* remove wrong copy

* xlm_roberta + some embedding fixes

* roberta prelayernorm

* RemBert: remove copy, maybe doing it later

* ernie

* fix roberta offloading

* camembert

* copy fixes

* bert generation + fixes on eager

* xlm roberta xl

* bridgetower (text) + seamlessv2 copy fixes

* rocbert + small fixes

* whoops

* small round of fixups

* NOTE: kernels didn't load with an earlier version, some fixup (needs another look because of cross deps)

* the end of the tunnel?

* fixup nllbmoe + style

* we dont need this anymore

* megatron bert is barely used, low prio skip for now

* Modernize bert (template for others)

NOTE: trying to push this through; it might be overdue if it can't land in time

* check inputs for all others (if checkmarked)

* fix bridgetower

* style

* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)

* proper fix for bert to force intermediate dict outputs

* propagate to others

* style

* xlm roberta xl investigation, it's the layernorm...

* mobile bert

* revert this, might cause issues with composed models

* review

* style
2025-09-19 11:23:58 +02:00
61eff450d3 Benchmarking v2 GH workflows (#40716)
* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from workflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description
2025-09-19 08:54:49 +00:00
5f6e278a51 Remove set_model_tester_for_less_flaky_tests (#40982)
remove
2025-09-18 18:56:10 +02:00
4df2529d79 🚨🚨🚨 Fully remove Tensorflow and Jax support library-wide (#40760)
* setup

* start the purge

* continue the purge

* more and more

* more

* continue the quest: remove loading tf/jax checkpoints

* style

* fix configs

* oups forgot conflict

* continue

* still grinding

* always more

* in the zone

* never stop

* should fix doc

* fix

* fix

* fix

* fix tests

* still tests

* fix non-deterministic

* style

* remove last rebase issues

* onnx configs

* still on the grind

* always more references

* nearly the end

* could it really be the end?

* small fix

* add converters back

* post rebase

* latest qwen

* add back all converters

* explicitly add functions in converters

* re-add
2025-09-18 18:27:39 +02:00
5ac3c5171a Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 18:27:27 +02:00
d9d7f6a6b9 Revert change in compile_friendly_resize (#40645)
fix
2025-09-18 16:25:45 +01:00
738b223f57 Add captured actual outputs to CI artifacts (#40965)
* fix

* fix

* Remove `# TODO: ???` as it makes me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 15:40:53 +02:00
dd7ac4cd59 [tests] Really use small models in all fast tests (#40945)
* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency
2025-09-18 15:24:12 +02:00
2ce35a248f Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)
* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
2025-09-18 13:22:19 +00:00
6e51ac31ef [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)
* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling
2025-09-18 14:09:08 +01:00
9378f874c1 [Trainer] Fix DP loss (#40799)
* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-09-18 13:07:20 +00:00
7cf1f5ced0 Use skip_predictor=True in vjepa2 get_vision_features (#40966)
use skip_predictor in vjepa2 `get_vision_features`
2025-09-18 11:51:45 +00:00
f6104189fd Fix outdated version checks of accelerator (#40969)
* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-18 11:49:14 +00:00
c532575795 Add new model LFM2-VL (#40624)
* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unshuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
2025-09-18 11:01:58 +00:00
564fde14f1 FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)
* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-18 09:57:21 +00:00
5748352c27 Update expected values for one more test_speculative_generation after #40949 (#40967)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 11:47:14 +02:00
438343d93f Don't list dropout in eager_paged_attention_forward (#40924)
Remove dropout argument

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-18 09:05:50 +00:00
449da6bb30 Add FlexOlmo model (#40921)
* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`
2025-09-18 09:04:06 +00:00
3bb1b4867c Standardize audio embedding function name for audio multimodal models (#40919)
* Standardize audio embedding function name for audio multimodal models

* PR review
2025-09-18 08:45:04 +00:00
58e13b9f12 Update expected values for some test_speculative_generation (#40949)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 20:50:38 +02:00
529d3a2b06 Fix Glm4vModelTest::test_eager_matches_fa2_generate (#40947)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 19:53:59 +02:00
a2ac4de8b0 Remove nested import logic for torchvision (#40940)
* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessarry protected import in modular (and modeling)

* fix wrongly remove protected imports
2025-09-17 13:34:30 -04:00
8e837f6ae2 Consistent naming for images kwargs (#40834)
* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fix copies

* another fix

* fix some tests

* fix more tests

* fix lasts tests

* fix copies

* better docstring

* delete print
2025-09-17 18:40:25 +02:00
eb04363a0d Raise error instead of warning when using meta device in from_pretrained (#40942)
* raise instead of warning

* add timm

* remove
2025-09-17 18:23:37 +02:00
ecc1d778ce Fix Glm4vMoeIntegrationTest (#40930)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 18:21:18 +02:00
c5553b4120 Fix trainer tests (#40823)
* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-09-17 16:05:17 +00:00
14f01aee39 docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941) 2025-09-17 08:48:38 -07:00
26b65fb516 Intel CPU dockerfile (#40806)
* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update label name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-09-17 15:42:30 +00:00
66f97d3f64 [models] remove unused import torch.utils.checkpoint (#40934) 2025-09-17 16:37:56 +01:00
3853bfe4d5 [DOC] Add missing dates in model cards (#40922)
add missing dates
2025-09-17 11:17:06 -04:00
6cade29278 Add LongCat-Flash (#40730)
* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism
2025-09-17 14:48:10 +02:00
48a5565179 Add support for Florence-2 training (#40914)
* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-17 11:49:56 +00:00
89949c5d2d Minor fix for #40727 (#40929)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 11:42:13 +02:00
c830fc1207 Adding activation kernels (#40890)
* first commit

* add mode

* revert modeling

* add compile

* rm print
2025-09-17 11:36:09 +02:00
f6999b00c3 [torchao safetensors] renaming get_state_dict function (#40774)
renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-09-17 11:20:50 +02:00
8428c7b9c8 Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)
* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-09-17 09:15:55 +00:00
ddd4caf066 [Llama4] Remove image_sizes arg and deprecate vision_feature_layer (#40832)
* Remove unused arg

* deprecate

* revrt one change

* get set go

* version correction

* fix

* make style

* comment
2025-09-17 09:14:13 +00:00
b82cd1c240 Processor load with multi-processing (#40786)
push
2025-09-17 09:46:49 +02:00
6e50a8afb2 [Docs] Adding documentation of MXFP4 Quantization (#40885)
* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-16 11:31:28 -07:00
cccef4be91 Fix dtype in Paligemma (#40912)
* fix dtypes

* fix copies

* delete unused attr
2025-09-16 16:07:56 +00:00
beb09cbd5a 🔴Make center_crop fast equivalent to slow (#40856)
make center_crop fast equivalent to slow
2025-09-16 16:01:38 +00:00
d4af0d9f03 [generate] misc fixes (#40906)
misc fixes
2025-09-16 15:18:06 +01:00
3b3f6cd0c1 [gemma3] Gemma3ForConditionalGeneration compatible with assisted generation (#40791)
* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase
2025-09-16 15:08:48 +01:00
88ba0f107e disable test_fast_is_faster_than_slow (#40909)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 15:34:04 +02:00
270da89708 Remove runner_map (#40880)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 15:18:07 +02:00
df03fc1f9c Improve module name handling for local custom code (#40809)
* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-09-16 13:11:48 +00:00
96bc19bcdf remove dummy EncodingFast (#40864)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-16 12:56:11 +00:00
d0af4269ec Add Olmo3 model (#40778)
* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test
2025-09-16 13:28:23 +02:00
65f9ede359 Set seed for Glm4vIntegrationTest (#40905)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 13:01:51 +02:00
0c1839d609 [cache] Only use scalars in get_mask_sizes (#40907)
* remove tensor ops

* style

* style
2025-09-16 12:48:58 +02:00
3688a977d0 Harmonize CacheLayer names (#40892)
* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert
2025-09-16 12:14:12 +02:00
087775d10e [cache] Merge static sliding and static chunked layer (#40893)
* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle
2025-09-16 11:41:20 +02:00
1aff033ec9 Fix flaky Gemma3nAudioFeatureExtractionTest::test_dither (#40902)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 11:00:07 +02:00
65adc3aaa3 Fix getter regression (#40824)
* test things

* style

* move tests to a sane place
2025-09-16 10:57:13 +02:00
8e1a12bbee Fixing the call to kernelize (#40628)
* fix

* style

* overload train and eval

* add getter and setter
2025-09-16 10:50:54 +02:00
21c8379fb0 Make debugging failing tests (check and update expect output values) easier 🔥 (#40727)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 10:21:48 +02:00
5af248b3e3 [generate] remove docs of a feature that no longer exists (#40895) 2025-09-15 19:22:31 +01:00
20ee3a73f0 🌐 [i18n-KO] Translated imageprocessor.md to Korean (#39557)
* feat: manual translation

* docs: fix ko/_toctree.yml

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/image_processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-15 10:07:16 -07:00
2141a5b764 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)
* fix: manual edits

* Apply suggestions from code review

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-15 10:06:57 -07:00
2a83792165 Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)
Remove dict branch of attention_mask

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-15 17:38:13 +02:00
04d1c8f3d4 Fix deta loading & dataclass (#40878)
* fix

* fix 2
2025-09-15 17:23:13 +02:00
ff26fe8302 Add Fast PromptDepthAnything Processor (#40602)
* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove benchmark script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refacto

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processor

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-15 15:03:43 +00:00
6254bb4a68 Use torch.expm1 and torch.log1p for better numerical results (#40860)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-15 11:54:14 +00:00
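A small illustration of why the fused ops matter near zero:

```python
import torch

x = torch.tensor([1e-8], dtype=torch.float32)

# Naive forms round 1 + x to exactly 1.0 in float32, losing the signal:
print(torch.exp(x) - 1)  # tensor([0.])
print(torch.log(1 + x))  # tensor([0.])

# The fused ops stay accurate for small inputs:
print(torch.expm1(x))    # tensor([1.0000e-08])
print(torch.log1p(x))    # tensor([1.0000e-08])
```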
e674e9dadb Clarify passing is_causal in sdpa_attention_paged_forward (#40838)
* Correctly pass is_causal in sdpa_attention_paged_forward

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add comment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-15 11:51:22 +00:00
0957999f7f 🔴 Move variable output controls to _prepare_generation_config (#40715)
* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review

* Move variable output controls to `prepare_inputs_for_generation`

* fix a bunch of models

* back to basics

* final touches
2025-09-15 11:08:00 +00:00
5e9ec59d0c Fix modular consistency (#40883)
* reapply modular

* add missing one
2025-09-15 13:07:08 +02:00
3442b2f300 [VaultGemma] Update expectations in integration tests (#40855)
* fix tests

* style
2025-09-15 12:46:30 +02:00
c0dbe095b0 Adding Support for Qwen3-VL Series (#40795)
* add qwen3vl series

* make fixup

* fix import

* re-protect import

* fix it finally (need to merge main into the branch)

* skip processor test (need the checkpoint)

* oups typo

* simplify modular

* remove unnecessary attr

* fix layer

* remove unused rope_deltas args

* reuse image def

* remove unnecessary imports

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-09-15 12:46:18 +02:00
fc5f9105da [Qwen3 Next] Use numerically stable rsqrt (#40848)
use numerically stable inverse
2025-09-15 12:45:13 +02:00
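A sketch of the pattern the title suggests, assuming an RMS-style normalization (the actual Qwen3-Next code may differ in detail):

```python
import torch

def l2norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # torch.rsqrt is a single fused op; 1.0 / torch.sqrt(...) adds a
    # separate division on an already-rounded intermediate value.
    return x * torch.rsqrt(x.pow(2).sum(dim=-1, keepdim=True) + eps)
```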
96d3795cfc Update model tags and integration references in bug report (#40881) 2025-09-15 12:08:29 +02:00
f5e1641857 fix: XIELU act parameters not being casted to correct dtype (#40812) 2025-09-15 11:05:55 +02:00
ada64ce452 fix florence kwargs (#40826) 2025-09-15 11:05:47 +02:00
93f810e6fa [docstrings / type hints] Update outdated annotations for past_key_values (#40803)
* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes
2025-09-15 10:52:32 +02:00
c65fea0b92 [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-09-15 10:46:32 +02:00
9c804f7ec4 Redirect MI355 CI results to dummy dataset (#40862) 2025-09-14 18:42:49 +02:00
02ea2b3433 Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)
Fix ParallelismConfig type for accelerate < 1.10.1

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-14 15:35:42 +00:00
d42e96a2a7 Use checkpoint in auto_class_docstring (#40844)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-13 00:49:19 +00:00
6eb3255842 [generate] Always use decoder config to init cache (#40772)
* mega derp

* fix

* always use the decoder
2025-09-12 18:24:22 +02:00
e682f90f60 [tests] move generative tests away from test_modeling_common.py (#40854)
move tests
2025-09-12 16:12:27 +00:00
8d8459132a [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)
* output_attentions in typed kwargs

* correct typing in GenericForTokenClassification

* improve
2025-09-12 18:07:48 +02:00
291772b6b5 add: differential privacy research model (#40851)
* VaultGemma

* Removing Sequence and Token classification models. Removing integration tests for now

* Remove pass-only modular code. style fixes

* Update vaultgemma.md

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Add links to model doc

* Correct model doc usage examples

* Updating model doc to describe differences from Gemma 2

* Update model_doc links

* Adding integration tests

* style fixes

* repo consistency

* attribute exception

---------

Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-12 17:36:03 +02:00
8502b41bf1 [Sam2Video] Fix video inference with batched boxes and add test (#40797)
fix video inference with batched boxes and add test
2025-09-12 14:33:28 +00:00
f384bb8ad5 [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)
* Fix inconsistencies with box input inference with original repo

* remove print

* always pad

* fix modular
2025-09-12 14:21:22 +00:00
4cb41ad2a2 [tests] re-enable aria fast tests (#40846)
* rise from the dead

* test
2025-09-12 15:14:54 +01:00
ef053939ca Fixes for continuous batching (#40828)
* Fix for CB attn mask and refactor

* Tests for CB (not all passing)

* Passing tests and a logger fix

* Fixed the KV metrics that were broken when we moved to hybrid alloc

* Fix circular import and style

* Added tests for FA

* Unfolded test to have device expectations

* Fixes for H100

* more fixes for h100

* H100 are good

* Style

* Adding some comments from #40831

* Rename test

* Avoid 1 letter variables

* Dictionary is only removed during kwargs

* Test for supported sample

* Fix an involuntary slice

* Fixes for non-sliced inputs and small example improvements

* Slicing inputs is more understandable

* Style
2025-09-12 15:35:31 +02:00
98a8078127 Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)
* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

* strictly align l2norm in Qwen3-Next with FLA implementation.

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-12 14:08:01 +02:00
77aa35ee9c Replace image classification loss functions to self.loss_function (#40764) 2025-09-12 12:59:37 +01:00
797859c9b8 Update no split modules in T5Gemma model (#40810)
* Update no split modules in T5Gemma model

* Update no_split_modules also for T5Gemma modular

* Remove model_split_percents from test cases

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-12 10:44:57 +00:00
6e69b60806 Adds Causal Conv 1D kernel for mamba models (#40765)
* add kernel

* make style

* keep causal-conv1d

* small fix

* small fix

* fix modular converter

* modular fix + lazy loading

* revert changes modular

* nit

* hub kernels update

* update

* small nit
2025-09-12 12:22:25 +02:00
827b65c42c Add VideoProcessors to auto-backend requirements (#40843)
* add it

* fix existing ones

* add perception to auto_mapping...
2025-09-12 12:21:12 +02:00
5e2e77fb45 Improve torch_dtype checks (#40808)
* Improve torch_dtype checks

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Apply suggestions from code review

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-12 09:57:59 +00:00
c81f426f9a 🌐 [i18n-KO] Translated clipseg.md to Korean (#39903)
* docs: ko: model_doc/clipseg.md

* fix: manual edits

* Apply suggestions from code review

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

---------

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
2025-09-11 17:07:24 -07:00
cf084f5b40 [Jetmoe] Fix RoPE (#40819)
* fix

* remove prints

* why was this there...
2025-09-11 18:41:11 +02:00
dfae7dd98d Push generation config along with checkpoints (#40804) 2025-09-11 17:33:16 +02:00
c264c0ee7e add general hub test for Fast Image Processors in test_image_processing_utils (#40086)
* build unittest for ViTImageProcessorFast

* remove redundant test case

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-09-11 14:31:37 +00:00
895b3ebe41 Fix typos in src (#40782)
Fix typos in src

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-11 13:15:15 +01:00
6d369124ad Align torch implementation of Gated DeltaNet in Qwen3-Next with fla library. (#40807)
* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-11 13:10:15 +02:00
0f1b128d33 ⚠️ 🔴 Add ministral model (#40247)
* add ministral model

* docs, tests

* nits

* fix tests

* run modular after merge

* opsie

* integration tests

* again

* fff

* dtype

* rerun modular

* arthur review

* ops

* review
2025-09-11 10:30:39 +02:00
02f1d7c091 Fix config dtype parsing for Emu3 edge case (#40766)
* fix emu3 config

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* address comment

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* add comments

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

---------

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-11 08:26:45 +00:00
de01a22aff Fix edge case for tokenize (#36277) (#36555)
* Fix edge case for tokenize (#36277)

* Fix tokenizing dtype for float input cases

* add test for empty input string

* handle empty list of lists like [[]]

* add tests for tokenizer for models with input that is not plain text
2025-09-11 09:57:30 +02:00
ec532f20fb feature: Add robust token counting with padding exclusion (#40416)
* Created robust token counting using the existing include_num_input_tokens_seen variable; kept bool for backward compatibility, added string support, and left the default unchanged. Also created robust test cases.

* some files mismatched between my local and remote; committing to resolve it, and also fixed a code quality issue

* ci: retrigger tests

* another attempt to trigger CI for checks
2025-09-11 09:16:06 +02:00
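A hedged sketch of how this might be enabled: `include_num_input_tokens_seen` is an existing TrainingArguments field, but the string value below is an assumption about the mode added here, so verify it against the docs:

```python
from transformers import TrainingArguments

# Assumption: besides True/False, the field now accepts a string that
# selects the counting mode; "non_padding" is a guessed value.
args = TrainingArguments(
    output_dir="out",
    include_num_input_tokens_seen="non_padding",  # count only real tokens
)
```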
df67cd35f0 Fix DeepSpeed mixed precision precedence over Accelerate defaults (#39856)
* Fix DeepSpeed mixed precision precedence over Accelerate defaults

Resolves issue where Accelerate would default to bf16 mixed precision
when a DeepSpeed config specifies fp16, causing a ValueError. The fix
ensures DeepSpeed config takes precedence over TrainingArguments defaults
while preserving explicit user settings.

Changes:
- Add override_training_args_from_deepspeed() method to handle config precedence
- Reorder mixed precision environment variable setting in TrainingArguments
- Ensure DeepSpeed fp16/bf16 settings override defaults but not explicit choices

Fixes #39849

* Add tests for DeepSpeed mixed precision precedence fix

- Add TestDeepSpeedMixedPrecisionPrecedence class with 3 focused tests
- Test DeepSpeed fp16/bf16 config overriding TrainingArguments defaults
- Test user explicit settings being preserved over DeepSpeed config
- Test precedence hierarchy: user settings > DeepSpeed config > defaults
- Replace massive 934-line test bloat with concise 50-line test suite
- Tests cover core functionality of PR #39856 mixed precision precedence fix
2025-09-11 09:12:15 +02:00
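A sketch of the scenario the fix targets: the DeepSpeed config requests fp16 while the user sets neither `fp16` nor `bf16` explicitly, so the DeepSpeed setting should win over Accelerate's bf16 default:

```python
from transformers import TrainingArguments

ds_config = {
    "fp16": {"enabled": True},          # DeepSpeed requests fp16...
    "zero_optimization": {"stage": 2},
}

# ...and with neither fp16=True nor bf16=True passed here, the fix lets
# the DeepSpeed setting take precedence instead of raising a ValueError.
args = TrainingArguments(output_dir="out", deepspeed=ds_config)
```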
549ba5b8b6 [Docs] Add missing class documentation for optimizer_schedules (#31870, #23010) (#40761)
* Add missing class documentation for optimizer_schedules (#31870, #23010)

* Add section level header to the optimizer schedules
2025-09-10 14:58:21 -07:00
dae1ccfb98 fix_image_processing_fast_for_glm4v (#40483)
* fix_image_processing_fast_for_glm4v

* fix(format): auto-ruff format

* add test image processing glm4v

* fix quality

---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-10 21:05:27 +00:00
7d57b31e16 Remove use_ipex option from Trainer (#40784)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-10 17:00:15 +00:00
3378e7dabf Move num_items_in_batch to correct device before accelerator.gather (#40773)
add device
2025-09-10 18:49:42 +02:00
e5ecb03c92 Fix the issue that csm model cannot work with pipeline mode. (#39349)
* Fix the issue that csm model cannot work with pipeline mode.

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Remove batching inference

Signed-off-by: yuanwu <yuan.wu@intel.com>

* csm output is list of tensor

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Update src/transformers/pipelines/text_to_audio.py

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>

* Use different waveform key for different model

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Fix make style errors

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add csm tests

Signed-off-by: yuanwu <yuanwu@habana.ai>

* Update src/transformers/models/auto/tokenization_auto.py

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuanwu@habana.ai>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-09-10 16:17:35 +00:00
abbed7010b Fix dotted model names (#40745)
* Fix module loading for models with dots in names

* quality check

* added test

* wrong import

* Trigger CI rerun after making test model public

* Update src/transformers/dynamic_module_utils.py

* Update tests/utils/test_dynamic_module_utils.py

* Update tests/utils/test_dynamic_module_utils.py

* Move test

* make fixup

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
2025-09-10 14:34:56 +00:00
75202b0928 Read config pattern for Qwen3Next (#40792)
read it
2025-09-10 15:18:51 +02:00
7401cfa57c Use functools.cached_property (#40607)
* cached_property is available in functools

Signed-off-by: cyy <cyyever@outlook.com>

* Remove cached_property

Signed-off-by: cyy <cyyever@outlook.com>

* Fix docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-10 12:15:40 +00:00
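The replacement is the standard-library decorator; a minimal illustration (class and attribute names are hypothetical):

```python
from functools import cached_property

class ModelConfig:
    def __init__(self, hidden_size: int, num_attention_heads: int):
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads

    @cached_property
    def head_dim(self) -> int:
        # Computed on first access, then stored on the instance, which is
        # what the removed hand-rolled cached_property used to do.
        return self.hidden_size // self.num_attention_heads

cfg = ModelConfig(4096, 32)
assert cfg.head_dim == 128  # second access hits the cache
```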
8ab2448707 Fix invalid PipelineParallel member (#40789)
Fix invalid enum member

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-10 12:06:36 +00:00
6c9f412105 Fix typos in tests and util (#40780)
Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-10 11:45:40 +00:00
0997c2f2ab Fix doc for PerceptionLMForConditionalGeneration forward. (#40733)
* Fix doc for PerceptionLMForConditionalGeneration forward.

* fix last nit

---------

Co-authored-by: raushan <raushan@huggingface.co>
2025-09-10 11:57:19 +02:00
a72e5a4b9d 🚨 Fix Inconsistent input_feature length and attention_mask length in WhisperFeatureExtractor (#39221)
* Update feature_extraction_whisper.py

* Reformat

* Add feature extractor shape test

* reformat

* fix omni

* fix new failing whisper test

* Update src/transformers/models/whisper/feature_extraction_whisper.py

* make style

* revert omni test changes

* add comment

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-09-10 09:38:47 +00:00
a5ecd94a3f Enable ruff on benchmark and scripts (#40634)
* Enable ruff on benchmark and scripts

Signed-off-by: cyy <cyyever@outlook.com>

* Cover benchmark_v2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* correct

* style

* style

---------

Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-10 11:38:06 +02:00
08edec9f7d [processors] Unbloating simple processors (#40377)
* modularize processor - step 1

* typos

* why raise error, super call check it also

* tiny update

* fix copies

* fix style and test

* lost an import / fix copies

* fix tests

* oops deleted accidentally
2025-09-10 10:37:19 +02:00
c52889bd51 Remove reference of video_load_backend and video_fps for processor (#40719)
* Remove reference of video_load_backend and video_fps for processor

Signed-off-by: cyy <cyyever@outlook.com>

* Restore changes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-10 08:37:11 +00:00
3340ccbd40 Fix gpt-oss router_indices in EP (#40545)
* fix out shape

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix router indice

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix mod

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix masking

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add safety checking

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix checking

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable 1 expert per rank

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix skip

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add ep plan in config

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add update ep plan

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm ep_plan and add comments

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-09-10 10:30:55 +02:00
b9282355be Adding Support for Qwen3-Next (#40771)
* Add Qwen3-Next.

* fix

* style

* doc

* simplify

* fix name

* lazy cache init to allow multi-gpu inference

* simplify

* fix config to support different hybrid ratio.

* remove last commit (redundant)

* tests

* fix test

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-09 23:46:57 +02:00
79fdbf2a4a [docs] CPU install (#40631)
* init

* feedback
2025-09-09 12:51:54 -07:00
37c14430c9 [pipeline] ASR pipeline kwargs are forwarded to generate (#40375)
* tmp commit

* add test

* PR suggestion
2025-09-09 17:29:25 +00:00
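A hedged usage sketch of the behavior described above (the model name and audio path are placeholders):

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

# Generation options supplied at call time are forwarded to generate().
result = asr("sample.wav", generate_kwargs={"num_beams": 4, "max_new_tokens": 128})
```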
d09fdf5e52 Fix crash when executing MambaCache sample code (#40557)
* Fix the sample code of MambaCache

* Update automatically generated code

* Fix FalconMambaCache documents

* minor doc fixes

---------

Co-authored-by: Joao Gante <joao@huggingface.co>
2025-09-09 16:44:49 +00:00
d33c189e5a [RoPE] run RoPE tests when the model uses RoPE (#40630)
* enable rope tests

* no manual rope test parameterization

* Apply suggestions from code review

* Update tests/models/hunyuan_v1_dense/test_modeling_hunyuan_v1_dense.py

* PR comment: use generalist torch code to find the rope layer
2025-09-09 17:11:02 +01:00
71ac7ea048 [tests] update test_past_key_values_format and delete overwrites (#40701)
* tmp

* rm some overwrites
2025-09-09 16:40:04 +01:00
7aaef98cbe rm src/transformers/convert_pytorch_checkpoint_to_tf2.py (#40718)
* rm src/transformers/convert_pytorch_checkpoint_to_tf2.py

* doctest skip
2025-09-09 16:34:54 +01:00
de5cbe8b79 [deprecations] Remove generate-related deprecations up to v4.56 (#40729)
remove generate-related deprecations up to v4.56
2025-09-09 16:32:41 +01:00
1cdbbb3e9d Support sliding window in CB (#40688)
* CB example: better compare feature

* Cache managers, still issue w/ effective length

* WIP -- fix for effective length

* Renames

* Working, need better parity checks; we might be missing 1 token

* Small fixes

* Fixed wrong attn mask and broke cache into pieces

* Warmup is slowing down things, disabling it

* Cache was too big, fixed

* Simplified index objects

* Added a profile option to the example

* Avoid calls to memory reporting tools

* Restore full attention read indices for better latency

* Addressed some TODOs and style

* Docstrings for cache managers

* Docstrings for Schedulers

* Refactor schedulers

* [Important] Cache fix for sliding window, check with small sw size

* Updated doc for cache memory compute and cache as a whole

* Moved a todo

* Nits and style

* Fix for when sliding window is smaller than max batch per token

* Paged interface update

* Support for FLash in new API

* Fix example CB

* Fix bug in CB for paged

* Revert example

* Style

* Review compliance

* Style

* Styleeeee

* Removed NO_SLIDING_WINDOW

* Review #2 compliance

* Better art

* Turn cum_seqlens_k in a dict

* Attn mask is now a dict

* Update examples/pytorch/continuous_batching.py

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Addressed McPatate's review

* Style and fix

---------

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
2025-09-09 15:51:11 +02:00
ed100211cb [generate] PromptLookupCandidateGenerator won't generate forbidden tokens (#40726)
* no longer flaky :)

* PR comments

* any token-blocking logits processor works

* ?

* default

* -_-

* create fake tensors once
2025-09-09 11:04:01 +00:00
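
A minimal sketch of the idea behind this fix, in a toy setting rather than the actual PromptLookupCandidateGenerator code: prompt lookup proposes continuations copied from earlier in the prompt, and candidates are truncated before any forbidden token so blocked tokens can never be proposed. The helper name and signature below are illustrative.

```python
# Illustrative sketch only -- not the transformers PromptLookupCandidateGenerator.
# Prompt lookup finds the last n-gram earlier in the prompt and proposes the
# tokens that followed it; here candidates are cut at the first forbidden id,
# mirroring "any token-blocking logits processor works".
from typing import Optional


def prompt_lookup_candidates(
    input_ids: list[int],
    ngram_size: int = 2,
    num_candidates: int = 3,
    forbidden_ids: Optional[set[int]] = None,
) -> list[int]:
    forbidden_ids = forbidden_ids or set()
    tail = input_ids[-ngram_size:]
    # Scan right-to-left for a previous occurrence of the tail n-gram.
    for start in range(len(input_ids) - ngram_size - 1, -1, -1):
        if input_ids[start : start + ngram_size] == tail:
            continuation = input_ids[start + ngram_size : start + ngram_size + num_candidates]
            out: list[int] = []
            for tok in continuation:
                if tok in forbidden_ids:  # never propose a blocked token
                    break
                out.append(tok)
            return out
    return []


print(prompt_lookup_candidates([5, 6, 7, 8, 5, 6], forbidden_ids={8}))  # -> [7]
```
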
82d66e5dd0 Fix: swanlab public.cloud.experiment_url api error (#40763)
fix
2025-09-09 09:28:13 +00:00
a871f6f58d Add EfficientLoFTRImageProcessorFast for GPU-accelerated image processing (#40215)
* Add EfficientLoFTRImageProcessorFast for GPU-accelerated image processing

* Fix fast processor output format and add comprehensive tests

* Fix trailing whitespace in test file

* Apply ruff formatting to test file

* simplify pair validation logic

* add superglue tests to fast image processor

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-08 21:08:02 +00:00
aee5000f16 Fix Bark failing tests (#39478)
* Fix vocab size for Bark generation.

* Fix Bark processor tests.

* Fix style.

* Address comments.

* Fix formatting.

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-09-08 20:24:51 +02:00
126264d015 🌐 [i18n-KO] Translated 'xclip.md' to Korean (#39594)
* feat: nmt draft

* fix: manual edits

* docs: ko: xclip.md

* feat: nmt draft

* fix: manual edits

* fix: Modify _toctree.yml file to reflect review

* fix: Modify _toctree.yml file to reflect review

* jungnerd_suggestion_modified_01 ko_xclip.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* jungnerd_suggestion_modified_02 ko_xclip.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2025-09-08 11:19:10 -07:00
5a468e56b7 Fix continue_final_message in apply_chat_template to prevent substring matching issues (#40732)
* Fix continue_final_message parameter in apply_chat_template

* after run fixup

* Handle trim in the template

* after fixup

* Update src/transformers/utils/chat_template_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-09-08 17:25:12 +00:00
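
A toy illustration of the substring pitfall this PR addresses, not the actual chat_template_utils logic: if the final message's text also occurs earlier in the rendered conversation, a left-to-right substring search truncates in the wrong place.

```python
# Toy demonstration (not the real chat template code) of why plain substring
# search can truncate at the wrong spot when continuing the final message.
rendered = "<user> Say hi <assistant> hi <user> again <assistant> hi"
final_text = "hi"  # rendered form of the final assistant message

wrong_cut = rendered[: rendered.find(final_text) + len(final_text)]
right_cut = rendered[: rendered.rfind(final_text) + len(final_text)]

print(wrong_cut)  # '<user> Say hi' -- cut mid-conversation at the first match
print(right_cut)  # full string -- cut at the last occurrence, as intended
```
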
e8db153599 Fix inconsistency in SeamlessM4T and SeamlessM4Tv2 docs (#39364) 2025-09-08 10:01:44 -07:00
fd2a29d468 Fix more typos (#40627)
Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-08 16:05:40 +00:00
bb8e9cd675 Remove unnecessary tildes from documentation (#40748) 2025-09-08 08:56:35 -07:00
a9b313a0c2 docs: add continuous batching to serving (#40758)
* docs: tmp

* docs: add continuous batching to serving

* docs: reword after @lysandrejik review
2025-09-08 15:50:28 +00:00
2077f17547 feat: err when unsupported attn impl is set w/ --continuous_batching (#40618)
* feat: err when unsupported attn impl is set w/ `--continuous_batching`

* refactor: move defaults and support list to CB code

* feat: add action item in error msg

* fix(serve): add default attn implementation

* feat(serve): add log when `attn_implementation` is `None`

* feat: raise Exception when attn_implementation is not supported by CB
2025-09-08 14:31:49 +00:00
dc262ee6f5 remove FSDP prefix when using save_pretrained with FSDP2 (#40207)
* remove FSDP prefix when using save_pretrained with FSDP2

* Fix: use removeprefix correctly

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
2025-09-08 14:52:31 +02:00
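
A minimal sketch of the key-cleanup idea, with the prefix string assumed for illustration (the real FSDP wrapper prefix may differ): str.removeprefix removes one exact leading match, which is why it is the right tool here where lstrip would mangle keys.

```python
# Sketch with an assumed prefix string (the real FSDP wrapper prefix may differ).
# str.removeprefix drops one exact leading match; lstrip("...") would instead
# strip any of those characters and corrupt key names.
prefix = "_fsdp_wrapped_module."  # assumed for illustration
state_dict = {"_fsdp_wrapped_module.model.embed_tokens.weight": 0.0}

cleaned = {key.removeprefix(prefix): value for key, value in state_dict.items()}
print(list(cleaned))  # ['model.embed_tokens.weight']
```
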
9ab6078323 remove gemmas eager training warning (#40744)
* removed warning

* removed remaining warnings
2025-09-08 14:41:52 +02:00
2a1eb5b508 Add BF16 support check for MUSA backend (#40576)
add musa bf16 supported

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-08 12:39:14 +00:00
7b8d40ea7a Set accepts_loss_kwargs to False for ConvNext(|V2)ForImageClassification (#40746) 2025-09-08 14:25:43 +02:00
def7558f74 Fix np array typing (#40741)
Fix typing

Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-08 11:30:40 +00:00
44b3888d2a Fix order of mask functions when using and/or_mask_function (#40753)
fix order
2025-09-08 12:31:42 +02:00
3f7bda4209 [Continuous Batching] fix do_sample=True in continuous batching (#40692)
* fix do_sample=True in continuous batching

* added test

* fix top_p

* test

* Update examples/pytorch/continuous_batching.py
2025-09-08 10:30:15 +02:00
bb45d3631e refactor(serve): move request_id to headers (#40722)
* refactor(serve): move `request_id` to headers

* fix(serve): typo in middleware fn name

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-05 17:50:04 +02:00
12b8e10dbf Skip VitMatteImageProcessingTest::test_fast_is_faster_than_slow (#40713)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-05 17:36:20 +02:00
6b232618b6 Keypoint matching docs (#40541)
---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
2025-09-05 17:24:56 +02:00
948bc0fa34 [Gemma Embedding] Fix SWA (#40700)
* fix gemma embedding flash attention

* fix sdpa

* fix atttempt number 2

* alternative gemma fix

* fix modular
2025-09-05 17:12:00 +02:00
828044cadb Add Optional typing (#40686)
* Add Optional typing

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typing

Signed-off-by: cyy <cyyever@outlook.com>

* Format

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-05 15:05:51 +00:00
e9d6a6907b [tests] remove overwrites of removed test (#40720)
rm tests from method moved to hub
2025-09-05 16:04:22 +01:00
96a5774f2e [serve] re-enable tests (#40717)
run tests
2025-09-05 15:15:34 +01:00
c76387e580 Fix arguments (#40605)
* Fix invalid arguments

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typing

Signed-off-by: cyy <cyyever@outlook.com>

* Add missing self

Signed-off-by: cyy <cyyever@outlook.com>

* Add missing self and other fixes

Signed-off-by: cyy <cyyever@outlook.com>

*  More fixes

Signed-off-by: cyy <cyyever@outlook.com>

*  More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-05 13:50:04 +00:00
21f09032db 🔴 Update Glm4V to use config values (#40712)
* update to use config

* just fix it

* fixup want this to be reformatted
2025-09-05 13:19:50 +00:00
b62e5b6051 Fix parent classes of AllKwargsForChatTemplate (#40685)
Fix parent classes of AllKwargsForChatTemplate because the *Kwargs are members

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-05 11:08:51 +00:00
313effa7ad [onnx] use logical or for grounding dino mask (#40625)
* change |= operator to use torch logical or for friendly export to different backends

* change |= operator to use torch logical or for friendly export to different backends in grounding dino model

---------

Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
2025-09-05 10:55:20 +00:00
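
A minimal sketch of the change, with illustrative tensors: the in-place |= update is rewritten as an out-of-place torch.logical_or, which some ONNX and backend exporters handle more gracefully.

```python
# Illustrative tensors: replace the in-place bitwise update with an
# out-of-place torch.logical_or, which exports more cleanly to some backends.
import torch

mask_a = torch.tensor([True, False, False])
mask_b = torch.tensor([False, False, True])

# Before: mask_a |= mask_b        (in-place, export-unfriendly)
mask = torch.logical_or(mask_a, mask_b)  # after: out-of-place, same result
print(mask)  # tensor([ True, False,  True])
```
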
f3211b5db7 [modular] Add missing self in post-process methods (#40711) 2025-09-05 10:49:52 +00:00
a2a8a3ca1e [tests] fix blip2 edge case (#40699) 2025-09-05 11:35:29 +01:00
4e195f1949 🚨 Allow check_model_inputs in core VLMs (#40342)
* allow `check_model_inputs` in core VLMs

* address comments

* fix style

* why didn't this fail previously?

* check for Noneness instead

* batch update vlms

* fix some tests

* fix copies

* oops delete

* fix efficientloftr

* fix copies

* i am stupid, fix idefics

* fix GC

* return type and other comments

* we shouldn't manually change attention anymore

* fix style

* fix copies

* fix the test
2025-09-05 10:05:56 +00:00
93df343def Fix parent classes of ProcessingKwargs (#40676)
FIx parent classes of ProcessingKwargs

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-05 10:01:16 +00:00
89e103c15e feat(serve): add healthcheck test (#40697) 2025-09-05 11:56:34 +02:00
a2fffa505d Fetch more test data with hf_hub_download (#40710)
[test-all] tests

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-05 09:49:31 +00:00
4a88e81532 Add Fast Image Processor for ImageGPT (#39592)
* initial commit

* initial setup

* Overriding ImageGPT-specific functions

* imported is_torch_available and utilized it for importing torch in imageGPT fast

* Created init and ImageGPTFastImageProcessorKwargs

* added return_tensors, data_format, and input_data_format to ImageGPTFastImageProcessorKwargs

* set up arguments and process and _preprocess definitions

* Added arguments to _preprocess

* Added additional optional arguments

* Copied logic over from base imageGPT processor

* Implemented 2nd draft of fast imageGPT preprocess using batch processing

* Implemented 3rd draft of imageGPT fast _preprocessor. Pulled logic from BaseImageProcessorFast

* modified imageGPT test file to properly run fast processor tests

* converts images to torch.float32 from torch.uint8

* fixed a typo with self.image_processor_list in the imagegpt test file

* updated more instances of image_processing = self.image_processing_class in the test file to test fast processor

* standardized normalization to not use image mean or std

* Merged changes from solution2 branch

* Merged changes from solution2 test file

* fixed testing through baseImageGPT processor file

* Fixed check_code_quality test. Removed unnecessary list comprehension.

* reorganized imports in image_processing_imagegpt_fast

* formatted image_processing_imagegpt_fast.py

* Added arg documentation

* Added FastImageProcessorKwargs class + Docs for new kwargs

* Reformatted previous

* Added F to normalization

* fixed ruff linting and cleaned up fast processor file

* implemented requested changes

* fixed ruff checks

* fixed formatting issues

* fix(ruff after merging main)

* simplify logic and reuse standard equivalence tests

---------

Co-authored-by: Ethan Ayaay <ayaayethan@gmail.com>
Co-authored-by: chris <christine05789@gmail.com>
Co-authored-by: Ethan Ayaay <98191976+ayaayethan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-04 22:45:06 +00:00
9db11b728b Fetch one missing test data (#40703)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 23:05:23 +02:00
acd820561f Align assisted generate for unified signature in decoding methods (#40657)
* Squashed previous branch

* unify assisted generate to common decoding method signature

* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review
2025-09-04 22:47:44 +02:00
16b821c542 Avoid T5GemmaModelTest::test_eager_matches_sdpa_inference being flaky (#40702)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 20:44:40 +00:00
519c2524af Fix broken Llama4 accuracy in MoE part (#40609)
* Fix broken Llama4 accuracy in MoE part

Llama4 accuracy was broken by a bug in
https://github.com/huggingface/transformers/pull/39501 , which forgot to
transpose the router_scores before applying them to routed_in, causing
Llama4 to generate garbage output (see the shape sketch after this entry).

This PR fixes that issue by adding back the transpose() and adding some
comments explaining why the transpose() is needed.

Signed-off-by: Po-Han Huang <pohanh@nvidia.com>

* remove comment

---------

Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-04 22:14:44 +02:00
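
A toy sketch of the shape bug described above, with illustrative shapes rather than the actual Llama4 MoE module:

```python
# Toy shapes only -- not the actual Llama4 MoE code. The router emits scores
# as (num_experts, num_tokens); they must be transposed to line up token-major
# with the per-token expert inputs before scaling.
import torch

num_tokens, num_experts, hidden = 4, 2, 8
router_scores = torch.rand(num_experts, num_tokens)       # router output layout
routed_in = torch.rand(num_tokens * num_experts, hidden)  # one row per (token, expert)

scaled = routed_in * router_scores.transpose(0, 1).reshape(-1, 1)
print(scaled.shape)  # torch.Size([8, 8])
```
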
586dc5d06e [Glm4.5V] fix vLLM support (#40696)
* fix

* add a test case
2025-09-04 22:09:20 +02:00
ad2da3ea83 Fix self.dropout_p is not defined for SamAttention/Sam2Attention (#40667)
Fix dropout_p is not defined for SamAttention/Sam2Attention
2025-09-04 19:32:39 +02:00
e39f222096 Fix backward compatibility with accelerate in Trainer (#40668) 2025-09-04 18:15:15 +02:00
d8f670583e Change docker image to preview for the MI355 CI (#40693)
* Change docker image to preview for the MI355 CI

* Use pushed image
2025-09-04 17:23:09 +02:00
4cbca0d1af Fixing bug in Voxtral when merging text and audio embeddings (#40671)
* Fixing bug when replacing text-audio token placeholders with audio embeddings

* apply changes

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-09-04 15:11:23 +00:00
9a6c6568db feat: support request cancellation (#40599)
* feat: support request cancellation

* test: add cancellation test

* refactor: use existing fn to check req cancellation

* feat(cb): make cancellation thread safe

* refactor(serve): update test to use `requests` instead of `httpx`
2025-09-04 17:01:29 +02:00
87f38dbfce add: embedding model (#40694)
* Gemma 3 for Embeddings

* Style fixes

* Rename conversion file for consistency

* Default padding side emb vs gen

* Corrected 270m config

* style fixes

* EmbeddingGemma config

* TODO for built-in prompts

* Resolving the sentence similarity bug and updating the architecture

* code style

* Add query prompt for SentenceTransformers

* Code quality

* Fixing or_mask_function return types

* Adding placeholder prompts for document and passage

* Finalizing prompt templates

* Adding Retrieval to preconfigured prompts

* Add Gemma 3 270M Config

* Correcting num_linear_layers flag default

* Export Sentence Transformer in correct dtype

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
2025-09-04 16:16:15 +02:00
5b0c01b5e2 Final test data cache - inside CI docker images (#40689)
* run

* build

* build

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 13:12:49 +00:00
1f3cc935cc Load a tiny video to make CI faster (#40684)
* load a tiny video to make CI faster

* add video in url_to_local_path
2025-09-04 14:49:00 +02:00
669230a86f fix broken offline mode when loading tokenizer from hub (#40669)
* fix broken offline mode when loading tokenizer from hub

* formatting

* make quality

* fix import order
2025-09-04 12:15:56 +00:00
91b34be9cf Add codebook_dim attribute to DacVectorQuantize for DacResidualVectorQuantize.from_latents() (#40665)
* Add instance attribute to DacVectorQuantize for use in DacResidualVectorQuantize.from_latents

* add from_latent tests

* style fix

* Fix style for test_modeling_dac.py
2025-09-04 11:29:53 +00:00
25b4a0d8ae Add sequence classification support for small Gemma 3 text models (#40562)
* add seq class for gemma3 text model

* add Gemma3TextForSequenceClassification to modeling file

* After run make fixup

* let's just check

* this is why it was crashing, tests were just failing...

* skip it, tested only for seq clf

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-09-04 09:44:59 +00:00
30a4b8707d CircleCI docker images cleanup / update / fix (#40681)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 10:42:18 +02:00
7f92e1f91a Mark Aimv2ModelTest::test_eager_matches_sdpa_inference_04_fp16_pad_right_sdpa_kernels as flaky (#40683)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 10:30:14 +02:00
ca9b36a9c1 Avoid night torch CI not run because of irrelevant docker image failing to build (#40677)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 09:06:37 +02:00
d40e7ea52d Skip more fast v.s slow image processor tests (#40675)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 06:35:44 +02:00
34595cf296 Even more test data cached (#40636)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 21:20:37 +00:00
f22ec7f174 Benchmarking V2: framework impl (#40486)
* Start revamping benchmarking

* Start refactoring benchmarking

* Use Pandas for CSV

* import fix

* Remove benchmark files

* Remove sample data

* Address review comments

* Benchmarking v2

* Fix llama bench parameters

* Working checkpoint

* Readme touchups

* Remove unnecessary test

* Massage the framework a bit

* Small cleanup

* Remove unnecessary flushes

* Remove references to mock benchmark

* Take commit ID from CLI

* Address review comments

* Use Events for thread comms

* Tiny renaming
2025-09-03 22:26:32 +02:00
459c1fa47a refactor: use tolist instead of list comprehension calling .item() (#40646) 2025-09-03 19:25:29 +02:00
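
A minimal before/after sketch of this refactor: tolist() converts the whole tensor in one call instead of looping in Python and calling .item() per element.

```python
# Before/after for the refactor: one tolist() call instead of per-element .item().
import torch

t = torch.tensor([1, 2, 3])

values_before = [x.item() for x in t]  # Python loop, one conversion per element
values_after = t.tolist()              # single bulk conversion
assert values_before == values_after == [1, 2, 3]
```
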
afd1393df1 Remove overwritten GitModelTest::test_beam_search_generate (#40666)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 18:55:45 +02:00
68b9cbb7f5 Skip test_prompt_lookup_decoding_matches_greedy_search for qwen2_audio (#40664)
* Skip `test_prompt_lookup_decoding_matches_greedy_search` for `qwen2_audio`

* Skip `test_prompt_lookup_decoding_matches_greedy_search` for `qwen2_audio`

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 18:43:35 +02:00
55676d7d4c Fix warning for output_attentions=True (#40597)
* Fix attn_implementation for output_attentions

* remove setting attention, just raise warning

* improve message

* Update src/transformers/utils/generic.py
2025-09-03 16:25:13 +00:00
b67608f587 Skip test_fast_is_faster_than_slow for Owlv2ImageProcessingTest (#40663)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 17:49:10 +02:00
30d66dc3bc Update check_determinism inside test_determinism (#40661)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 17:30:39 +02:00
3f40ebf620 Allow custom args in custom_generate Callables and unify generation args structure (#40586)
* Squashed commit of the following:

commit beb2b5f7a04ea9e12876696db66f3589fbae10c5
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 16:03:25 2025 +0200

    also standardize _get_stopping_criteria

commit 15c25663fa991e0a215a7f3cdcf13a9d3a989faa
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 15:48:38 2025 +0200

    watch super.generate() usages

commit 67dd845be2202d191a54b2872f1cb3f71b74b7d6
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 14:44:32 2025 +0200

    ops

commit 4655dfa28fd59d5dc083a41d8396de042d99858c
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 14:41:36 2025 +0200

    wrong merge

commit 46478143994e7b27d51c972a7881e0fea3cb6e3c
Merge: a72c2c4b2f 8564e210ca
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 14:36:15 2025 +0200

    Merge branch 'main' of github.com:huggingface/transformers into fix-custom-gen-from-function2

commit a72c2c4b2f9c0e09fe6ec7992d4d02bfa279da2a
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 14:04:59 2025 +0200

    ops5

commit e72f91411b961979bb3d271810f57905cee5b577
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 12:06:19 2025 +0200

    ops4

commit 12ca97b1078a42167143e0243036f6ef87d5fdac
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:58:59 2025 +0200

    ops3

commit 8cac6c60a318dd381793d4bf1ef3775823f3c95b
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:43:03 2025 +0200

    ops2

commit 4681a7d5dc6c8b96a515d9d79f06380c096b9a9f
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:40:51 2025 +0200

    ops

commit 0d72aa6cbd99a5933c5a95a39bea9088ee21e50f
Merge: e0d47e980e 5bb6186b8e
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:37:28 2025 +0200

    Merge branch 'remove-constrained-bs' into fix-custom-gen-from-function2

commit 5bb6186b8efbd5fdb8e3464a22f958343b9c450c
Merge: 44973dac7d b0db5a02f3
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:36:30 2025 +0200

    Merge branch 'main' into remove-constrained-bs

commit 44973dac7df4b4e2111c71f5fac918be21f3de52
Merge: 1ddab4bee1 893d89e5e6
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:29:48 2025 +0200

    Merge commit '893d89e5e6fac7279fe4292bfa3b027172287162' into remove-constrained-bs

commit e0d47e980e26d32b028c2b402ccb71262637a7a7
Merge: 88128e4563 1ddab4bee1
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 10:52:50 2025 +0200

    Merge branch 'remove-constrained-bs' into fix-custom-gen-from-function2

commit 88128e4563c0be583728e1d3c639bc93143c4029
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 10:44:38 2025 +0200

    fix custom generate args, refactor gen mode args

commit 1ddab4bee159f6c20722e7ff5cd41d5041fab0aa
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Sun Aug 31 21:03:53 2025 +0200

    fix

commit 6095fdda677ef7fbeb06c05f4f914a11b45257b4
Merge: 4a8b6d2ce1 04addbc9ec
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 17:49:16 2025 +0200

    Merge branch 'remove-constrained-bs' of github.com:manueldeprada/transformers into remove-constrained-bs

commit 4a8b6d2ce18b3a8b52c5261fea427e2416f65187
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 17:48:25 2025 +0200

    restore and deprecate beam objects

commit 04addbc9ec62dd4f59d15128e8cd9499e2cda3bb
Merge: e800c7841e becab2c601
Author: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
Date:   Thu Aug 28 14:38:29 2025 +0200

    Merge branch 'main' into remove-constrained-bs

commit e800c7841e5c46ce5698fc9be309d0808f85d23c
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 14:38:10 2025 +0200

    tests gone after green

commit 33971d21ac40aef76a7e1122f4a98ef28beadbe8
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 14:07:11 2025 +0200

    tests green, changed handling of deprecated methods

commit ab303835c184d0a87789da7aed7d8de5ba85d867
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 12:58:01 2025 +0200

    tests fix

commit ec74274ca52a6aa0b5f300374fda838609680506
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 12:32:05 2025 +0200

    ops

commit 0fb19004ccd285dcad485fce0865b355ce5493e0
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 11:45:16 2025 +0200

    whoops

commit c946bea5e45aea021c8878c57fcabc2a13f06fe5
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 11:35:36 2025 +0200

    testing...

commit 924c0dec6d9ea6b4890644fe7f711dc778f820bb
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 11:22:46 2025 +0200

    sweeep ready for tests

commit b05aa771d3994b07cd460cda74b274c9e4f315e6
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 11:13:01 2025 +0200

    restore and deprecate constraints

commit 9c7962d10efa7178b69d3c99e69663756e1cd979
Merge: fceeb383f9 c17bf304d5
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 20:44:21 2025 +0200

    Merge branch 'remove-group-bs' into remove-constrained-bs

commit c17bf304d5cf33af7f34f9f6057915d5f5821dae
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 17:00:50 2025 +0200

    fix test

commit d579aeec6706b77fcc24c1f6806cd7277d7db56e
Merge: 822efd8c3c ed5dd2999c
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 16:04:31 2025 +0200

    Merge branch 'main' of github.com:huggingface/transformers into remove-group-bs

commit 822efd8c3cf475d079e64293aa06e4ab59740fd7
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 15:59:51 2025 +0200

    aaand remove tests after all green!!

commit 62cb274a4acb9f24201902242f1b0dc4e46daac1
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 11:48:19 2025 +0200

    fix

commit c89c892e7b24a7d71831f2b35264456005030925
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 11:45:20 2025 +0200

    testing that hub works the same

commit fceeb383f99e4a836679d67b1d2a8520152eaf49
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Tue Aug 26 20:06:59 2025 +0200

    draft

commit 6a9b384078f3798587ba865ac7ddfefc9a79e41c
Merge: 8af3af13ab 58cebc848b
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Tue Aug 26 15:00:05 2025 +0200

    Merge branch 'main' of github.com:huggingface/transformers into remove-group-bs

commit 8af3af13abb85ca60e795d0390832f398a56c34f
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Tue Aug 26 11:55:45 2025 +0200

    Squashed commit remove-constrastive-search

* ops

* fix

* ops

* review

* fix

* fix dia

* review
2025-09-03 17:30:09 +02:00
a8f400367d Avoid attention_mask copy in qwen2.5 (#40658)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-03 15:17:22 +00:00
57f5668d0b Fix Metaclip modular conversion (#40660)
* Fix Metaclip modular conversion

* manually run check_copies
2025-09-03 16:13:50 +01:00
238a8274b4 feat(serving): add healthcheck (#40653) 2025-09-03 16:43:12 +02:00
f2416b4fd2 fix pipeline dtype (#40638)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-03 16:05:48 +02:00
5ea5c8179b Mark LongformerModelTest::test_attention_outputs as flaky (#40655)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 13:19:02 +00:00
fe1a9e0dba Remove TF/Flax examples (#40654)
* Remove TF/Flax examples

* Remove check_full_copies

* Trigger CI
2025-09-03 14:15:57 +01:00
5e2e496149 fix MetaCLIP 2 wrong link & wrong model names in the docstrings (#40565)
* fix MetaCLIP 2 wrong link & wrong model names in the documentation and docstrings

* ruff reformatted

* update files generated by modular

* update meta_clip2 to metaclip_2 to match the original

* _supports_flash_attn = False

---------

Co-authored-by: Yung-Sung Chuang <yungsung@meta.com>
2025-09-03 13:53:56 +01:00
03708ccf6f add DeepseekV3ForTokenClassification (#40641)
* add DeepseekV3ForTokenClassification

* fix typo

---------

Co-authored-by: json.bourne <json.bourne@kakaocorp.com>
2025-09-03 12:30:09 +00:00
c485c52db4 Skip test_prompt_lookup_decoding_matches_greedy_search for voxtral (#40643)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 11:45:29 +00:00
2bbf98a83d Fix: PIL image load in Processing utils apply_chat_template (#40622) 2025-09-03 13:06:05 +02:00
acc968c581 [CP] Add attention_mask to the buffer when the mask is causal (#40619)
Fix attention mask validation for context parallelism

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-03 10:19:35 +00:00
cb54ce4ec6 [auto-model] propagate kwargs (#40491)
propagate kwargs
2025-09-03 09:59:20 +00:00
0f5e45a6d1 fix: gas for gemma fixed (#40591)
* fix: gas for gemma fixed

* feat: run fix-copies

* feat: added issue label
2025-09-03 08:44:14 +00:00
e690fe61e8 Fix too many requests in TestMistralCommonTokenizer (#40623)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 05:05:03 +02:00
00a8364271 🌐 [i18n-KO] Translated deepseek_v3.md to Korean (#39649)
* docs: ko: deepseek_v3.md

* feat: nmt draft

* fix: manual edits

* fix: glossary edits

* docs: modified contents as recommended by 4N3MONE

* Update docs/source/ko/model_doc/deepseek_v3.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/deepseek_v3.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* add_toctree.yml

---------

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
2025-09-02 13:35:56 -07:00
ed49376a42 Remove random flag (#40629)
remove flag
2025-09-02 19:10:02 +02:00
d47ad91c3c Support TF32 flag for MUSA backend (#33187)
* Support MUSA (Moore Threads GPU) backend in transformers
Add accelerate version check, needs accelerate>=0.33.0

* Support TF32 flag for MUSA backend

* fix typo
2025-09-02 16:27:10 +00:00
a470f21396 Enable more ruff UP rules (#40579)
* Import Sequence from collections.abc

Signed-off-by: cyy <cyyever@outlook.com>

* Apply ruff UP rules

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-02 17:29:59 +02:00
37103d6f22 Fix invalid typing (#40612)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-02 13:10:22 +00:00
4f542052b9 Remove unnecessary pillow version check (#40604)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-02 12:59:22 +00:00
8c60a7c385 Add collated reports job to Nvidia CI (#40470)
* Add collated reports job to Nvidia CI

* machine_type

* Move collated reports job to model_jobs

* Propagate repo id variable

* assign runner_type when runner is self-scheduled-caller
2025-09-02 14:25:22 +02:00
97266dfd50 Fix flaky JambaModelTest.test_load_balancing_loss (#40617)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-02 13:58:16 +02:00
91be12bdc6 Avoid too many request caused by AutoModelTest::test_dynamic_saving_from_local_repo (#40614)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-02 12:08:52 +02:00
bbd8085b0b Fix processor chat template (#40613)
fix tests
2025-09-02 10:59:48 +02:00
b2b1c30b1b fix: continuous batching in transformers serve (#40479)
* fix: continuous batching in `transformers serve`

* fix: short circuit inner gen loop when prepare_next_batch prepared nothing

* docs: add comment explaining FastAPI lifespan

* test: add CB serving tests

* refactor: remove generation config max_new_tokens override because it's unnecessary

* docs: add docstring for `ServeCommand::run`

* feat: use new `DecodeStream` API
2025-09-02 10:45:05 +02:00
8a091cc07c Disable cache for TokenizerTesterMixin temporarily (#40611)
* try no cache

* try no cache

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-02 08:40:04 +02:00
514b3e81b7 Multiple fixes to FA tests in AMD (#40498)
* Expectations for gemma3

* Fixes for Qwen2_5_VL tests

* Added expectation but underlying pb is still there

* Better handling of mrope section for Qwen2_5_vl

* Fixes for FA2 tests and reformat batch test for Qwen2_5_Omni

* Fix multi-device error in qwen2_5_omni

* Style and repo-consistency

* Removed inherited test because fix in common

* slow tests fixes

* Style

* Fixes for qwen2_5_vl or omni for FA test
2025-09-01 20:49:50 +02:00
b3655507bb Pin torchcodec to 0.5 in AMD docker (#40598) 2025-09-01 20:39:55 +02:00
4da03d7f57 Reduce more test data fetch (#40595)
* example

* fix

* fix

* add to fetch script

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 18:07:18 +02:00
abf5900a76 [Tests] Fixup duplicated mrope logic (#40592)
cleanup duplicated logic
2025-09-01 17:22:34 +02:00
3beac9c659 Fix quite a lot of FA tests (#40548)
* fix_rope_change

* fix

* do it dynamically

* style

* simplify a lot

* better fix

* fix

* fix

* fix

* fix

* style

* fix
2025-09-01 16:42:50 +02:00
21e708c8fd Fix for missing default values in encoder decoder (#40517)
* Added default_value for is_updated and type check

* Forgot one

* Repo consistency
2025-09-01 16:11:23 +02:00
c99d43e6ec Fix siglip flaky test_eager_matches_sdpa_inference (#40584)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 15:17:25 +02:00
3c3dac3c12 Add Copilot instructions (#40432)
* Add copilot-instructions.md

* Fix typo

* Update .github/copilot-instructions.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-09-01 14:09:54 +01:00
2b71c5b7a6 Fix inexistent imports (#40580)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-01 13:05:00 +00:00
8e0b2c8baf Skip TvpImageProcessingTest::test_slow_fast_equivalence (#40593)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 15:03:34 +02:00
a543095c99 Fix typos (#40585)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-01 12:58:23 +00:00
8564e210ca 🚨 Remove Constrained Beam Search decoding strategy (#40518)
* Squashed remove-constrastive-search

* sweeep ready for tests

* testing...

* whoops

* ops

* tests fix

* tests green, changed handling of deprecated methods

* tests gone after green

* restore and deprecate beam objects

* restore and deprecate constraint objects

* fix ci

* review
2025-09-01 12:34:48 +00:00
564be6d895 Support batch size > 1 image-text inference (#36682)
* update make nested image list

* fix make flat list of images

* update type anno

* fix image_processing_smolvlm

* use first image

* add verbose comment

* fix images

* rollback

* fix ut

* Update image_processing_smolvlm.py

* Update image_processing_idefics3.py

* add tests and fix some processors

* fix copies

* fix after rebase

* make the test cover chat templates

* skip udop, no point in fixing it

* fix after rebase

* fix a few more tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: raushan <raushan@huggingface.co>
2025-09-01 12:26:07 +00:00
3bccb02616 🚨 Remove Group Beam Search decoding strategy (#40495)
* Squashed remove-constrastive-search

* testing that tests pass using hub

* fix

* aaand remove tests after all green!!
2025-09-01 13:42:48 +02:00
90953d5bc1 Fix custom generate relative imports (#40480) 2025-09-01 13:38:56 +02:00
2537ed4477 Update get_*_features methods + update doc snippets (#40555)
* siglip

* clip

* aimv2

* metaclip_2

* align

* align fixup

* altclip

* blip2 (make consistent)

* chineese clip

* clipseg

* flava

* groupvit

* owlv2

* owlvit

* vision_encoder

* clap

* x_clip

* fixup

* fix siglip2

* blip2

* fix blip2 tests (revert to original)

* fix docs
2025-09-01 12:37:43 +01:00
48ebae975e Fix llava image processor (#40588)
fix
2025-09-01 13:32:57 +02:00
db6821b79c Allow remi-or to run-slow (#40590)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 12:30:53 +02:00
6546f288a1 Fix CircleCI step passes in the case of pytest worker crash at test collection time (#40552)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 11:33:23 +02:00
cfed99d310 Fix test_eager_matches_sdpa_inference not run for CLIP (#40581)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 11:21:56 +02:00
1d742644c0 [qwen-vl] fix position ids (#40490)
* fix position ids

* fixup

* adjust tests since they are failing on main as well

* add a comment to make it clear
2025-09-01 09:10:41 +00:00
0b24507379 processor tests - use dummy videos (#40537)
* use dummy videos

* failing on main, new model merged had conflicts
2025-09-01 09:04:47 +00:00
b0db5a02f3 Set test_all_params_have_gradient=False for DeepseekV2ModelTest (#40566)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-30 22:46:31 +02:00
1363fceeec remove the redundant, unmaintained jieba and use rjieba instead (#40383)
* porting unmaintained jieba to rjieba

* Fix format

* replaced the line with rjieba instead of removing it

* cut_all is not included as a parameter; cut_all is a separate function in rjieba

* rev

* jieba remove installation

* Trigger tests

* Update tokenization_cpm.py

* Update tokenization_cpm_fast.py

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-30 13:28:52 +02:00
36fddebcee pin pytest-rerunfailures<16.0 (#40561)
pin pytest-rerunfailures<16.0

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-30 12:58:44 +02:00
2d3b8863e8 Fix collated reports upload filename (#40556) 2025-08-30 09:35:51 +02:00
ce48e9cac0 Dev version 2025-08-29 20:17:34 +02:00
155fd926d2 Fix GptOssModelTest::test_assisted_decoding_matches_greedy_search_1_same (#40551)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
2025-08-29 15:53:53 +00:00
1067577ad2 fix gpt-oss out shape (#40535)
* fix out shape

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* reset gpt-oss modeling

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix copies

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-29 15:20:33 +00:00
7efb4c87ca Flaky CI is annoying (#40543)
* mark flaky

* and the non batch one
2025-08-29 16:47:44 +02:00
828a27fd32 Fix gpt-oss rope warning (#40550)
* fix

* fix print

* rm

* real fix

* fix

* style
2025-08-29 14:40:33 +00:00
74a24217f5 Add bfloat16 support detection for MPS in is_torch_bf16_gpu_available() (#40458)
* Add bfloat16 support detection for MPS (Apple Silicon) in is_torch_bf16_gpu_available

bfloat16 seems to have been supported for a few years now in Metal and torch.mps.

Make sure to allow it and not throw on bf16 usage with "Your setup doesn't support bf16/gpu." from TrainingArguments.

* Check bf16 support for MPS using torch method

It seems the method actually exists: 5859edf113/torch/_dynamo/device_interface.py (L519)

It simply checks whether you are running macOS 14 or higher.

* Document Metal emulation for bf16 support

Add note about Metal emulation for bf16 support on M1/M2.

* Update bf16 support check for MPS backend

is_bf16_supported() is not exposed even though it is defined on MPSInterface; use the same approach as in the accelerate PR.

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-29 14:37:15 +00:00
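
A hedged sketch of such a check; the merged change follows accelerate's approach, while this version uses platform.mac_ver(), so treat it as an approximation rather than the actual code. Per the commit, bf16 on MPS requires macOS 14 or newer.

```python
# Approximate sketch (the merged check follows accelerate); per the commit,
# bf16 on MPS requires macOS 14 or newer.
import platform

import torch


def mps_bf16_available() -> bool:
    if not (torch.backends.mps.is_available() and torch.backends.mps.is_built()):
        return False
    major = platform.mac_ver()[0].split(".")[0]
    return major.isdigit() and int(major) >= 14


print(mps_bf16_available())
```
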
ffdd10fced Allow compression on meta device (#39039)
* disable gradient calculation for int weights

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

* Update src/transformers/quantizers/quantizer_compressed_tensors.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

* updated model procession before/after weight loading

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

* fix style

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

* reformat

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

* fix style

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

---------

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
2025-08-29 15:49:15 +02:00
f0e778112f Clean-up kernel loading and dispatch (#40542)
* clean

* clean imporrts

* fix imports

* oups

* more imports

* more imports

* more

* move it to integrations

* fix

* style

* fix doc
2025-08-29 14:14:38 +02:00
f68eb5f135 Redundant code removal (#40534)
redundant code
2025-08-29 11:30:23 +00:00
d888bd435d Fix typos (#40511)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-29 11:25:33 +00:00
11a6b95553 Oupsy (#40544)
fix bump!
2025-08-29 12:59:49 +02:00
b07144ac27 tokenizers bump tokenizers version (#40540)
* bump tokenizers version

* use rc0

* ?

* fml

* update
2025-08-29 12:34:41 +02:00
008c0ba8e2 Fix SeamlessM4Tv2ModelWithTextInputTest::test_retain_grad_hidden_states_attentions (#40532)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 23:30:59 +02:00
89ef1b6e0b Set test_all_params_have_gradient=False for HunYuanMoEV1ModelTest (#40530)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 22:32:51 +02:00
2e0f1d6a37 [Qwen Omni/VL] Fix fa tests (#40528)
* fix

* style

* flaky flaky

* flaky flaky

* oopsie, we need the out-of-place version for sure

* flaky flaky

* flaky flaky
2025-08-28 21:07:22 +02:00
68013c505a Improve Gemma3n model and tests (#39764) 2025-08-28 20:25:42 +02:00
ffcb344612 Lazy import torchcodec (#40526)
* lazy import

* parse version

* omg, we need to guard version parse as well
2025-08-28 18:57:14 +02:00
8c7f685079 Fix typo: 'casual' to 'causal' (#40374)
fix typo: 'casual' to 'causal'

Co-authored-by: demo <vamshika0210@gamil.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-28 09:17:37 -07:00
d61fab1549 skip some padding_matches_padding_free_with_position_ids for FA2 (#40521)
skip 1

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 17:20:07 +02:00
31336ab750 Fix mistral3 tests after "[Kosmos 2.5] Rename checkpoints" (#40523)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 16:29:54 +02:00
851b8f281d [kernels] If flash attention2 is not installed / fails to import (cc on our cluster) default to kernels (#40178)
* first step if flash not installed but you set to use it

* try importing

* now default to using it

* update our tests as well

* wow yesterday I was not awake

* fixup

* style

* lol the fix was very very simple

* `RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/kernels@main#egg=kernels
` for updated dockers

* push review comments

* fix

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-08-28 16:20:25 +02:00
de9e2d7a2e Skip some flex attn tests (#40519)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 15:43:38 +02:00
7e1aee4db6 [FA] Remaining Cleanup (#40424)
* fa cleanup

* flaky tests

* readd removed test and changeup comments to reflect the purpose

* flaky tests
2025-08-28 15:01:19 +02:00
893d89e5e6 [omni modality] support composite processor config (#38142)
* dump ugly option to check again tomorrow

* tiny update

* do not save as nested dict yet!

* fix and add tests

* fix dia audio tokenizers

* rename the flag and fix new model Evolla

* fix style

* address comments

* broken from a different PR

* fix saving layoutLM

* delete print

* delete!
2025-08-28 14:40:27 +02:00
becab2c601 Use the config for DynamicCache initialization in all modelings (#40420)
* update all

* remove the most horrible old code

* style
2025-08-28 14:32:30 +02:00
8acbbdcadf [serve] fix request_id unexpected (#40501)
* fix request-id in serving

* style

* fix
2025-08-28 14:16:28 +02:00
2300be3b41 sped up gguf tokenizer for nemotron test (#40509)
sped up tokenizer for nemotron test
2025-08-28 12:10:49 +00:00
b2b654afbf correct kes to keys. (#40489)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-08-28 12:00:22 +00:00
476cd7bab1 [vision] Improve keypoint-matching models docs (#40497)
fix options and add inference_mode
2025-08-28 12:31:21 +01:00
1499f9e356 [Kosmos 2.5] Rename checkpoints (#40338) 2025-08-28 13:30:41 +02:00
10ddfb0be5 Add more missing arguments (#40354)
Add missing arguments

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-28 12:21:51 +02:00
d10603f701 Add Apertus (#39381)
* init swissai model

* AutoModelForCausalLM

* AutoModelForCausalLM mapping

* qk norm and post ln optional

* fix wrong shape of qk norm: megatron uses head_dim

* automodel fixes

* minor fix in forward

* fix rope validation to accept llama3 scaling

* `SwissAIForTokenClassification` support

* Align `SwissAI` to v4.52.4

* Align `SwissAI` to v4.53.1

* Init CUDA xIELU

* `SwissAI*`->`Apertus*`

* ci fix

* check_docstring ignore ApertusConfig

* Licensing and placeholder tests

* Placeholder doc

* XIELU syntax

* `_xielu_python` optimization

* Fix xIELU

* [tmp] `{beta,eps}` persistent=False
until {beta,eps} saved in checkpoint

* Modular `Apertus`

* CUDA xIELU logging

* ci fix

* ci fix

* ci fix

* Update license

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update tests/models/apertus/test_modeling_apertus.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* `.utils.import_utils.is_torchdynamo_compiling`

* `Apertus` class ordering

* `past_key_value{->s}`, `make fix-copies`

* ci fix

* Remove unused configuration parameters

* `{beta,eps}` saved in checkpoint

* `{beta,eps}` Temporarily on CPU

* Suggestions

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* ci fix

* remove fx_compatible (deprecated)

* remove `rotary_embedding_layer`

As the tests are written for a config without default scaling (which is not the case in Apertus) - besides, rope scaling is tested in other models so it's all safe.

* fully removing `Mask4DTestHard` class

Not needed (for now)

* switch to `dtype` instead of `torch_dtype`

Following this:
https://github.com/huggingface/transformers/pull/39782

* remove unused imports

* remove `cache_implementation="static"`

* +Apertus to `docs/source/en/_toctree.yml` for the doc builder

---------

Co-authored-by: Alexander Hagele <alexanderhagele@gmail.com>
Co-authored-by: dhia680 <garbayad@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Dhia Garbaya <84809366+dhia680@users.noreply.github.com>
2025-08-28 11:55:43 +02:00
f9b9a5e884 Update quantization overview for XPU (#40331)
* update xpu quantization overview

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix aqlm tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update gguf support

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix gguf tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu gguf precision error

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* replace deprecated models

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix import org

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update xpu ggml tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert wrong change

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* xpu optimum-quanto goes green

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-08-28 09:52:59 +00:00
b824f4986f fix typo (#40484)
* fix typo

Signed-off-by: guochenxu <guochenxu@modelbest.cn>

* csm & qwen omni

Signed-off-by: guochenxu <guochenxu@modelbest.cn>

* format

Signed-off-by: guochenxu <guochenxu@modelbest.cn>

* Apply style fixes

* omni

Signed-off-by: guochenxu <guochenxu@modelbest.cn>

---------

Signed-off-by: guochenxu <guochenxu@modelbest.cn>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-28 08:31:25 +00:00
c9ff166718 Various AMD expectations (#40510)
* AMD expectations for qwen2

* Added more detailed expectations to smolvlm

* Added AMD expectations to TableTransformer

* Style
2025-08-28 10:15:21 +02:00
721d4aee81 Include machine type in collated reports filename (#40514) 2025-08-28 09:28:12 +02:00
98289c5546 [modular] Classes can now be defined and referenced in arbitrary order (without bringing unwanted dependencies) (#40507)
* remove future class from dependency graph

* convert all
2025-08-27 23:06:10 +02:00
e3d8fd730e docs(pixtral): Update Pixtral model card to new format (#40442)
* docs(pixtral): Update Pixtral model card to new format

* docs(pixtral): Change cuda into auto for device_map

* docs(pixtral): Apply suggestions from review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Apply suggestions from review, changing mistral-community into Mistral AI

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Apply suggestions from review [!TIP] part

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Finalize model card with tested code examples

This commit finalizes the update for the Pixtral model card.

* Fix the hfoption by the right one

* @BryanBradfo docs(pixtral): Changing the redirection of bitsandbytes

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Add of ` to highlight the tokens

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Move image block per final review

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-27 11:38:51 -07:00
821384d5d4 Fix the CI workflow of merge to main (#40503)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-27 18:35:12 +02:00
304225aa15 Collated reports: no need to upload artifact (#40502)
No need to upload collated reports as gh artifact
2025-08-27 18:31:55 +02:00
3c343c6601 [Whisper] Add rocm expected results to certain tests (#40482)
* Add rocm expected results to certain tests

* Specify rocm version in expectations so we know origin. Improved var names

* Update test var names
2025-08-27 16:19:11 +00:00
6350636964 Fix qwen2_moe tests (#40494)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-27 16:22:04 +02:00
52aaa3f500 [EfficientLoFTR] dynamic image size support (#40329)
* fix: reverted efficientloftr embeddings computation to inference time with lru cache

* fix: added dtype and device for torch ones and zeros creation

* fix: fixed embed height and width computation with aggregation

* fix: make style

* fix error message

* fix fa2 tests

---------

Co-authored-by: qubvel <qubvel@gmail.com>
2025-08-27 15:05:08 +01:00
ed5dd2999c [ESM] support attention API (#40370)
* ESM supports attention API

* supports flags

* fix tests

* fix copies

* another fixup needed after fixing tests

* fix tests and make sure Evolla copied everything

* fix

* order

* forgot about "is_causal" for fa2

* cross attention can't be causal
2025-08-27 15:39:04 +02:00
8b804311ba [modular] Remove ambiguity in all calls to parent class methods + fix dependency graph (#40456)
* fix in modular

* remove leftover print

* fix everything except when it's in assignment

* fix assignment as well

* more general

* better

* better

* better comment

* docstring

* cleaner

* remove base

* doc
2025-08-27 14:51:28 +02:00
a3afebbbbe [modular] Use multi-processing + fix model import issue (#40481)
* add mp and simplify a bit

* improve

* fix

* fix imports

* nit
2025-08-27 14:51:12 +02:00
75d6f17de6 Validate GptOssConfig rope config after it's fully initialized (#40474)
* Validate GptOssConfig rope config after it's fully initialized

Fixes #40461

* Remove whitespaces
2025-08-27 10:16:58 +01:00
80f4c0c6a0 CI when PR merged to main (#40451)
* up

* up

* up

* up

* up

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-27 10:56:18 +02:00
ff8b88a948 Fix nightly torch CI (#40469)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-26 22:02:15 +02:00
74ad608a2b Not to shock AMD team by the cancelled workflow run notification ❤️ 💖 (#40467) 2025-08-26 20:53:24 +02:00
c8c7623f20 Update SegFormer model card (#40417)
* Update SegFormer model card

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update the segformer model card

* Remove quantization example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-26 08:27:25 -07:00
78f32c3917 [pipeline] Add Keypoint Matching pipeline (#39970)
* feat: keypoint-matcher pipeline

* docs: added keypoint-matcher pipeline in docs

* fix: added missing statements for repo consistency

* docs: updated SuperGlue, LightGlue and EfficientLoFTR docs

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* test: fixed run_pipeline_test

* update pipeline typing and docs

* update tests

* update docs snippets

* Fix import error

* fix: pipeline init

* pt framework

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-26 15:26:57 +01:00
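
A hedged usage sketch for the new pipeline; the task string, checkpoint, and input format below are assumptions based on the PR description rather than verified against the merged API.

```python
# Task string, checkpoint, and input format are assumptions from the PR text,
# not verified against the merged API.
from transformers import pipeline

matcher = pipeline("keypoint-matching", model="magic-leap-community/superglue_outdoor")
result = matcher(["view_a.jpg", "view_b.jpg"])  # one image pair (assumed format)
print(result)  # expected: matched keypoint coordinates with matching scores
```
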
6451294f6f [RoPE] explicit factor > implicit factor in YaRN (#40320)
explicit factor > implicit factor
2025-08-26 14:58:28 +01:00
5a8ba87ecf [fast_image_processor] fix image normalization for resize (#40436) 2025-08-26 13:49:51 +00:00
0ce6709e70 deci gguf support (#38669)
* deci gguf support

* make style

* tests for deci

* try except removed

* style

* try except removed
2025-08-26 13:43:17 +00:00
263d06fedc Fix extra template loading (#40455)
* Fix extra template loading

* Reformat

* Trigger tests
2025-08-26 14:01:01 +01:00
58cebc848b flash_paged: s_aux may not exist (#40434)
Some implementations (e.g.,
https://huggingface.co/kernels-community/vllm-flash-attn3) support an
`s_aux` arg for attention sinks, but others
(https://huggingface.co/kernels-community/flash-attn) do not. If s_aux
is present in the kwargs, we forward it; otherwise we don't.

The user will still get an error if they use a model like gpt-oss-20b
with an implementation that does not support `s_aux`, but models that
don't use it won't error out. For example, [this is currently
failing](399cd5c04b/examples/pytorch/continuous_batching.py (L16))
because we are sending `s_aux: None` in the dict.
2025-08-26 13:15:59 +02:00
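
A minimal sketch of the guard described above; the wrapper name and signature are illustrative, not the actual flash_paged code.

```python
# Illustrative wrapper (names assumed): forward s_aux only when it was actually
# provided, so kernels whose signature lacks the attention-sink arg never
# receive s_aux=None.
def call_paged_attention(attn_fn, q, k, v, **kwargs):
    extra = {}
    s_aux = kwargs.get("s_aux")
    if s_aux is not None:
        extra["s_aux"] = s_aux  # only kernels that accept it will ever see it
    return attn_fn(q, k, v, **extra)
```
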
34108a2230 Continuous batching refactor (#40426)
* Rework of the CB example

* Further rework of CB example

* Refactor PA cache, slice on tokens, add debug prints -- WIP

* Slice cache -- WIP

* Added a mechanism to check batched outputs in CB script

* Less logging, debug flag for slice, !better reset! -- WIP

* QOL and safety margins

* Refactor and style

* Better saving of cb example

* Fix

* Fixes and QOL

* More information about metrics

* Further logging

* Style

* Licenses

* Removed some comments

* Add a slice input flag

* Fix in example

* Added back some open-telemetry deps

* Removed some aux function

* Added FA2 option to example script

* Fixed math (all of it)

* Added a simple example

* Renamed core to classes

* Made allocation of attention mask optional

* Style
2025-08-26 13:01:42 +02:00
49e168ff08 🚨 Remove Contrastive Search decoding strategy (#40428)
* delete go brrr

* fix tests

* review
2025-08-26 12:31:46 +02:00
b8184b7ce9 Make cache_config not mandatory (#40316)
* Relaxed assumptions on cache_config

* Review compliance

* Style

* Styyyle

* Removed default and added args

* Rebase mishapfix

* Propagate args to TorchExportableModuleForDecoderOnlyLM

* Fix the test I wanted  fixed in this PR

* Added some AMD expectation related to cache tests
2025-08-26 12:06:17 +02:00
32fcc24667 rename get_cuda_warm_up_factor to get_accelerator_warm_up_factor (#40363)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-08-26 09:56:35 +00:00
f690a2a1e0 [video processors] decode only sampled videos -> less RAM and faster processing (#39600)
* draft update two models for now

* batch update all VLMs first

* update some more image processors

* update

* fix a few tests

* just make CI green for now

* fix copies

* update once more

* update

* unskip the test

* fix these two

* fix torchcodec audio loading

* maybe

* yay, i fixed torchcodec installation and now can actually test it

* fix copies deepseek

* make sure the metadata is returned when users request it

* add docs

* update

* fixup

* Update src/transformers/audio_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/glm4v/video_processing_glm4v.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* update

* what if we set some metadata attr to `None`

* fix CI

* fix one test

* fix 4 channel test

* fix glm timestamps

* rebase gone wrong

* raise warning once

* fixup

* typo

* fix copies

* fix smolvlm test

* this is why torch's official benchmark was faster, set threads to `0`

* Apply style fixes

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-26 11:38:02 +02:00
64ae6e6b1d fix qwen25-vl grad acc (#40333)
* fix qwen25-vl grad acc

* fix Qwen2_5_VLForConditionalGeneration for accepts_loss_kwargs

* fix ci

* fix ci

* fix typo

* fix CI
2025-08-26 09:30:06 +00:00
6d2bb1e04d [Trainer] accelerate contextparallel support in trainer (#40205)
* initial context_parallel_size support in trainer

* For context parallelism, use AVG instead of SUM to avoid over-accounting tokens

* use parallelism_config.cp_enabled

* add parallelism_config to trainer state

* warn when auto-enabling FSDP

* fix some reviews

* WIP: somewhat matching loss

* Feat: add back nested_gather

* Feat: cleanup

* Fix: raise on non-sdpa attn

* remove context_parallel_size from TrainingArguments

* if we have parallelism_config, we defer to get_state_dict from accelerate

* fix from review

* Feat: add parallelism config support

* Chore: revert some unwanted formatting changes

* Fix: check None

* Check none 2

* Fix: remove duplicate import

* Update src/transformers/trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Fin

* require accelerate 1.10.1 and higher

---------

Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-26 09:28:48 +00:00
63caaea1fb Refactor ViT-like models (#39816)
* refactor vit

* fix

* fixup

* turn off FX tests

* AST

* deit

* dinov2

* dinov2_with_registers

* dpt

* depth anything (nit)

* depth pro (nit)

* ijepa

* ijepa (modular)

* prompt_depth_anything (nit)

* vilt (nit)

* zoedepth (nit)

* videomae

* vit_mae

* vit_msn

* vivit

* yolos

* eomt

* vitpose

* update auto backbone

* disable `fx` and export tests (dinov2, dpt, ijepa, vit, vitpose)

* fix kwargs for backbone

* fix

* convnext

* fixup

* update convnext layernorm

* fix-copies layer_norm

* convnextv2

* explicit output_hidden_states for models with backbones

* explicit hidden states collection for dinov2

* tests fixed

* fix DPT as well

* fix dinov2 with registers

* add comment
2025-08-26 11:14:06 +02:00
922e65b3fc Fix non FA2 tests after FA2 installed in CI docker image (#40430)
* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-26 10:36:50 +02:00
e68146fbe7 Fix collated reports model name entry (#40441) 2025-08-25 20:36:01 +00:00
8ce633cc75 InternVL MI325 test expectations (#40387)
* Adjust ROCm expectations

* MI355

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-08-25 22:00:35 +02:00
7637d298b3 Fix collated reports uploading (#40440) 2025-08-25 21:49:59 +02:00
fa59cf9c9f Fix https://github.com/huggingface/transformers/issues/40292 (#40439)
* Fix https://github.com/huggingface/transformers/issues/40292

* Trigger tests

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2025-08-25 20:12:57 +01:00
f0e87b436d Fix collated reports model directory traversal (#40437)
Fix model dir traversal
2025-08-25 18:01:58 +00:00
ef406902bf Gemma3 text fixes: Add expectations for MI325 (#40384)
* Add expectations for MI325

* Ruff

* Adjust CUDA expectations as well

* Another attempt for CUDA expectations
2025-08-25 19:57:50 +02:00
c81723d31b 🌐 [i18n-KO] Translated models.md to Korean (#39518)
* docs: ko: models.md

* feat: nmt draft

* fix: manual edits

* Resolved _toctree.yaml conflict during merge from main

* Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Apply suggestions from code review

* fix: update toctree

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-25 09:17:08 -07:00
6b5eab70e4 Remove working-dir from collated reports job (#40435) 2025-08-25 18:14:35 +02:00
1763ef2951 [docs] remove last references to transformers TF classes/methods (#40429)
* halfway through tasks

* complete

* Update utils/check_docstrings.py
2025-08-25 16:30:59 +01:00
eac4f00bdf Fix typo and improve GPU kernel check error message in MXFP4 quantization (#40349) (#40408)
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-25 15:21:55 +00:00
d8f2edcc46 Add tokenizer_kwargs argument to the text generation pipeline (#40364)
* Add `tokenizer_kwargs`  arg to text generation pipeline.

* chore: re-run CI

* Rename `tokenizer_kwargs` to `tokenizer_encode_kwargs` for text generation pipeline

* Fix `tokenizer_encode_kwargs` doc string.

* Fix note related to `tokenizer_kwargs` in text generation pipeline

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-25 15:21:19 +00:00
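A usage sketch, assuming the `tokenizer_encode_kwargs` name this PR settled on; the exact set of supported encode options depends on the tokenizer:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
# Encode-time options are forwarded to the tokenizer when the prompt is
# tokenized, separately from the generation kwargs.
out = generator(
    "Hello world",
    max_new_tokens=20,
    tokenizer_encode_kwargs={"add_special_tokens": False},
)
print(out[0]["generated_text"])
```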
1a35d07f56 Update collated reports working directory and --path (#40433) 2025-08-25 15:18:26 +00:00
399cd5c04b Fix modular for modernbert-decoder (#40431)
* fix the modular

* CI
2025-08-25 16:50:49 +02:00
ea8d9c8f06 🚨 Remove DoLa decoding strategy (#40082)
* remove dola generation strategy

* add fast test
2025-08-25 16:33:27 +02:00
6bf6f8490c [Mxfp4] Add a way to save with a quantization method (#40176)
* add a test

* tempdir

* fix import issue

* wow I am tired

* properly init

* i am not super familiar with quantizer api :|

* set to TRUE for now

* full support

* push current changes

* will clean this later but the imports are a shitshow here

* this correctly saves the block and scales but forward seems broken

* quantize was not correct

* fix storage

* why were biases even included

* finally!

* style

* fix style

* remove print

* lazy import

* up

* not sure what happens this works now?

* holy moly it was not so far

* okay this seems to work!

* workings!!!

* allow save_pretrained to create PR

* Apply suggestions from code review

* fixup

* add dequantize false as well

* working new

* fix

* rm swizzle and unswizzle during saving

* rm print

* Update src/transformers/modeling_utils.py

* fix

* style

---------

Co-authored-by: Marc Sun <marc@huggingface.co>
2025-08-25 16:27:19 +02:00
04c2bae3a8 Fix label smoothing incompatibility with multi-label classification (#40296)
* Fix label smoothing incompatibility with multi-label classification (#40258)

* Improve label smoothing multi-label check based on reviewer feedback

- Move check from LabelSmoother to Trainer.__init__() for better architecture
- Use model.config.problem_type instead of tensor inference for robustness
- Warn and disable smoothing instead of raising error for better UX
- Update test to verify warning behavior
2025-08-25 14:23:31 +00:00
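An illustrative sketch of the check described above (not the exact Trainer code); per the commit body, `problem_type` is read from the model config rather than inferred from label tensors:

```python
import warnings

def maybe_disable_label_smoothing(model, args):
    # Multi-label heads train with BCE on multi-hot targets, where the
    # usual label-smoothing formulation does not apply.
    problem_type = getattr(model.config, "problem_type", None)
    if args.label_smoothing_factor != 0 and problem_type == "multi_label_classification":
        warnings.warn(
            "Label smoothing is incompatible with multi-label classification; disabling it."
        )
        args.label_smoothing_factor = 0.0
```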
3b5b9f6518 Fix processing tests (#40379)
* fix tests

* skip failing test in generation as well

* grounding dino was overwritten

* one more overwritten code

* clear comment
2025-08-25 14:50:54 +02:00
a0a37b3250 Gpt oss optim (#40304)
* enable fast index selecting

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update model

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix gpt-oss tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix check tensor

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-08-25 14:36:33 +02:00
d73181b3fc Fix UnboundLocalError in WER metric computation (#40402)
Renamed wer metric variable to wer_metric to avoid naming conflict
with local variable assignment in compute_metrics function.

Co-authored-by: pranam-gf <pranam@goodfin.com>
2025-08-25 12:02:22 +00:00
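A minimal reconstruction of the shadowing bug described here: with the metric object also named `wer`, the local assignment made Python treat `wer` as local for the whole function, so the `.compute` call raised `UnboundLocalError` before the assignment ran. The variable names below follow the commit message:

```python
import evaluate

wer_metric = evaluate.load("wer")  # previously named `wer`

def compute_metrics(pred_str, label_str):
    # Safe now: the metric object and the local result no longer share a name.
    wer = wer_metric.compute(predictions=pred_str, references=label_str)
    return {"wer": wer}
```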
11e12a715a Fix typo: 'seperator' to 'separator' in variable names (#40389)
Fixed 4 instances of the typo "seperator" → "separator" in variable names:
- 2 instances in src/transformers/models/shieldgemma2/convert_shieldgemma2_weights_orbax_to_hf.py
- 2 instances in src/transformers/models/gemma3/convert_gemma3_weights_orbax_to_hf.py

These typos were in variable names used for parsing path components in weight conversion scripts.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-25 11:56:30 +00:00
40299134a8 Fix CI (hunyuan moe does not support fullgraph) (#40423)
fix flag
2025-08-25 12:01:28 +02:00
a2b37bfd58 Fix typo: 'casual' -> 'causal' in code and documentation (#40371) (#40407) 2025-08-25 09:32:15 +00:00
0031c044f8 [docs] flax/jax purge (#40372)
flax/jax purge
2025-08-25 10:25:00 +01:00
14b89fed24 fix to accept cumulative_seqlens from TransformersKwargs in FA (#40194)
* fix to the typings which are unmatched to FA function signature

cumulative_seqlens_q/k -> cu_seq_lens_q/k:
- in the FlashAttentionKwargs in modeling_flash_attention_utils
- in the TransformersKwargs in generic
- in the PagedAttentionArgs in continuous_batching

It is **BC**, because they are created in `ContinuousBatchProcessor.setup_static_tensors:L762`, used in `ContinuousBatchingManager._model_forward:L1233` and destroyed with `ContinuousBatchProcessor`

* format changes by ruff

* Update src/transformers/integrations/flash_paged.py

unused function arg in `PagedAttentionCache.update`

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* revert continuous_batching signature, which is more meaningful

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-08-25 11:00:13 +02:00
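A sketch of the aligned names, with fields abridged: the typed kwargs now use the same `cu_seq_lens_q`/`cu_seq_lens_k` names as the flash-attention function signature, so values passed through `TransformersKwargs` reach FA without renaming:

```python
from typing import Optional

import torch
from typing_extensions import TypedDict

class FlashAttentionKwargs(TypedDict, total=False):
    cu_seq_lens_q: Optional[torch.LongTensor]  # cumulative query lengths
    cu_seq_lens_k: Optional[torch.LongTensor]  # cumulative key lengths
    max_length_q: Optional[int]
    max_length_k: Optional[int]
```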
ba095d387d 🧹 🧹 🧹 Get set decoder cleanup (#39509)
* simplify common get/set

* remove some noise

* change some 5 years old modeling utils

* update examples

* fix copies

* revert some changes

* fixes, gah

* format

* move to Mixin

* remove smolvlm specific require grad

* skip

* force defaults

* remodularise some stuff

* remodularise more stuff

* add safety for audio models

* style

* have a correct fallback, you daft donkey

* remove this argh

* change heuristic for audio models

* fixup

* revert

* this works

* this should be explicit

* fix Nth ESM exception

* tryout decoder

* this as well

* revert again

* 🧠

* aaah ESM has two modelings aaah

* broom broom

* format

* wrong copies

* copies

* modular cleanups

* format

* modularities

* wrong merge fix

* seriously

* align with new model

* new model
2025-08-25 10:57:56 +02:00
2c55c7fc94 Reactivate a lot of tests skipped for no reason anymore (#40378)
* reactivate all the tests

* some tests still failing
2025-08-25 10:44:43 +02:00
4f9b4e62bc Run FA2 tests in CI (#40397)
up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-23 12:30:18 +02:00
28ca27cb2b HF papers in doc (#40381)
* HF papers

* clean

* Update src/transformers/models/gemma3n/configuration_gemma3n.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* style

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-22 15:07:08 -07:00
7d88f57fc6 Update README_zh-hans.md (#40380)
Fix a typo.
2025-08-22 18:22:26 +00:00
29ddcacea3 Rework the Cache documentation (#40373)
* start working the doc

* remove gemma2

* review
2025-08-22 17:06:28 +02:00
dab66f15a1 Chat Template Doc Fixes (#40173)
* draft commit

* draft commit

* Fixup chat_extras too

* Update conversations.md

* Update the toctree and titles

* Update the writing guide!

* Use @zucchini-nlp's suggestion

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-22 15:48:33 +01:00
0a21e870c7 Bug Fix: Dynamically set return_lse flag in FlexAttention (#40352)
* bug fix - return_lse dynamically set

* addressed compatibility with return type - flex_attention_forward

* rename variables

* revert changes to commits
2025-08-22 13:49:26 +00:00
894b2d84b6 Add GptOssForTokenClassification for GPT-OSS models (#40190)
* Add GptOssForTokenClassification for GPT-OSS models

* After run make fixup
2025-08-22 15:14:46 +02:00
56d68c6706 Addiing ByteDance Seed Seed-OSS (#40272)
add seed oss
2025-08-22 14:54:28 +02:00
8a6908c10d fix(example): align parameter names with the latest function definition for gdino (#40369) 2025-08-22 12:27:58 +00:00
7db228a92a [configuration] allow to overwrite kwargs from subconfigs (#40241)
allow to overwrite kwargs from subconfigs
2025-08-22 13:31:25 +02:00
19ffe0219d [processor] move commonalities to mixin (#40339)
* move commonalities to mixin

* revert - unrelated

* fix copies

* fix style

* comments
2025-08-22 13:04:43 +02:00
d8f6d3790a ⚠️⚠️ Use dtype instead of torch_dtype everywhere! (#39782)
* update everywhere

* style

* pipelines

* switch it everywhere in tests

* switch it everywhere in docs

* switch in converters everywhere

* update in examples

* update in model docstrings

* style

* warnings

* style

* Update configuration_utils.py

* fix

* Update configuration_utils.py

* fixes and add first test

* add pipeline tests

* Update test_pipelines_common.py

* add config test

* Update test_modeling_common.py

* add new ones

* post rebase

* add new

* post rebase adds
2025-08-22 12:34:16 +02:00
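Usage after this change; per the bullet list above, the old `torch_dtype` name keeps working for now with a deprecation warning:

```python
import torch
from transformers import AutoModelForCausalLM

# `dtype` replaces the old `torch_dtype` argument everywhere.
model = AutoModelForCausalLM.from_pretrained("gpt2", dtype=torch.bfloat16)
print(model.dtype)  # torch.bfloat16
```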
9c25820978 [pipelines] add support to skip_special_tokens in the main text generation pipelines (#40356)
* add support to skip_special_tokens in pipelines

* add test

* rm redundant
2025-08-22 10:12:46 +00:00
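A usage sketch, assuming `skip_special_tokens` is accepted directly at call time as this PR describes:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
# Special tokens (e.g. EOS) are stripped from the decoded output.
out = generator("Hello", max_new_tokens=10, skip_special_tokens=True)
print(out[0]["generated_text"])
```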
5c40e7a225 Change multimodal data links to HF hub (#40309)
change multimodal data links to HF hub
2025-08-22 11:50:04 +02:00
e018b77c89 wav2vec2 fixes (#40341)
* Changed datasets to avoid a datasets error

* Changed back split to test
2025-08-22 11:32:29 +02:00
d7fe3111ff Fix idefics3 vision embeddings indices dtype (#40360)
fix idefics3 vision embeddings

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-22 11:10:45 +02:00
cf487cdf1f HunYuan opensource (#39606)
* merge opensource_hunyuan

* add head_dim

* fix assertion error

* fix seen_tokens

* ready_for_upstream (merge request !17)

Squash merge branch 'ready_for_upstream' into 'main'

* fix configuration type&docstring
* fix style

* ready_for_upstream (merge request !18)

Squash merge branch 'ready_for_upstream' into 'main'
* add doc
* fix testcode
* fix configuration type&docstring

* rename base model

* remove assert

* update

* remove tiktoken

* update

* fix moe and code style (#3)

* update

* fix format

* update

* revert makefile

* fix moe config

* fix numel()

* remove prepare_inputs_for_generation

* fix kv_seq_len

* add docs/toctree

* remove unused parameter & add licence

* add licence

* remove unused parameter

* fix code

* dense modular

update import

fix

fix

use mistralmodel

fix qknorm

add sliding_window

make style

fix

dense done

hunyuan moe

fix import

fix modular

fixup

fixup

* update model path

* fix mlp_bias

* fix modular

* Fix modeling (#5)

* fix attention

* use llamamodel

* fix code

* Fix qk (#6)

* fix qk_norm

* fix

* fix modular

* Fix moe (#7)

* fix some moe code

* fix einsum

* try top1

* use top1

* Fix rotary (#8)

* fix rotary

* fix modeling

* fix modular

* fix testcode

* remove A13B unit test

* Fix moe v1 (#9)

fix moe & gate

* Fix gate norm (#10)

* add norm_topk_prob

* Fix testcase (#11)

* fix&skip test

* Fix testcase (#12)


* skip testcase

* Fix norm topk (#13)

* hardcode norm_topk_prob

* fix testcase

---------

Co-authored-by: pridejcyang <pridejcyang@tencent.com>
Co-authored-by: Mingji Han <mingjihan@tencent.com>
2025-08-22 07:59:58 +00:00
8365f70e92 DOCS: Clarification on the use of label_names as an argument to TrainingArguments (#40353)
* Update trainer.md

* Update trainer.md

Removed the detail about label_names argument usage from the tip/ warning section

* Update training_args.py

Added the label_names usage clarification in the docstring

* Update trainer.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-21 17:19:04 -07:00
7c1169e21f [4/N]more docs to device agnostic (#40355)
* more docs to device agnostic

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* more

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* 1

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* 2

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update vitpose.md

* Update camembert.md

* Update camembert.md

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-08-21 10:22:26 -07:00
9568b506ed [generate] handle support for cache classes when num enc layers != num dec layers (#40277)
* handle support for cache classes when num enc layers != num dec layers

* handle overwrites

* one more corner case

* Update src/transformers/generation/utils.py

* Update src/transformers/generation/utils.py

* Apply suggestions from code review

* handle corner case :o
2025-08-21 17:35:18 +01:00
7f38068ae0 Qwen2.5-VL test fixes for ROCm (#40308) 2025-08-21 18:13:07 +02:00
cb1df4d26a [FA] Fix some model tests (#40350)
* fix

* cleanup, revert aimv2 fa changes

* fix aria

* i searched a long time but the cross dependency is for the recent models so...

* this was something... evolla

* fix modernbert decoder + make fa test more robust

* nit
2025-08-21 18:08:21 +02:00
f46f29dd7c Remove more PyTorch 2.2 compatible code (#40337)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-21 15:19:53 +00:00
128f42d370 [detection] use consistent dtype for Conditional and DAB DETR positional embeddings (#40300)
fix: use consistent dtype for sine positional embeddings
2025-08-21 15:49:56 +01:00
2121d09239 [serve] add cors warnings (#40112)
* add cors warnings

* Update src/transformers/commands/serving.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/commands/serving.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* make fixup

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-08-21 14:32:36 +01:00
b40b834ab1 Clean up XCodec and other codecs (#40348)
* Clean up xcodec addition.

* Clean up config.

* Switch to fixtures test.

* Small stuff.

* Polish XCodec and standardize across codecs.

* Update src/transformers/models/xcodec/modeling_xcodec.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Format and fix test.

* Update tol.

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-08-21 15:32:00 +02:00
75aa7c7252 [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification (#35991)
* [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification

* fix the modular conversion
2025-08-21 15:16:03 +02:00
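An illustrative default (not the exact ModernBERT code): when no mask is supplied, attend over every position so downstream mask-dependent code never sees `None`:

```python
import torch

def ensure_attention_mask(input_ids, attention_mask=None):
    # Fall back to a full mask of ones: every token attends to every token.
    if attention_mask is None:
        attention_mask = torch.ones_like(input_ids)
    return attention_mask
```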
04b751f07d Fix attention vizualizer (#40285)
* make visualizer rely on create causal mask

* format

* fixup

* fixup

* read token

* read token, duh

* what is up with that token

* small tests?

* adjust

* try with flush

* normalize for ANSI

* buffer shenanigans
2025-08-21 13:13:35 +00:00
1e1db12304 (small) fix conditional for input_ids and input_embeds in marian (#40045)
* (small) fix conditional for input_ids and input_embeds in marian

* address comment
2025-08-21 15:13:14 +02:00
7f2f53424e Update test_spm_converter_bytefallback_warning (#40284)
fff

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-21 14:09:28 +02:00
11a49dd9e3 T5 test and target device fixes (#40313)
* Fix cache setup related issues

* Fix target-device-related issues

* Ruff

* Address review comments
2025-08-21 14:07:29 +02:00
c4513a9fe6 Fix links in Glm4vMoe configuration classes to point to the correct H… (#40310)
* Fix links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository

* run fixup to update links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository
2025-08-21 11:42:53 +00:00
c7e6f9a485 Fix an infinite loop bug in recursive search of relative imports (#40326)
Fix bug in recursive search of relative imports
2025-08-21 11:39:43 +00:00
e95441bdb5 add type hints (#40319)
* add basic type hints to import module

* run make fixup

* remove optional

* fixes

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-08-21 12:19:59 +01:00
5c88d8fbcc Fix: Only call Trainer.align_special_tokens if model has "config" attribute (#40322)
* Only call Trainer.align_special_tokens if model has "config" attribute

* Add efficient test for training a model without model.config

* Reformat
2025-08-21 12:06:42 +01:00
c031f6f994 [docs] remove TF references from /en/model_doc (#40344)
* models up to F

* models up to M

* all models
2025-08-21 11:53:21 +01:00
7b060e5eb7 Add missing arguments to class constructors (#40068)
* Add missing arguments

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typos

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-21 10:22:38 +00:00
6ad7f29461 Fix deprecation warning version (#40343)
fix
2025-08-21 12:18:23 +02:00
adf84aec21 Add DeepseekV3ForSequenceClassification for Deepseek V3 models (#40200)
* Add Sequence Classification Support for Deepseek v3 model DeepseekV3ForSequenceClassification

* After run make fixup
2025-08-21 12:01:33 +02:00
1e2e28f3c8 Change Qwen2RMSNorm to RMSNorm from PyTorch (#40066)
* Unify Qwen2RMSNorm definitions and use RMSNorm from PyTorch

Signed-off-by: cyy <cyyever@outlook.com>

* subclass RMSNorm

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-21 11:58:35 +02:00
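A sketch of the "subclass RMSNorm" approach from the commit body: keep the existing class name for backward compatibility but inherit PyTorch's fused `nn.RMSNorm` (available since torch 2.4). The `eps` default here is an assumption:

```python
import torch
from torch import nn

class Qwen2RMSNorm(nn.RMSNorm):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__(hidden_size, eps=eps)

norm = Qwen2RMSNorm(8)
print(norm(torch.randn(2, 4, 8)).shape)  # torch.Size([2, 4, 8])
```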
022af24fcc Fix qwen-omni processor text only mode (#40336)
* Fix qwen-omni processor text only mode

* remove try except

---------

Co-authored-by: yuekaiz <yuekaiz@mgmt1-login.cm.cluster>
2025-08-21 11:57:32 +02:00
c99ed492c7 [docs] remove flax references from /en/model_doc (#40311)
* 1st commit

* all models up to D

* all models up to G

* all models up to M

* all remaining models
2025-08-21 10:52:54 +01:00
c2e3cc24e0 Fix chunked attention mask with left-padding (#40324)
* add fix

* add test

* raise proper warning for older versions

* fix

* fix and add 2nd test

* fix for flex and torch 2.5
2025-08-21 10:52:49 +02:00
242bb2cafc One cache class to rule them all (#40276)
* remove all classes

* fix generate

* start replacing everywhere

* finish removing everywhere

* typo

* typo

* fix

* typo

* remove num_layers=1

* CI

* fix all docstrings

* review

* style
2025-08-20 19:36:11 +02:00
1054494dd6 Update notification service amd_daily_ci_workflows definition (#40314) 2025-08-20 17:49:46 +02:00
139cd91713 Fix: Apply get_placeholder_mask in Ovis2 (#40280)
* Refactor special image mask

* Refactor get_placeholder_mask method

* Revert "Refactor special image mask"

This reverts commit 9eb1828ae930329656d6f323a510c5e6033e1f85.

* Fix

* Revert "Refactor get_placeholder_mask method"

This reverts commit 07aad6484bb08d6351d5b605e9db574d28edcd15.
2025-08-20 17:12:10 +02:00
5d906740d2 Update CI with nightly torch workflow file (#40306)
* fix nightly ci

* Apply suggestions from code review

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-08-20 16:59:00 +02:00
4977ec2ae8 [GPT OSS] Refactor the tests as it was not properly checking the outputs (#40288)
* it was long due!

* use the official kernel

* more permissive

* update the kernel as well

* mmm should it be this?

* up pu

* fixup

* Update test_modeling_gpt_oss.py

* style

* start with 20b
2025-08-20 16:47:41 +02:00
3b7230124b No more natten (#40287)
get rid off natten

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-20 16:10:15 +02:00
2df0c323cb byebye torch 2.1 (#40317)
* Bump minimum torch version to 2.2

* Remove is_torch_greater_or_equal_than_2_2

* update versions table

* Deprecate is_torch_sdpa_available (except for backward compat), remove require_torch_sdpa
2025-08-20 15:03:46 +01:00
c50f140be2 Add back _tp_plan attribute (#39944)
* Update modeling_utils.py

* make sure we update with the module's plan

* use public api

* oups

* update

* fix failing test

* Update src/transformers/integrations/tensor_parallel.py

* Update src/transformers/integrations/tensor_parallel.py

* fix

* make the API more friendly!

* fix tests

* fix styling

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-08-20 15:29:55 +02:00
a97213d131 Qwen2.5-Omni test fixes (#40307)
Updated expectations, and mp tests
2025-08-20 14:48:30 +02:00
ca543f822f Add support for Florence-2 (#38188)
* init

* add modular

* fixup

* update configuration

* add processing file

* update auto files

* update

* update modular

* green setup_and_quality ci

* it works

* fix some tests

* commit florence2

* update test

* make test cases done - 16 left

* style

* fix few test cases

* fix some tests

* fix init test

* update florence2 vision style

* hope is green

* fix init test

* fix init

* update modular

* refactor vision module

* fix: channel attention use dynamic scale

* update modular

* update

* update attention mask

* update

* fix naming

* Update src/transformers/models/florence2/processing_florence2.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* spatial block works

* more beautiful

* more more beautiful

* merge main

* merge main and fixup

* fix typing hint

* update modeling

* fix eager matches sdpa

* fix style

* fix compile test - all green

* remove florence2 language

* remove Florence2LanguageModel things

* fix style

* update florence2 model

* override prepare encoder_decoder for generation

* add weight conversion script

* rewrite channel attention to use sdpa

* eliminate 1 transpose op

* support fa2

* fix quality check

* chore: reformat `test_modeling_florence2.py`

* some refactor for processor

* some refactor for processor

* update naming convention and remove BC

* make it pass the test

* fix: correct Embedding Cosine

* update comments and docstring

* support input_embeds

* support input embeds ideally

* fix style

* fix style

* fix style again :D

* add test processor

* refactor processor and add test for processor

* reformat test processor

* make fixup

* fix schema check

* remove image_token

* ensure image token in tokenizer and fix integration tests

* fix processor test

* add more integration tests for large model and rename test_processor to test_processing

* test_assisted_decoding_sample should pass

* update doc and make model work with image text to text pipeline

* docs: add sdpa bagde

* resolve cyril's comments

* fix import torch error

* add helper get_placeholder_mask

* inherit from llava

* florence2 may not _supports_attention_backend because of bart ...

* move florence2 model card to multimodal

* let base model always return_dict

* fix style

* tiny update doc

* set _checkpoint_conversion_mapping = {}

* fix code quality

* support flex and compile graph and move external func to internal func

* remove condition because it's always true

* remove window funcs

* move post processor config out

* fix ci

* new intro to trigger test

* remove `kernel_size` argument

---------

Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-08-20 14:28:06 +02:00
959239debc Remove unnecessary contiguous calls for modern torch (#40315) 2025-08-20 12:24:14 +00:00
7d2aa5d6e6 🚨 [Flash Attention] Fix sliding window size (#40163)
* swa fix

* add comment, make fix symmetrical

* modify fa inference test to force swa correctness check

* fixup comment
2025-08-20 14:23:14 +02:00
3128db6927 chore: fix typo in find_executable_batch_size to match new 0.9 ratio (#40206) 2025-08-20 12:18:06 +00:00
ca0aaa8c74 [fix] Pass adamw optimizer parameters to StableAdamW (#40184)
* fix: pass adamw optimizer parameters to StableAdamW

* add test for stable_adamw initialization with trainer arguments

* address copilot suggestion

* fix: update weight_decay handling in stable_adamw kwargs

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-20 11:52:23 +00:00
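A hedged sketch of the fix: forward the trainer's AdamW hyperparameters instead of relying on the optimizer's defaults. The kwargs below all exist on `TrainingArguments`; how they are passed into `StableAdamW` (from the `torch-optimi` package) is an assumption:

```python
def stable_adamw_kwargs(args):
    # `args` is a transformers.TrainingArguments instance.
    return {
        "lr": args.learning_rate,
        "betas": (args.adam_beta1, args.adam_beta2),
        "eps": args.adam_epsilon,
        "weight_decay": args.weight_decay,
    }
```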
a01f38b364 Fix GOT-OCR2 and Cohere2Vision image processor patches calculation (#40312)
fix got-ocr patches calculation

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-20 13:13:58 +02:00
a5f0b505a0 Remove OTel SDK dependencies (#40305) 2025-08-20 12:31:44 +02:00
d0f1a6ec36 Clean up X-Codec. (#40271)
* Clean up xcodec addition.

* Clean up config.

* Switch to fixtures test.

* Small stuff.
2025-08-20 12:16:28 +02:00
da9452a592 [docs] delete more TF/Flax docs (#40289)
* delete some TF docs

* update documentation checks to ignore tf/flax

* a few more removals

* nit

* Update utils/check_repo.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-08-20 10:44:14 +01:00
a4e1fee44d [FA] Fix dtype in varlen with position ids (#40295)
fix
2025-08-20 11:15:55 +02:00
126bc03b4e Allow to be able to run torch.compile tests with fullgraph=True (#40164)
* fix

* address comment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-20 10:42:33 +02:00
1d46091737 Add MetaCLIP 2 (#39826)
* First draft

* Make fixup

* Use eos_token_id

* Improve tests

* Update clip

* Make fixup

* Fix processor tests

* Add conversion script

* Update docs

* Update tokenization_auto

* Make fixup

* Use check_model_inputs

* Rename to lowercase

* Undo CLIP changes

* Address comment

* Convert all checkpoints

* Update auto files

* Rename checkpoints
2025-08-20 09:25:43 +02:00
0f9c9088d0 [3/3] make docs device agnostic, all en docs for existing models done (#40298)
docs to device agnostic cont.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-19 21:01:27 -07:00
eaa48c81e9 make model docs device agnostic (2) (#40256)
* doc cont.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* more models

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mixtral.md

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-19 13:10:03 -07:00
42fe769928 SmolVLM test fixes (#40275)
* Fix SmolVLM tests

* Add the proper CUDA expectations as well

* Split 'A10 and A100 expectations

* Ruff

---------

Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com>
2025-08-19 21:22:06 +02:00
4c017465bd Adjust ROCm test output expectations (#40279)
Adjust ROCm output expectations
2025-08-19 21:21:45 +02:00
0f9ce43687 Standardize BertGeneration model card (#40250)
* Standardize BertGeneration model card: new format, usage examples, quantization

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply reviewer feedback: update code examples

* Add missing code example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-19 11:22:13 -07:00
6ceb13fb22 SmolVLM and InternVL: Ensure pixel values are converted to the correct dtype for fp16/bf16 (#40121)
* Ensure pixel values are converted to the correct dtype for fp16/bf16

* add to modular
2025-08-19 10:39:08 -07:00
92f40da608 Update model card for gpt neox japanese (#39862)
* Update GPT-NeoX-Japanese model card

* Apply suggestions from code review

* Update gpt_neox_japanese.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-19 09:18:46 -07:00
3a4b2756cf docs: Update TrOCR model card to new format (#40240)
* docs: Update TrOCR model card to new format

* Updated Sugegestions
2025-08-19 09:17:45 -07:00
46d38546f3 Standardize RAG model card (#40222)
* Standardize RAG model card

Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added pipeline and AutoModel usage examples
- Included quantization example with BitsAndBytesConfig
- Added notes and resources sections
- Removed abstract and FlashAttention badge

* Standardize RAG model card

Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added AutoModel usage example
- Included quantization example with BitsAndBytesConfig
2025-08-19 09:16:10 -07:00
bd96e1e1cc docs(layoutlm): add missing id=usage to <hfoptions> tag in LayoutLM model card (#40273)
docs(layoutlm): add missing 'id=usage' to <hfoptions> tag in LayoutLM model card
2025-08-19 09:14:43 -07:00
8636b309e6 Fix chat CLI GPU loading and request_id validation issues (#40230) (#40232)
* Fix chat CLI GPU loading and request_id validation issues (#40230)

This commit addresses two critical bugs in the transformers chat CLI:

1. **GPU Loading Issue**: Changed default device from "cpu" to "auto" in ChatArguments
   - Chat CLI now automatically uses GPU when available instead of defaulting to CPU
   - Matches the behavior of the underlying serving infrastructure

2. **Request ID Validation Error**: Added request_id field to TransformersCompletionCreateParamsStreaming schema
   - Fixes "Unexpected keys in the request: {'request_id'}" error on second message
   - Allows request_id to be properly sent and validated by the server

Both fixes target the exact root causes identified in issue #40230:
- Users will now get GPU acceleration by default when available
- Chat sessions will no longer break after the second message

* Remove unrelated request_id field from TransformersCompletionCreateParamsStreaming
2025-08-19 15:33:44 +00:00
bebeccb06a fix which routing method (#40283) 2025-08-19 16:35:13 +02:00
249d7c6929 Update image_processing_perception_lm_fast.py to allow for proper override of vision_input_type (#40252)
* Update image_processing_perception_lm_fast.py

Allow for a proper override of vision_input_type in hf fast image processor, otherwise we need to resort to manually setting the attribute.

* Update processing_perception_lm.py to match kwargs vision input type

* Update image_processing_perception_lm_fast.py kwargs to signature args
2025-08-19 11:41:27 +00:00
57bb6db6ee Skipping pytree registration in case fsdp is enabled (#40075)
* Skipping pytree registration in case fsdp is enabled

* Beauty changes

* Beauty changes

* Moved the is_fsdp_available function to import utils

* Moved is_fsdp_available to integrations.fsdp

* Skipping pytree registration in case fsdp is enabled

* Beauty changes

* Beauty changes

* Moved the is_fsdp_available function to import utils

* Moved is_fsdp_available to integrations.fsdp

* Added pytree registration inside dynamic cache class

* Making ci/cd lords happy

* Adding a check if DynamicCache is already a leaf

* Adding try/catch for multiple initializations of DynamicCache in test suites

* Moving dynamic cache pytree registration to executorch

* Adding try catch back
2025-08-19 11:58:05 +02:00
5b3b7ea472 Add Kosmos-2.5 (#31711)
Add Microsoft Kosmos-2.5

---------

Co-authored-by: kirp@umich.edu <tic-top>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-19 11:56:03 +02:00
c93594e286 [detection] fix correct k_proj weight and bias slicing in D-FINE (#40257)
Fix: correct k_proj weight and bias conversion in D-FINE
2025-08-19 09:44:37 +00:00
2f1a8ad4ba Fix setting attention for multimodal models (#39984)
* fix

* use non-explicit `None`

* keep previously set attn if exists
2025-08-19 11:35:11 +02:00
a2e76b908b 🚨🚨 Switch default compilation to fullgraph=False (#40137)
* switch default

* docstring

* docstring

* rework tests and remove outdated restrictions

* simplify

* we need a check for static cache

* fix

* rename var

* fix

* revert

* style

* rename test
2025-08-19 11:26:22 +02:00
2b59207a72 Fix slow static cache export tests (#40261) 2025-08-19 11:24:07 +02:00
56c44213b3 [detection] fix attention mask for RT-DETR-based models (#40269)
* Fix get_contrastive_denoising_training_group attention

* Add bool attention_mask conversion
2025-08-19 09:15:56 +00:00
5d9a715e30 set inputs_embeds to None while generating to avoid audio encoder forward in generation process (#40248)
* set inputs_embeds to None while generating to avoid audio encoder forward in generation process

* set input_features to none instead

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-08-19 08:45:57 +00:00
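An illustrative pattern for the behavior described (names assumed, not the actual Qwen-Omni code): once the prefill step has encoded and merged the audio, the raw features are dropped so later decode steps skip the audio encoder:

```python
def prepare_inputs_for_generation(input_ids, input_features=None, past_key_values=None):
    if past_key_values is not None:
        # Audio was already encoded and merged during prefill; don't
        # re-run the audio encoder on every decoding step.
        input_features = None
    return {
        "input_ids": input_ids,
        "input_features": input_features,
        "past_key_values": past_key_values,
    }
```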
28746cdc7b Remove MI300 CI (#40270)
Remove MI300 CI (in history if we need it back)
2025-08-19 08:23:39 +00:00
debc92e60a Skip broken tests (#40157)
skip these tests
2025-08-19 10:04:08 +02:00
6b5bd11723 docs: Update OLMo model card (#40233)
* Updated OLMo model card

* Update OLMo description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix typo

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix cli typo

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix cli example

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Add bitsandbytes info

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-18 13:35:39 -07:00
e472efb9ac Fix benchmark workflow (#40254)
Correct init_db.sql path

Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com>
2025-08-18 18:14:16 +00:00
59862209ca Correct typo and update notes in docs Readme (#40234)
* Correct typo and update notes in docs readme

* Update docs/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-18 10:31:12 -07:00
a7eabf1dde Model card for NLLB (#40074)
* initializing branch and draft PR

* updated model card .md file

* minor

* minor

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* resolving comments + adding visuals

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* NllbTokenizerFast and NllbTokenizer added

* endline

* minor

* Update nllb.md

---------

Co-authored-by: Sahil Kabir <sahilkabir@Sahils-MacBook-Pro.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-18 10:05:59 -07:00
01c03bf4ee fix: Catch correct ConnectionError for additional_chat_templates (#39874)
* fix: Catch correct ConnectionError for additional_chat_templates

* fix: don't catch timeout

* fix: formatting
2025-08-18 17:25:47 +01:00
2bcf9f6c7e Fixes for EncoderDecoderCache (#40008)
* Add expectation to t5 for rocm 9.4

* Made EncoderDecoderCache compatible with nn.DataParallel

* Fixed t5gemma EncoderDecoderCache

* Added todos in autoformer

* Ruff

* Init is self-contained

* Review compliance

* Fixed kwargs init of EncoderDecoderCache
2025-08-18 17:51:05 +02:00
aa45824919 [CI] Fix repo consistency (#40249)
* fix

* doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-08-18 17:32:17 +02:00
d6fad86d23 [serve] guard imports (#39825)
guard imports
2025-08-18 16:28:10 +01:00
7a0ba0d7d8 [typing] fix type annotation error in DepthPro model image processor (#40238)
* fix type annotation error in DepthPro model image processor

* fix

* run make fix-copies
2025-08-18 15:42:13 +01:00
00b4dfb786 Add chat_template (jinja2) as an extra dependency (#40128)
* add jinja2 as a dependency

* Make jinja2 a core dependency in install_requires

- Add jinja2 to install_requires list in setup.py for automatic installation
- Add jinja2 to runtime version checks in dependency_versions_check.py
- Resolves issue where pip install transformers doesn't install jinja2

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Make jinja2 a core dependency in install_requires

* Make jinja2 an extra dependency instead of adding a core dep

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-18 14:31:40 +00:00
f417a1aad4 remove transpose_for_scores call in ESM-2 (#40210)
* remove transpose_for_scores call

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

* fix copied evolla code

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

---------

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
2025-08-18 14:28:59 +00:00
a36d51e801 🚨 Always return Cache objects in modelings (to align with generate) (#39765)
* watch the world burn

* fix models, pipelines

* make the error a warning

* remove kwargs and return_legacy_cache

* fix reformer
2025-08-18 16:26:35 +02:00
57e230cdb2 Fix more pylint warnings (#40204)
Fix pylint warnings

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-18 14:17:16 +00:00
47938f8f8d Add Ovis2 model and processor implementation (#37088)
* Add Ovis2 model and processor implementation

* Apply style fixes

* Add unit tests for Ovis2 image processing and processor

* Refactor image processing functions for clarity and efficiency

* Add Ovis2 ImageProcessorFast

* Refactor Ovis2 code

* Refactor Ovis2 model components and update processor functionality

* Fix repo consistency issues for Ovis2: docstring, config cleanup

* Update Ovis2 model integration tests

* Update Ovis2 configuration and processing classes for improved documentation

* Remove duplicate entry for 'ovis2' in VLM_CLASS_NAMES

* Fix conflict

* Fix import order

* Update image processor class names

* Update Ovis2 model structure

* Refactor Ovis2 configuration

* Fix typos

* Refactor Ovis2 model classes and remove unused code

* Fix typos

* Refactor Ovis2 model initialization

* Fix typos

* Remove Ovis2 model mapping from MODEL_MAPPING_NAMES in modeling_auto.py

* Add license and update type hints

* Refactor token function and update docstring handling

* Add license

* Add Ovis2 model support and update documentation

* Refactor Ovis2 model structure and enhance multimodal capabilities

* Update Ovis2 weight mapping for consistency and clarity in key patterns

* Remove unused 'grids' parameter from Ovis2 model and Update processing logic to handle image grids more efficiently.

* Refactor Ovis2 model test structure to include Ovis2Model

* Add optional disable_grouping param to Ovis2ImageProcessorFast

* Refactor type hints in Ovis2 modules

* Add licensing information in Ovis2 modules and tests

* Refactor Ovis2 model by removing unused methods

* Refactor Ovis2 model tests by renaming test classes and removing skipped tests

* Refactor Ovis2 model output classes

* Refactor Ovis2 weight conversion and Update model embedding classes

* Refactor Ovis2 model imports and remove unused functions

* Enhance vision configuration extraction in Ovis2 weight conversion

* Refactor Ovis2 model's forward method to remove interpolation option

* Update Ovis2 model documentation

* Refactor Ovis2 model input handling and tokenizer configuration

* Update return type hints in Ovis2 model

* Remove commented-out code

* fix config for tests and remove key mappings

* Update tokenizer configuration to use add_special_tokens method

* skip torchscript

* Fix image placeholder generation in Ovis2Processor

* Refactor Ovis2 model to rename visual_table to visual_embeddings_table

* Enhance Ovis2 model by adding vision_feature_select_strategy parameter

* Refactor Ovis2 model weights conversion and architecture

* Refactor Ovis2 model by removing vision_feature_select_strategy parameter

* Update Ovis2 model examples

* Refactor Ovis2 model

* Update Ovis2 model

* Update Ovis2 model configuration

* Refactor Ovis2 model test setup

* Refactor flash attention support

* Refactor

* Fix typo

* Refactor

* Refactor model classes

* Update expected output in Ovis2

* Refactor docstrings

* Fix

* Fix

* Fix

* Update input in tests

* Fix

* Fix get_decoder method

* Refactor

* Refactor Ovis2

* Fix

* Fix

* Fix test

* Add get_placeholder_mask

* Refactor Ovis2 model tests

* Fix

* Refactor

* Fix

* Fix

* Fix Ovis2 test

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-08-18 16:05:49 +02:00
2fe43376cd AMD scheduled CI ref env file (#40243)
* Reference env-file to be used in docker running the CI

* Disable MI300 CI for now
2025-08-18 15:23:27 +02:00
e4bd2c858d Fix ESM token_dropout crash when using inputs_embeds instead of input_ids (#40181)
* fix: Error after calling ESM model with input embeddings not input ids

* propagate changes to other models
2025-08-18 13:22:10 +00:00
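An illustrative guard (not the exact ESM code): token dropout rescales embeddings based on the `<mask>` token positions, which are only known from `input_ids`, so it is skipped when the caller passes embeddings directly:

```python
import torch
from torch import nn

def embed(word_embeddings: nn.Embedding, mask_token_id: int,
          input_ids=None, inputs_embeds=None, token_dropout=True):
    if inputs_embeds is None:
        inputs_embeds = word_embeddings(input_ids)
    if token_dropout and input_ids is not None:  # guard added by the fix
        mask = (input_ids == mask_token_id).unsqueeze(-1)
        inputs_embeds = inputs_embeds.masked_fill(mask, 0.0)
    return inputs_embeds
```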
6333eb986a Fix more typos (#40212)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-18 12:52:12 +00:00
e5886f9194 [SAM 2] Change checkpoints in docs and tests (#40213)
* change checkpoints in docs and tests

* add notebook
2025-08-18 11:21:34 +02:00
eb2f9da096 fix vocab_size error in Qwen2_5_VLForConditionalGeneration loss_function (#40130)
* fix vocab_size error in Qwen2_5_VLForConditionalGeneration loss_function

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* fix similar error at qwen2_vl and do make fix-copies

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* pass in kwargs for loss_func at qwen2_vl and qwen2_5_vl

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* Apply style fixes

---------

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-18 08:59:25 +00:00
6ce8f05375 Use correct model_input_names for PixtralImageProcessor (#40226)
add image_sizes to model_input_names
2025-08-18 08:06:52 +00:00
2914ceca20 Revert "Pin torch to 2.7.1 on CircleCI for now" + Final fix for too long with no output (#40201)
* Revert "Pin torch to 2.7.1 on CircleCI for now (#40174)"

This reverts commit 31b6e6e1dac0d32f74ec5cd6b3c1868534ccd7b5.

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-18 08:40:53 +02:00
cd22550692 docs: Update LayoutLM model card according to new standardized format (#40129)
* docs: Update LayoutLM model card with standardized format

* Apply suggestions from code review

This commit incorporates all suggestions provided in the recent review. Further changes will be committed separately to address remaining comments.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Address remaining review comments

* Address few more review comments:
1. remove transformer-cli section
2. put resources after notes
3. change API refs to 2nd level header

* Update layoutlm.md

* Update layoutlm.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-15 09:33:47 -07:00
05000aefe1 Fix GPT-OSS swiglu_limit not passed in for MXFP4 (#40197)
Add swiglu_limit = 7.0
2025-08-15 17:04:25 +02:00
3f4c85fef0 Add X-Codec model (#38248)
* add working x-codec

* nit

* fix styling + copies

* fix docstring

* fix docstring and config attribute

* Update args + config

* update convertion script

* update docs + cleanup

* Ruff fix

* fix doctrings
2025-08-15 16:24:12 +02:00
29e4e35927 Benchmarking improvements (#39768)
* Start revamping benchmarking

* Start refactoring benchmarking

* Use Pandas for CSV

* import fix

* Remove benchmark files

* Remove sample data

* Address review comments
2025-08-15 15:59:11 +02:00
de437d0d7a Update: add type hints to check_tokenizers.py (#40094)
* Update check_tokenizers.py

chore(typing): add type hints to check_tokenizers script

- Annotate params/returns for helper functions
- Keep tokenizer instances as `Any` to avoid runtime coupling
- Make `check_LTR_mark` return `bool` explicitly (no behavior change)

* Update check_tokenizers.py

chore(typing): replace Any with PreTrainedTokenizerBase in check_tokenizers.py

- Use transformers.tokenization_utils_base.PreTrainedTokenizerBase for `slow` and `fast` params
- Covers both PreTrainedTokenizer and PreTrainedTokenizerFast
- Exposes required methods (encode, decode, encode_plus, tokenize)
- Removes generic Any typing while staying implementation-agnostic
2025-08-15 12:41:28 +00:00
28a03fb78a Fix various Pylint warnings (#40107)
Tidy code

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-15 12:40:12 +00:00
ec85d2c44f Avoid CUDA stream sync (#40060)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-15 12:37:15 +00:00
c7afaa5b44 Remove _prepare_flash_attention_from_position_ids (#40069)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-15 12:35:03 +00:00
c167faa081 Fix typos (#40175)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-15 12:10:26 +00:00
5068fcd9a8 Add repr to EncoderDecoderCache (#40195)
* add repr

* oups
2025-08-15 12:57:49 +02:00
421175685d Fix fsdp for generic-task models (#40191)
* remove abc inheritance

* add fast test
2025-08-15 12:28:16 +02:00
4912d5b490 fix to avoid modifying a view in place (#40162)
* fix to avoid modifying a view in place

* add backward test in tensor parallel

* add test to test_modeling_gpt_oss.py

* linting
2025-08-15 10:30:49 +02:00
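A minimal reproduction of the failure mode this PR guards against, plus the usual fix of operating on a copy:

```python
import torch

base = torch.arange(6.0, requires_grad=True)
view = base[:3]  # shares storage with `base`

# view += 1.0  # RuntimeError: a view of a leaf Variable that requires
#              # grad is being used in an in-place operation.

out = view.clone()  # new storage, still tracked by autograd
out += 1.0          # fine: mutates the copy, not the shared storage
print(out)  # tensor([1., 2., 3.], grad_fn=<AddBackward0>)
```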
cc9997878a make model doc device agnostic (#40143)
* make model doc device agnostic

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update align.md

* Update aya_vision.md

* Update byt5.md

* refine

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update granitevision.md

* Update src/transformers/pytorch_utils.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add doc

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* 3 more

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-14 23:31:31 -07:00
85fce2e54c [MINOR:TYPO] Update base.py (#40169)
* [MINOR:TYPO] Update base.py

All other occurrences in the docs use lowercase. (https://github.com/search?q=repo%3Ahuggingface%2Ftransformers%20translation_XX_to_YY&type=code)

Also, using uppercase doesn't work: tested with "translation_EN_to_FR", which instead returns: `ValueError: The task does not provide any default models for options ('EN', 'FR')`

It might be a good idea to allow for uppercase, but that's for another issue.

* [MINOR:TYPO] Update __init__.py
2025-08-14 22:53:57 -07:00
52c6c1bb6e Update dynamic attnt setter for multimodals (#39908)
* update

* fix the test for DepthPro

* PR comments

* wait, I didn't delete this in prev commit?

* fix

* better way

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-08-14 21:46:13 +02:00
31b6e6e1da Pin torch to 2.7.1 on CircleCI for now (#40174)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-14 20:19:35 +02:00
b02f2d8b6a Add dates to the model docs (#39320)
* added dates to the models with a single hf papers link

* added the dates for models with multiple papers

* half of no_papers models done

* rest of no_papers models also done, only the exceptions left

* added copyright disclaimer to sam_hw, cohere, cohere2 + dates

* some more fixes, hf links + typo

* some new models + a rough script

* the script looks robust, changed all paper links to hf

* minor change to handle technical reports along with blogs

* ran make fixup to remove the white space

* refactor
2025-08-14 10:08:46 -07:00
8a658ac119 Standardize BARTpho model card: badges, new examples, fixed broken im… (#40051)
* Standardize BARTpho model card: badges, new examples, fixed broken image section, and links (#36979)

Update bartpho.md

* Update bartpho.md

Removed non-required/unsupported sections: Quantization, Attention visualizer, and Resources (plus stray tokenizer header).

Added code snippets which were suggested

* Update bartpho.md

Updated with necessary tags

* Update bartpho.md

* Update bartpho.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-14 09:55:27 -07:00
2b6cbedeb2 Add GptOssForSequenceClassification for GPT-OSS models (#40043)
* Add GptOssForSequenceClassification

* Tiny fix

* make fixup

* trigger CI rerun

* Check config type instead

---------

Co-authored-by: Yuefeng Zhan <yuefzh@microsoft.com>
2025-08-14 18:32:14 +02:00
b834cb8138 build: Add fast image processor tvp (#39529)
* build: add TvpImageProcessorFast

- Introduced TvpImageProcessorFast to enhance image processing capabilities.
- Updated image processing auto registration to include the new fast processor.
- Modified tests to accommodate both TvpImageProcessor and TvpImageProcessorFast, ensuring comprehensive coverage for both classes.

* fix: TvpImageProcessorFast with new resize method and update processing logic

* build: add TvpImageProcessorFast

* refactor: clean up whitespace and formatting in TvpImageProcessorFast and related tests

- Removed unnecessary whitespace and ensured consistent formatting in image_processing_tvp_fast.py.
- Updated import order in test_image_processing_tvp.py for clarity.
- Minor adjustments to maintain code readability and consistency.

* fix: Enhance TvpFastImageProcessorKwargs and update documentation

- Added TvpFastImageProcessorKwargs class to define valid kwargs for TvpImageProcessorFast.
- Updated the documentation in tvp.md to include the new class and its parameters.
- Refined the image processing logic in image_processing_tvp_fast.py for better handling of padding and resizing.
- Improved test cases in test_image_processing_tvp.py to ensure compatibility with the new processing logic and tensor inputs.

* fix: tested now with python 3.9

* fix: remove tvp kwargs from docs

* simplify processing

* remove import and fix tests

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-08-14 15:48:18 +00:00
6f259bc83e Fix docs typo (#40167)
* DINOv3 model

* working version

* linter revert

* linter revert

* linter revert

* fix init

* remove flex and add convert to hf script

* DINOv3 convnext

* working version of convnext

* adding to auto

* Dinov3 -> DINOv3

* PR feedback

* complete convert checkpoint

* fix assertion

* bf16 -> fp32

* add fast image processor

* fixup

* change conversion script

* Use Pixtral attention

* minor renaming

* simplify intermediates capturing

* refactor DINOv3ViTPatchEmbeddings

* Refactor DINOv3ViTEmbeddings

* [WIP] rope: remove unused params

* [WIP] rope: rename period -> inv_freq for consistency

* [WIP] rope: move augs

* change inv_freq init (not persistent anymore)

* [WIP] rope: move coords to init

* rope - done!

* use default LayerScale

* conversion: truncate expected outputs

* remove commented code

* Refactor MLP layers

* nit

* clean up config params

* nit docs

* simplify embeddings

* simplify compile compat lru_cache

* fixup

* dynamic patch coords

* move augmentation

* Fix docs

* fixup and type hints

* fix output capturing

* fix tests

* fixup

* fix auto mappings

* Add draft docs

* fix dtype cast issue

* add push to hub

* add image processor tests

* fixup

* add modular

* update modular

* convert and test convnext

* update conversion script

* update prefix

* Update LayerNorm

* refactor DINOv3ConvNextLayer

* rename

* refactor convnext model

* fix doc check

* fix docs

* fix convnext config

* tmp fix for check docstring

* remove unused arg

* fix tests

* (nit) change init

* standardize gated MLP

* clear namings and sat493m

* fix tensors on different devices

* revert linter

* pr

* pr feedback ruff format

* missing headers

* fix code snippet and collection link in docs

* DINOv3 description

* fix checkpoints in tests

* nit doc fixes in configs

* output_hidden_states

* x -> features

* remove sequential

---------

Co-authored-by: Cijo Jose <cijose@meta.com>
2025-08-14 17:29:53 +02:00
41980ce93e [bugfix] fix flash-attention2 unavailable error for Ascend NPU (#40151)
* [bugfix] fix flash-attention2 unavailable error for Ascend NPU

* remove redundant apply_rotary_emb usage

* fix ruff check error

* pad_input and unpad_input use the same implementation as fa2

* rollback redundant codes

* fix ruff check error

* optimize fa2 judgement logic
2025-08-14 14:21:39 +02:00
eba1d62091 [FA2] Fix it finally - revert fa kwargs preparation (#40161)
revert
2025-08-14 13:39:11 +02:00
1c5d2f7fb6 Replace self.tokenizer by self.processing_class (#40119) 2025-08-14 13:24:55 +02:00
cfe52ff4db [Continous Batching] set head_dim when config.head_dim is None (#40159)
* set head_dim when config.head_dim is None

* use model's actual TP setting
2025-08-14 13:23:27 +02:00
c47544b16f Fix CI: Use correct import in SAM for torchvision InterpolationMode (#40160)
fix ci
2025-08-14 10:53:23 +00:00
22e89e5385 [efficientloftr] fix bugs and follow original cross attn implementation strictly (#40141)
* fix: changed is_causal to be False

* fix: Added original cross attention bug

* fix: fixed the way border removal is computed

* fix: added missing normalization on coarse features

* test: fixed integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-14 10:42:59 +01:00
252364fd8e [Cohere2Vision] remove unused arg (#40103)
* remove unused arg

* remove the arg from test as well
2025-08-14 09:10:25 +00:00
e446372f76 Create self-scheduled-amd-mi355-caller.yml (#40134) 2025-08-14 01:33:45 +02:00
be1ab5103f Update Dockerfiles to install packages inside a virtual environment (#39098)
* Removed unnecessary virtual environment creation in Dockerfiles.

* Updated Dockerfiles to install packages in a virtual environment.

* use venv's python

* update

* build and trigger

* trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-13 23:51:52 +02:00
591708d9ce Add pytest marker: torch_compile_test and torch_export_test (#39950)
* new marker

* trigger CI

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-13 23:47:15 +02:00
12e49cda32 Fix quantized cache with only cache_implementation in generate (#40144)
* fix args

* comment
2025-08-13 23:21:41 +02:00
e651ae0a32 🌐 [i18n-KO] Translated gemma3.md to Korean (#39865)
* docs: ko: gemma3.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>

* fix: resolve suggestions

---------

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
2025-08-13 13:25:20 -07:00
0f9c2595cd updated visualBERT modelcard (#40057)
* updated visualBERT modelcard

* fix: Review for VisualBERT card
2025-08-13 12:47:32 -07:00
412c9c3030 Remove an old badly designed test (#40142)
remove it
2025-08-13 20:47:00 +02:00
eb5768a86e [docs] Fix ko toctree (#40138)
Update _toctree.yml
2025-08-13 11:24:58 -07:00
68a13cd4a6 Add Segment Anything 2 (SAM2) (#32317)
* initial comment

* test

* initial conversion for outline

* intermediate commit for configuration

* chore: init files for sam2

* adding arbitary undefined config

* check

* add vision

* make style

* init sam2 base model

* Fix imports

* Linting

* chore: sam to sam2 classes

* Linting

* Add sam2 to models.__init__

* chore: match prompt encoder with sam2 code

* chore: prepare kwargs for mask decoder

* Add image/video predictors

* Add CUDA kernel

* Add output classes

* linting

* Add logging info

* tmp commit

* docs for sam2

* enable image processing

* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize

* enable promptencoder of sam2

* fix promptencoder

* Confirmed that PromptEncoder is exactly the same (be aware of the bfloat16 vs float32 difference)

* Confirmed that ImageEncoder is exactly the same (be aware of the linting of init)

* Confirmed that MaskDecoder is exactly the same (TO DO: lint variable name)

* SamModel is now available (Need more chore for name)

* make fix-copies

* make style

* make CI happy

* Refactor VisionEncoder and PositionEmbedding

* TO DO : fix the image_embeddings and sparse_embeddings part

* pure image inference done

* reusable features fix and make style

* styling

* refactor memoryattention

* tmp

* tmp

* refactor memoryencoder
TO DO: convert and run inference on the video pipeline

* TO DO : fix the image_encoder shape

* conversion finish
TO DO: need to check video inference

* make style

* remove video model

* lint

* change

* python utils/check_docstrings.py --check_all

* python utils/check_config_attributes.py

* remove copies for sam2promptencoder due to configuration

* change __init__.py

* remove tensorflow version

* fix that to not use direct comparison

* make style

* add missing import

* fix image_embedding_size

* refactor Sam2 Attention

* add fully working video inference (refactoring todo)

* clarify _prepare_memory_conditioned_features

* simplify modeling code, remove unused paths

* use one model

* use auto_docstring

* refactor rope embeddings

* nit

* not using multimask when several points given

* add all sam2.1

* add video tmp

* add Sam2VideoSessionState + fast image proc + video proc

* remove init_states from model

* fix batch inference

* add image integration tests

* uniformize modeling code with other sam models and use modular

* pass vision tests and most model tests

* All tests passing

* add offloading inference state and video to cpu

* fix inference from image embedding and existing mask

* fix multi_boxes mask inference

* Fix batch images + batch boxes inference

* improve processing for image inference

* add support for mask generation pipeline

* add support for get_connected_components post processing in mask generation

* add fast image processor sam, image processor tests and use modular for sam2 image processor

* fix mistake in sam after #39120

* fix init weights

* refactor convert

* add integration tests for video + other improvements

* add needed missing docstrings

* Improve docstrings and

* improve inference speed by avoiding cuda sync

* add test

* skip test for vision_model

* minor fix for vision_model

* fix vision_model by adding sam2model and change the torch dependencies

* remove patch_size

* remove image_embedding_size

* fix patch_size

* fix test

* make style

* Separate hieradet and vision encoder in sam2

* fixup

* review changes part 1

* remove MemoryEncoderConfig and MemoryAttentionConfig

* pass q_stride instead of q_pool module

* add inference on streamed videos

* explicitely process streamed frames

* nit

* Improve docstrings in Sam2Model

* update sam2 modeling with better handling of inference state and cache, and separate Sam2Model and Sam2VideoModel

* improve video inference api

* change inference_state to inference_session

* use modular for Sam2Model

* fix convert sam2 hf

* modular

* Update src/transformers/models/sam2/video_processing_sam2.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix minor config

* fix attention loading error

* update modeling tests to use hub checkpoints

* Use CI A10 runner for integration tests values + higher tolerance for video integration tests

* PR review part 1

* fix doc

* nit improvements

* enforce one input format for points, labels and boxes

* nit

* last few nits from PR review

* fix style

* fix the input type

* fix docs

* add sam2 model as conversion script

* improve sam2 doc

* nit fixes + optimization

* split sam2 and sam2_video in two models

* PR review part 1

* fix None for default slow processor of sam2

* remove unnecessary code path in sam2_video

* refactor/simplify RoPE

* replace embedding module list with embedding matrix

* fix tests

* remove kernel

* nit

* use lru_cache for sine_pos_embeddings

* reorder sam2_video methods

* simplify sam2_video

* PR review part 1

* simplify sam2 video a lot

* more simplification

* update integration tests with updated conftest

* more explicit config for hieradet

* do post_processing outside of sam2 video model

* Improve Sam2VideoVisionRotaryEmbedding

* fix tests

* update docs and fix mask2former/oneformer

* avoid unnecessary reshapes/permute

* fix device concatenating points

* small dtype fix

* PR review

* nit

* fix style and finish up doc

* fix style

* fix docstrings

* fix modular

---------

Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-13 14:18:05 -04:00
25ad9c8c92 Fix Janus (#40140)
fix
2025-08-13 20:12:21 +02:00
bec6926696 gpt oss is important (#40139) 2025-08-13 19:49:54 +02:00
ab9108517a 🌐 [i18n-KO] Translated pipelines.md to Korean (#39577)
* docs: ko: pipelines.md

* feat: gpt draft

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update _toctree.yml

* Update _toctree.yml

Revised the translated document

* Update pipelines.md

Fixed the ToC

* Update pipelines.md

---------

Co-authored-by: xhaktm <tnwjd318@hs.ac.kr>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-13 10:26:17 -07:00
20c6b478cd 🚨 Use lru_cache for sine pos embeddings MaskFormer (#40007)
* use lru_cache for sine pos embeddings maskformer

* fix calls to pos embed

* change maxsize to 1 (see the sketch below)
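
A standalone sketch of the caching pattern adopted here, assuming a free function (the real MaskFormer code hangs this off the position-embedding module):

```python
import math
from functools import lru_cache

import torch

@lru_cache(maxsize=1)  # maxsize=1, as in the PR: only the most recent shape stays cached
def sine_pos_embeddings(seq_len: int, dim: int) -> torch.Tensor:
    # dim is assumed even; standard sine/cosine position embedding
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32) * (-math.log(10000.0) / dim))
    embeddings = torch.zeros(seq_len, dim)
    embeddings[:, 0::2] = torch.sin(position * div_term)
    embeddings[:, 1::2] = torch.cos(position * div_term)
    return embeddings  # callers must treat the cached tensor as read-only
```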
2025-08-13 17:05:22 +00:00
6b728f1830 🌐 [i18n-KO] Translated grounding-dino.md to Korean (#39861)
* docs: ko: grounding-dino.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* docs: add AP explanation for better readability

---------

Co-authored-by: TaskerJang <bymyself103@naver.com>
Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-13 10:01:05 -07:00
127e33f759 🌐 [i18n-KO] Translated optimizers.md to Korean (#40011)
* docs: ko: optimizers.md

* feat: optimizers draft

* fix: manual edits

* docs: ko: update optimizers.md

* Update docs/source/ko/optimizers.md

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

* Update docs/source/ko/optimizers.md

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

* Update docs/source/ko/optimizers.md

Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com>

* docs: ko: final updates to optimizers and toctree

---------

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>
Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com>
2025-08-13 10:00:47 -07:00
ac52c77a66 🌐 [i18n-KO] Translated gpt2.md to Korean (#39808)
* docs: ko: bamba.md

* feat: nmt draft

* fix: manual edits

* docs: ko: gpt2.md

* feat: nmt draft

* fix: manual edits

* Remove bamba.md from docs/source/ko/model_doc/

* Update _toctree.yml
2025-08-13 10:00:25 -07:00
5337f3052d 🚨🚨 [generate] ignore cache_implementation="hybrid" hub defaults (#40135)
* working?

* fix tests
2025-08-13 17:57:41 +02:00
e4223fa915 🌐 [i18n-KO] Translated main_classes/optimizer_schedules.md to Korean (#39713)
* docs: ko: main_classes/optimizer_schedules

* feat: nmt draft

* fix: improve TOC anchors and expressions in optimizer_schedules

- Add TOC anchors to all section headers
- Fix terminology and improve Korean expressions

* fix: Correct translation of 'weight decay fixed' to '가중치 감쇠가 적용된'

Changed '가중치 감쇠가 수정된' to '가중치 감쇠가 적용된' for more accurate translation of 'weight decay fixed' in the context of optimization.

* fix: Use more natural Korean inheritance expression

Changed '에서 상속받는' to '을 상속받는' to follow natural Korean grammar patterns for inheritance terminology.

* fix: Use consistent '미세 조정' translation for 'finetuned models'

Changed '파인튜닝된' to '미세 조정된 모델' to follow the established translation glossary for 'finetuned models' terminology.
2025-08-13 08:23:09 -07:00
9e21e50241 🌐 [i18n-KO] Translated jamba.md to Korean (#39890)
* docs: ko: jamba.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestion

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

---------

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>
2025-08-13 08:22:28 -07:00
486844579b 🌐 [i18n-KO] Translated main_classes/processors.md to Korean (#39519)
* docs: ko: processors.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

---------

Co-authored-by: TaskerJang <bymyself103@naver.com>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2025-08-13 08:21:38 -07:00
f445caeb0f Fix hidden torchvision>=0.15 dependency issue (#39928)
* use pil_torch_interpolation_mapping for NEAREST/NEAREST_EXACT

* fix min torchvision version

* use InterpolationMode directly

* remove unused is_torchvision_greater_or_equal,

* nit
2025-08-13 15:13:42 +00:00
11537c3e0c [trainer] handle case where EOS token is None in generation_config (#40127)
* handle case where EOS token is None in gen config

* update eli5 dataset
2025-08-13 15:57:17 +01:00
8ef5cd6579 DOCS: Add missing space in SECURITY.md (#40087) 2025-08-13 12:57:37 +00:00
ebceef343a Collated reports (#40080)
* Add initial collated reports script and job definition

* provide commit hash for this run. Also use hash in generated artifact name. Json formatting

* tidy

* Add option to upload collated reports to hf hub

* Add glob pattern for test report folders

* Fix glob

* Use machine_type as path filter instead of glob. Include machine_type in collated report
2025-08-13 14:48:15 +02:00
e78571f5ce decoding_method argument in generate (#40085)
* factor out expand inputs

* callable arg

* improve docs, add test

* Update docs/source/en/generation_strategies.md

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-13 12:45:50 +00:00
8d19231bca [serve] allow array content inputs for LLMs (#39829)
fix bug; add tests
2025-08-13 11:26:19 +01:00
34a1fc6426 Fix QuantoQuantizedCache import issues (#40109)
* fix quantoquantized
2025-08-13 10:22:59 +00:00
060b86e21d changed xLSTMRMSNorm to RMSNorm (#40113)
* changed xLSTMRMS.. to RMS...

* fix linter error

---------

Co-authored-by: Nikita <nikita@Nikitas-MacBook-Pro.local>
2025-08-13 11:10:42 +02:00
849c3778c6 [bugfix] Fix tensor device in Idefics2, Idefics3, and SmolVLM (#39975)
* [bugfix] ensure correct tensor device in Idefics2, Idefics3, and SmolVLM models

* to cuda
2025-08-13 09:58:50 +02:00
85d536a93b 🌐 [i18n-KO] Translated tiny_agents.md to Korean (#39913)
* docs: ko: tiny_agents.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits
2025-08-12 22:54:16 -07:00
31ab7168ff remove sequence parallel in llama4 (#40084) 2025-08-13 00:12:45 +02:00
a1a4fcd03e Add model card for MobileViT (#40033)
* Add model card for MobileViT

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-12 11:36:59 -07:00
e5e73e4b95 [docs] Add reference to HF-maintained custom_generate collections (#39894)
decoding -> generation; add collections
2025-08-12 17:38:00 +01:00
0ce24f5a88 Fix Causality Handling in Flash Attention to Support Bidirectional Attention (#39707)
Fix the is_causal logic to enable bidirectional attention

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-08-12 16:16:28 +00:00
83dbebc429 [trainer] ensure special tokens in model configs are aligned with tokenizer at train time (#38441)
* tmp commit

* add test

* make fixup

* reset warns/info in test
2025-08-12 16:32:07 +01:00
9977cf1739 [Flash Attention] Fix flash attention integration (#40002)
* fix flash attention

* i got a stroke reading that comment

* change dropout kwarg back to before

* rename _fa3... as it's used for multiple variants and should work as fallback instead

* simplify imports and support kwargs for fa

* style

* fix comments order

* small fix

* skip kernels test (causes cuda illegal memories w/o cleanup), fix fa test in general esp for models like bart

* style

* allow fullgraph by preloading on init

* make globals "private"

* ci pls be happy

* change skip conditions based on backend flag (indicating missing mask interface)

* move globals support to a function to prepare kwargs

* style

* generalize supported kwargs

* small change to doc

* fix

* add comments

* style

* revert prep during generate

* style

* revert weird style changes

* add fa kwarg prep during generate with fixes back

* how did this even happen

* how

* add comment
2025-08-12 15:24:10 +00:00
b6ba595543 Default to dequantize if cpu in device_map for mxfp4 (#39993)
* default to dq if cpu

* an other check

* style

* revert some changes
2025-08-12 16:48:52 +02:00
a5fac1c394 Fix error on importing unavailable torch.distributed (#40038)
Currently model_debugging_utils.py would have an unguarded `import torch.distributed.tensor`. This PR ensures that the distributed module is available before including its tensor module.
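A sketch of the guard described above (assumed form, not the verbatim diff):

```python
import torch.distributed as dist

if dist.is_available():
    # only import the tensor submodule when torch was built with distributed support
    import torch.distributed.tensor
```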
2025-08-12 16:30:51 +02:00
085e02383c Fix Qwen3 MoE GGUF architecture mismatch (#39976)
* fix qwen3moe gguf architecture

* Fix Qwen3Moe GGUF loading

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Jinuk Kim <jusjinuk@snu.ac.kr>
2025-08-12 13:38:48 +00:00
2ce0dae390 Switch the order of args in StaticCache (for BC and future logic) (#40100)
* switch order for BC and future logic

* in generate as well
2025-08-12 15:30:44 +02:00
f7cbd5f3ef Fix regression in mllama vision encoder (#40083)
fix mllama vision encoder

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-12 15:29:45 +02:00
35dc88829c Replace logger.warning with logger.warning_once in GradientCheckpointingLayer (#40091) 2025-08-12 15:26:47 +02:00
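For context, `logger.warning_once` deduplicates a message per process, so the layer no longer warns on every forward pass (the message text below is illustrative):

```python
from transformers.utils import logging

logger = logging.get_logger(__name__)
# emitted once, even if the gradient-checkpointed layer runs on every training step
logger.warning_once("Caching is incompatible with gradient checkpointing. Setting use_cache=False.")
```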
b1b46555cd Re-apply make style (#40106)
make style
2025-08-12 15:02:16 +02:00
a07b5e90f2 feat: add is_fast to ImageProcessor (#39603)
* feat: add `is_fast` to ImageProcessor

* test_image_processing_common.py 업데이트

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* feat: add missing BaseImageProcessorFast import

* fix: `issubclass` for discriminating subclass of BaseImageProcessorFast
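
Per the bullets above, the new flag reduces to an `issubclass` check against BaseImageProcessorFast; a hedged usage sketch (the checkpoint is chosen only as an example):

```python
from transformers import AutoImageProcessor
from transformers.image_processing_utils_fast import BaseImageProcessorFast

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
# is_fast discriminates fast processors via an issubclass check
print(processor.is_fast, issubclass(type(processor), BaseImageProcessorFast))
```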

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-08-12 12:14:57 +00:00
952fac100d Enable SIM rules (#39806)
* Enable SIM rules

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-12 12:14:26 +00:00
41d1717882 New DynamicSlidingWindowLayer & associated Cache (#40039)
* start adding the layer

* style

* improve

* modular

* fix

* fix

* improve

* generate integration

* comment

* remove old one

* remove

* fix

* fix

* fix

* fix all recompiles

* fix

* doc

* fix

* add text config check

* fix encoderdecoder cache

* add it for all models with sliding/hybrid support

* revert

* start fixing

* prophetnet

* fsmt

* fix ddp_data

* add test for mistral

* improve mistral test and add gemma2 test

* docstrings
2025-08-12 14:09:52 +02:00
ab455e0d88 Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock (#39743)
audio encodings now match conv weight dtype in Gemma3nAudioSSCPConvBlock
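
A simplified stand-in for the fix (assumed shape, not the real Gemma3n block): cast inputs to the conv weight's dtype before the convolution.

```python
import torch
from torch import nn

class SSCPConvBlock(nn.Module):  # illustrative stand-in, not the actual Gemma3n class
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=3, dtype=torch.bfloat16)

    def forward(self, audio_encodings: torch.Tensor) -> torch.Tensor:
        # match the conv2d weight dtype so fp32 encodings work with a bf16 conv
        return self.conv(audio_encodings.to(self.conv.weight.dtype))

block = SSCPConvBlock()
out = block(torch.randn(1, 1, 8, 8))  # fp32 in, bf16 conv, no dtype mismatch
```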
2025-08-12 12:08:28 +00:00
4b3a1a62cc Causal loss for ForConditionalGeneration (#39973)
* feat: add ForConditionalGeneration loss to LOSS_MAPPING

* consistent spelling of "recognized"
2025-08-12 14:03:09 +02:00
f6b6e17719 Add glm4.5&&glm4.5V doc (#40095)
* Docs: GLM-4-MoE & GLM-4V-MoE pages

* Docs: polish GLM-4V-MoE intro, remove placeholders; pin image

* Docs

---------

Co-authored-by: wujiahan <lambert@gmail.com>
2025-08-12 11:44:53 +00:00
1c5e17c025 Update Glm4V processor and add tests (#39988)
* update GLm4V and add tests

* Update tests/models/glm4v/test_processor_glm4v.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* remove min/max pixels for BC

* fix video tests

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-08-12 13:40:54 +02:00
913c0a8c33 [docs] Zero Shot Object Detection Task (#40096)
* refactor zsod task docs

* keeping the image guided od section

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update docs/source/en/tasks/zero_shot_object_detection.md

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
2025-08-12 11:43:38 +01:00
c6fbfab61b [fix] batch inference for llava_onevision (#40021)
* [fix] llava onevision batch inference

* style

* cannot pass inconsistent list & handle text-only case
2025-08-12 11:01:00 +02:00
86bb1fcd26 Revert FA2 kwargs construction (#40029)
* revert

* use imports

* went way too high in imports level

* style
2025-08-12 10:48:35 +02:00
3ff2e984d2 Fix PerceptionLM image preprocessing for non-tiled image input. (#40006)
* Fix PerceptionLM image preprocessing for non-tiled image input.

* Add test for single tile vanilla image processing.

* ruff format

* recover missing test skip

* Simplify test.

* minor test name fix
2025-08-12 08:40:22 +00:00
4668ef1459 Update notification service MI325 (#40078)
add mi325 to amd_daily_ci_workflows
2025-08-12 10:22:52 +02:00
1cea763ba4 feat: extract rev in attn_implementation kernels via @ (#40009)
* feat: extract rev in attn_implementation kernels via @

* fix: adjust for ruff

* fix: update regex and add explanatory comment

* fix: move attn_implementation kernel doc

* fix: remove extra line
2025-08-11 15:14:13 -04:00
e29919f993 [GPT Big Code] Fix attention scaling (#40041)
* fix

* update integration tests

* fmt

* add regression test
2025-08-11 19:01:31 +00:00
eca703026e chore: standardize DeBERTa model card (#37409)
* chore: standardize DeBERTa model card

* Apply suggestions from code review in docs

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: Update deberta.md with code cleanup suggestions

* Update docs/source/en/model_doc/deberta.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/deberta.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update deberta.md

* Update deberta.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-11 10:30:37 -07:00
43001fd3c6 Fix time_spent in notification_service.py. (#40081)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-11 18:30:58 +02:00
5521c62b89 added Textnet fast image processor (#39884)
* feat: add fast image processor implementation for TextNet model

* chore: override to_dict method in TextNetImageProcessorFast for slow processor compatibility tests

* chore: update init method

* chore: coding and style checks

* chore: fixed code quality issue

* chore: override resize to handle size_divisor, move all preprocessing logic to child class

* fix: autoImageProcessor issue for textnet

* chore: cleanup

* simplify resize

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-08-11 11:44:31 -04:00
6b70d79b61 Fix repo consistency (#40077)
fix
2025-08-11 15:26:22 +02:00
7dd82f307b guard on model.eval when using torch.compile + FSDP2 (#37413)
guard on model.eval

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-11 13:22:42 +02:00
68eb1a9a63 Remove deprecated cache-related objects (#40035)
remove them
2025-08-11 10:30:14 +02:00
480653d271 fix: move super().__init__ after vision_config init in Mistral3Config (#40063)
fix: move super().__init__ after vision_config init in Mistral3Config (#40062)
2025-08-11 09:21:54 +02:00
502f253e20 [gemma3] update conversion key mapping (#39778)
update conversion key mapping
2025-08-11 09:21:13 +02:00
3124d1b439 [qwen-vl] fix beam search with videos (#39726)
* fix

* fix copies
2025-08-11 09:21:04 +02:00
1372a5b8c4 fix: resolve triton version check compatibility on windows (#39986)
* fix: resolve triton version check compatibility on windows

* style: remove trailing space

* fix: fix typo

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-08-11 08:53:19 +02:00
99c747539e unpin torchcodec==0.5.0 and use torch 2.8 on daily CI (#40072)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-10 22:27:39 +02:00
b59140b696 Update HuBERT model card according to template (#39742)
* Update HuBERT model card according to template

Standardized HuBERT doc, added ASR examples, Flash Attention 2 support, and quantization section.

* Address review comments and changes requested to hubert.md

* Update hubert.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-10 11:32:45 -07:00
f4d57f2f0c Revert "fix notification_service.py about time_spent" (#40044)
Revert "fix `notification_service.py` about `time_spent` (#40037)"

This reverts commit d2ba153b29feb9cc0e9818c1ce63a07679b47250.
2025-08-08 22:32:24 +02:00
7b20915f4e GLM-4.5V Model Support (#39805)
* init

* update

* update

* ruff

* t patch is 2 by default, not 1

* draft

* back

* back1

* update

* config update

* update using glm-41 format

* add self.rope_scaling = config.rope_scaling

* update config

* update

* remove the processor

* update

* fix tests

* update

* for test

* update

* update 2126

* self.rope_scaling is missing in GLM4MOE; let's add it

* update

* update

* Update modular_glm4v_moe.py

* change config

* update apply_multimodal_rotary_pos_emb

* format

* update

* Delete 3-rollout_qas_thinking_answers.py

* use right name

* update with place holder

* update

* use right rotary

* Update image_processing_glm4v_fast.py

* rope_config_validation needs to rewrite the entire config file in modular

* update

* changed name

* update

* Update modeling_glm4v_moe.py

* _init_weights should be added in Glm4vMoePreTrainedModel

* remove use_qk_norm

* Update modular_glm4v_moe.py

* remove use_qk_norm as it is not used

* fix style

* deprecations are not needed on new models

* fix merge issues

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-08-08 17:39:52 +02:00
d2ba153b29 fix notification_service.py about time_spent (#40037)
temp

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-08 17:11:16 +02:00
f639c0c780 Bnb failling tests (#40026)
* initial commit

* style

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-08 16:28:00 +02:00
a96cccd0dd Tie weights recursively on all submodels (#39996)
* recursive call

* add missing keys

* remove bad keys
2025-08-08 16:03:16 +02:00
a78263dbb5 fix 2025-08-08 15:32:23 +02:00
dc11a3cbb2 [core] Refactor the Cache logic to make it simpler and more general (#39797)
* Simplify the logic quite a bit

* Update cache_utils.py

* continue work

* continue simplifying a lot

* style

* Update cache_utils.py

* offloading much simpler

* style

* Update cache_utils.py

* update inits

* Update cache_utils.py

* consistency

* Update cache_utils.py

* update generate

* style

* fix

* fix

* add early_initialization

* fix

* fix mamba caches

* update

* fix

* fix

* fix

* fix tests

* fix configs

* revert

* fix tests

* alright

* Update modeling_gptj.py

* fix the constructors

* cache tests

* Update test_cache_utils.py

* fix

* simplify

* back to before -> avoid compile bug

* doc

* mistral test

* llama4 test dtype

* Update test_modeling_llama4.py

* CIs

* Finally find a nice impl

* Update cache_utils.py

* Update cache_utils.py

* add lazy methods in autodoc

* typo

* better doc

* Add detailed docstring for lazy init

* CIs

* style

* fix
2025-08-08 14:47:21 +02:00
95510ab018 Fix missing None default values for Gemma3n model in get_placeholder_mask (#39991) (#40024)
* Fix missing None default values for Gemma3n model in get_placeholder_mask (#39991)

* Switched definition of optional from `| None` to `Optional[]` (Issue #39991)

---------

Co-authored-by: Laurenz Ruzicka <Laurenz.Ruzicka@ait.ac.at>
2025-08-08 10:43:42 +00:00
5c3fb7f731 Harmonize past_key_value to past_key_valueS everywhere (#39956)
* all modulars and llama

* apply modular

* bert and gpt2 copies

* fix imports

* do it everywhere

* fix import

* finalize it

* fix

* oups set it in modular

* style

* fix

* Add 1 version to deprecation cycle

* Update modeling_layers.py
2025-08-08 11:52:57 +02:00
2469cce621 Fix an annoying flaky test (#40000)
annoying flaky test
2025-08-08 10:32:51 +02:00
fe1bf82159 Higgs modules_to_not_convert standardization (#39989)
fix higgs
2025-08-08 10:22:59 +02:00
b374c3d12e Fix broken image inference for Fuyu model (#39915)
* fix fuyu

Signed-off-by: Isotr0py <2037008807@qq.com>

* oops

Signed-off-by: Isotr0py <2037008807@qq.com>

* run test on GPU

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* clean unused

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* revert

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* add fuyu multimodal test

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-08 07:21:49 +00:00
4d57c39007 pin torchcodec==0.5.0 for now with torch 2.7.1 on daily CI (#40013)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-07 23:05:39 +02:00
3e0333fa4a Update expected output values after #39885 (part 2) (#40015)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-07 22:52:53 +02:00
12f248bced Raising error when quantizing a quantized model (#39998)
* error when quantizing a quantized model

* style
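
A hedged sketch of the new guard (function and attribute names assumed):

```python
def _validate_not_already_quantized(config):
    # refuse to re-quantize: the checkpoint already carries a quantization_config
    if getattr(config, "quantization_config", None) is not None:
        raise ValueError("The model is already quantized; quantizing it again is not supported.")
```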
2025-08-07 20:37:25 +00:00
efaf3714dc docs: fix duplication in 'en/optimizers.md' (#40014) 2025-08-07 13:28:43 -07:00
ca4cbb1e3f unpin torch<2.8 on circleci (#40012)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-07 21:31:17 +02:00
78922577e9 FA2 can continue generation from cache (#39843)
* add fa2 support to continue generation from cache

* update q-len
2025-08-07 19:26:23 +02:00
9bfbdd2945 Fix default values of getenv (#39867)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-07 17:25:40 +00:00
692d336908 Fix HGNetV2 Model Card and Image Classification Pipeline Usage Tips (#39965)
* fix hgnet docs and image-classification pipeline

* use positional argument

* fix dit close hfoptions tag

* fix alphabet order

* fix hgnet modular docstring

* Update hgnet_v2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update hgnet_v2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/hgnet_v2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: hgnet reference

* change hgnet to en doc

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-07 09:33:29 -07:00
0659214196 fix: remove CHAT_TEMPLATE import in tests for deepseek-vl (#40003)
* remove CHAT_TEMPLATE import in tests

* update and use prepare_processor_dict
2025-08-07 16:19:36 +00:00
27997eeb8d Fix missing video inputs for PerceptionLM. (#39971)
* Fix missing video inputs for PerceptionLM.

* Minor fix for vanilla input image (only C,H,W, no tiles dim).

* Revert "Minor fix for vanilla input image (only C,H,W, no tiles dim)."

This reverts commit 181d87b964e59c4118035a9fd4f530c6e551ba9f.
2025-08-07 15:54:45 +00:00
bf1bd6ac1f Fix int4 quantized model cannot work with cpu (#39724)
* Fix int4 quantized model cannot work with cpu

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Update the comments

Signed-off-by: yuanwu <yuan.wu@intel.com>

* update

Signed-off-by: yuanwu <yuan.wu@intel.com>

* update

Signed-off-by: yuanwu <yuan.wu@intel.com>

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-07 15:24:00 +00:00
43d3b1931a Update expected output values after #39885 (part 1) (#39990)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-07 16:00:28 +02:00
d5a0809707 Fix consistency (#39995)
* modular

* fix
2025-08-07 15:52:40 +02:00
b347e93567 [typing] Fix return typehint for decoder and inv_freq annotation (#39610)
* fix return typehint for decoder and annotate inv_freq

* fix modular

* Fix consistency

* Move annotation on class level

* missing annotations

* add comment
2025-08-07 14:10:22 +01:00
7188e2e28c Bump transformers from 4.48.0 to 4.53.0 in /examples/tensorflow/language-modeling-tpu (#39967)
Bump transformers in /examples/tensorflow/language-modeling-tpu

Bumps [transformers](https://github.com/huggingface/transformers) from 4.48.0 to 4.53.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.48.0...v4.53.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 4.53.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-07 12:13:48 +01:00
2b19a06692 Fix gemma3n feature extractor's incorrect squeeze (#39919)
* fix gemma3n squeeze

Signed-off-by: Isotr0py <2037008807@qq.com>

* add regression test

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-07 18:34:28 +08:00
555cbf5917 [Idefics] fix device mismatch (#39981)
fix
2025-08-07 11:12:04 +02:00
597ed1a11d Various test fixes for AMD (#39978)
* Add amd expectation in internvl

* Add amd expectation to llama

* Added bnb decorator for a llava test that requires bnb

* Added amd expectation for mistral3

* Style
2025-08-07 10:57:04 +02:00
6121e9e46c Support input_embeds in torch exportable decoders (#39836)
* Support input_embeds in torch exportable decoders

* Hybrid cache update

* Manually change some callsites

* AI changes the rest of the call sites

* Make either input_ids/inputs_embeds mandatory

* Clean up

* Ruff check --fix

* Fix test

* pr review

* Revert config/generation_config changes

* Ruff check
2025-08-07 08:51:31 +00:00
cdeaad96b7 [superglue] Fixed the way batch mask was applied to the scores before match assignment computation (#39968)
fix: mask filling to score was wrong
2025-08-07 09:49:39 +01:00
2593932f10 Gemma3 fixes (#39960)
* Fix multiple devices issue

* Added expectations for rocm 9.4

* Ruff
2025-08-07 09:57:21 +02:00
513f76853b Modular fix: remove the model name in find_file_type (#39897)
* remove the model name in the class name

* add comment
2025-08-06 23:31:07 +00:00
743bb5f52e chore: update Deformable_Detr model card (#39902)
* chore: update Deformable_Detr model card

* fix: added pipeline, automodel examples and checkpoints link

* Update deformable_detr.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-06 12:45:14 -07:00
ac0b468465 [bugfix] fix flash_attention_2 unavailable error on Ascend NPU (#39844) 2025-08-06 17:48:52 +00:00
cf243a1bf8 Fix fix_and_overwrite mode of utils/check_docstring.py (#39369)
* bug in fix mode of check_docstring
2025-08-06 19:37:25 +02:00
6902ffa505 remove triton_kernels dep with kernels instead (#39926)
* remove dep

* style

* rm import

* fix

* style

* simplify

* style
2025-08-06 19:31:20 +02:00
cb2e0df2ec [image processor] fix glm4v (#39964)
* fix glm4v image process

* Update src/transformers/models/glm4v/image_processing_glm4v.py

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-06 17:46:58 +01:00
9ab75fc428 fix typo (#39936)
* fix typo

* fix modular instead

* fix

---------

Co-authored-by: y.korobko <y.korobko@tbank.ru>
2025-08-06 16:21:24 +00:00
43b3f58875 Fix grammatical error in MoE variable name: expert_hitted → expert_hit, hitted_experts → hit_experts (#39959)
* Fix grammatical error: expert_hitted -> expert_hit in MoE implementations

* Fix grammatical error: hitted_experts -> hit_experts in MoE implementation
2025-08-06 15:45:19 +00:00
dff6185d61 docs: fix typo in 'quantization-aware training' (#39904) 2025-08-06 14:52:43 +00:00
c7844c7a8e Enable gpt-oss mxfp4 on older hardware (sm75+) (#39940)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-06 13:39:21 +00:00
dd70a8cb9d Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953)
* Fix MXFP4 quantizer validation to enable CPU dequantization

Move dequantize check before CUDA availability check to allow
CPU inference when quantization_config.dequantize is True.
This enables users to run MXFP4 models on CPU by automatically
converting them to BF16 format.

* Add tests for MXFP4 quantizer CPU dequantization validation

* fix: format mxfp4 test file with ruff
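
Roughly, the reordered validation looks like this (a sketch under assumed names, not the quantizer's exact code):

```python
import torch

def validate_environment(quantization_config):
    # check dequantize first: CPU inference is fine when weights are dequantized to BF16
    if getattr(quantization_config, "dequantize", False):
        return
    if not torch.cuda.is_available():
        raise RuntimeError("MXFP4 inference requires CUDA; set dequantize=True to run on CPU.")
```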
2025-08-06 15:20:41 +02:00
82eb67e62a [docs] ko toc fix (#39927) 2025-08-06 10:12:34 +00:00
9e76a6bb54 circleci: pin torch 2.7.1 until torchcodec is updated (#39951)
circleci torch 2.7.1

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-06 11:18:00 +02:00
910b319357 Fix CI: Tests failing on CPU due to torch.device('cpu').index being None (#39933)
replace routing_weights.device.index with a
2025-08-06 10:22:43 +02:00
369c99d0ce Avoid utils/check_bad_commit.py failing due to rate limit (requesting api.github.com) (#39918)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-05 21:52:20 +02:00
b771e476a8 [CI] post-GptOss fixes for green CI (#39929) 2025-08-05 20:04:59 +02:00
eb6e26acf3 Dev version 2025-08-05 18:09:30 +02:00
c54203a32e gpt_oss last chat template changes (#39925)
Last chat template changes
2025-08-05 18:08:08 +02:00
7c38d8fc23 Add GPT OSS model from OpenAI (#39923)
* fix

* nice

* where i am at

* Bro this works

* Update src/transformers/integrations/tensor_parallel.py

* cleanups

* yups that was breaking

* Update src/transformers/models/openai_moe/modeling_openai_moe.py

* gather on experts and not mlp

* add changes for latest convert branch

* adds options to get output_router_logits from config

* bring chat temlate + special tokens back into the script.

* initial commmit

* update

* working with shards

* add model.safetensors.index.json

* fix

* fix

* mxfp4 flag

* rm print

* Fix PAD/EOS/BOS (#18)

* fix pad/eos/bos

* base model maybe one day

* add some doc

* special tokens based on harmony.

* add in tokenizer config as well.

* prepare for rebase with main

* Fix for initialize_tensor_parallelism  now returning 4-tuple

```
[rank0]:   File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module>
[rank0]:     model = AutoModelForCausalLM.from_pretrained(
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained
[rank0]:     tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None)
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: too many values to unpack (expected 3)
```

* mxfp4

* mxfp4 draft

* fix

* fix import

* draft

* draft impl

* finally working !

* simplify

* add import

* working version

* consider blocks and scales

* device mesh fix

* initial commit

* add working dequant + quant logic

* update

* non nan, gibberish output

* working EP + quantization finally !

* start cleaning

* remove reversing process

* style

* some cleaning

* initial commmit

* more cleaning

* more cleaning

* simplify

* more cleaning

* rm duplicated function

* changing tp_plan

* update tp plan check

* add loading attribute

* dequantizing logic

* use subfunctions

* import cleaning

* update_param_name

* adds clamped swiglu

* add clamping to training path

* simplify dequant logic

* update

* Bad merge

* more simplifications & tests

* fix !

* fix registering custom attention

* fix order

* fixes

* some test nits

* nits

* nit

* fix

* Clamp sink logits

* Clean

* Soft-max trick

* Clean up

* p

* fix deepspeed

* update both modeling and modular for cleanup

* contiguous

* update tests

* fix top_k router call

* revert renaming

* test nits

* small fixes for EP

* fix path for our local tests

* update as I should not have broken that!

* fix the loss of mixtral

* revert part of the changes related to router_scores, kernel probably no ready for that!

* deleting a small nit

* update arch

* fix post processing

* update

* running version but not expected output

* moving to cuda

* initial commit

* revert

* erroring when loading on cpu

* updates

* del blocks, scales

* fix

* style

* rm comm

* comment

* add comment

* style

* remove duplicated lines

* Fix minor issue with weight_map conversion script

* fix sampling params

* rename to final name

* update pre-final version of template

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

* fix batched inference

* serve fixes

* swizzle !

* update final chat template by Matt.

* fix responses; pin oai

* simplify

* Thanks Matt for his tireless efforts!

Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* fix

* Use ROCm kernels from HUB

* Make kernel modes explicit

* update final chat template by Matt. x2

* Thanks Matt for his tireless efforts!

Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>

* Fix installation

* Update setup.py

Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>

* allow no content

* fix: update message handling in write_tokenizer function

* Fix template logic for user message role

* last nits for CB and flash_paged!

* there was one bad merge

* fix CB (hardcode for now, its just using kv groups instead)

* fix

* better fix for device_map

* minor device fix

* Fix flash paged

* updates

* Revert "remove dtensors, not explicit (#39840)"

This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.

* update

* Revert "remove dtensors, not explicit (#39840)"

This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.

* fix merge

* fix

* Fix line break when a custom model identity is used

* nits testing

* to locals first and pass sliding window to flash paged

* register modes for MegaBlocksMoeMlp

* add integration test in fixtures -> now update the tests to use it!

* update integration tests

* initial fix

* style and update tests

* fix

* chore(gpt oss): remove mlp_bias from configuration

It was just a leftover.

* stats

* Integration tests

* whoops

* Shouldn't move model

* Ensure assistant messages without thinking always go to "final" channel

* More checks to ensure expected format

* Add pad_token_id to model configuration in write_model function (#51)

* Add oai fix fast tests (#59)

* Fix some fast tests

* Force some updates

* Remove unnecessary fixes

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

* reasoning -> Reasoning

* Add additional integration tests

* fixup

* Slight fixes

* align chat template with harmony

* simplify

* Add comment

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* Revert fixup

* skip 2 test remove todo

* merge

* padding side should be left for integration tests

* fix modular wrt to changes made to modeling

* style

* isort

* fix copies for the loss

* mmmm

---------

Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: edbeeching <edbeeching@gmail.com>
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan@openai.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal>
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Akos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Matt <rocketknight1@gmail.com>
2025-08-05 18:02:18 +02:00
738c1a3899 🌐 [i18n-KO] Translated cache_explanation.md to Korean (#39535)
* update: _toctree.yml

* docs: ko: cache_explanation.md

* feat: nmt draft

* fix: apply yijun-lee's comments

* fix: apply 4N3MONE's comments

* docs: update cache_position

* docs: update cache-storage-implementation

* update: add h2 tag in cache-position

---------

Co-authored-by: taehyeonjeon <xogus294@gmail.com>
2025-08-05 08:20:13 -07:00
d2ae766836 Export SmolvLM (#39614)
Export SmolVLM for ExecuTorch
2025-08-05 16:20:23 +02:00
c430047602 [docs] update object detection guide (#39909)
* Update object_detection.md

* Update object_detection.md
2025-08-05 14:07:21 +00:00
dedcbd6e3d run model debugging with forward arg (#39905)
* run model debugging a lot simpler

* fixup

* Update src/transformers/utils/generic.py

* fixup

* mode syle?

* guard a bit
2025-08-05 15:46:19 +02:00
20ce210ab7 Revert "remove dtensors, not explicit (#39840)" (#39912)
* Revert "remove dtensors, not explicit (#39840)"
This did not work with generation (lm_head needs extra care!)
This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.

* update

* style?
2025-08-05 15:12:14 +02:00
2589a52c5c Fix aria tests (#39879)
* fix aria tests

* awful bug

* fix copies

* fix tests

* fix style

* revert this
2025-08-05 13:48:47 +02:00
6e4a9a5b43 Fix eval thread fork bomb (#39717) 2025-08-05 10:50:32 +00:00
98a3c49135 Replace video_fps with fps in tests (#39898)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-05 10:39:55 +00:00
1af1071081 Fix misleading WandB error when WANDB_DISABLED is set (#39891)
When users set `report_to="wandb"` but also have `WANDB_DISABLED=true` in their environment,
the previous error message was misleading: "WandbCallback requires wandb to be installed. Run pip install wandb."

This was confusing because wandb was actually installed, just disabled via the environment variable.

The fix detects this specific case and provides a clear, actionable error message explaining
the conflict and how to resolve it.
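
A minimal sketch of the detection described above (assumed shape, not the exact callback code):

```python
import importlib.util
import os

def _require_wandb():
    if os.getenv("WANDB_DISABLED", "").lower() in {"true", "1"}:
        raise RuntimeError(
            "report_to='wandb' conflicts with WANDB_DISABLED=true in the environment; "
            "unset WANDB_DISABLED or drop 'wandb' from report_to."
        )
    if importlib.util.find_spec("wandb") is None:
        raise RuntimeError("WandbCallback requires wandb to be installed. Run `pip install wandb`.")
```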
2025-08-05 10:18:18 +00:00
78ef84921b Avoid aliasing in cond's branches for torch 2.8 (#39488)
Avoid aliasing in cond's branches

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-05 11:18:11 +02:00
9e676e6a0e [qwen] remove unnecessary CUDA sync in qwen2_5_vl (#39870)
Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-05 08:54:16 +00:00
392be3b282 fix test_working_of_tp failure of accelerate ut (#39828)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-05 08:52:57 +00:00
cc5de36454 [Exaone4] Fixes the attn implementation! (#39906)
* fix

* fix config
2025-08-05 09:29:16 +02:00
00d47757bf Reorder serving docs (#39634)
* Slight reorg

* LLMs + draft VLMs

* Actual VLM examples

* Initial responses

* Reorder

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/tiny_agents.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/open_webui.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/cursor.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Responses API

* Address Pedro's comments

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2025-08-05 08:43:06 +02:00
8c4ea670dc chore: update DETR model card (#39822)
* Update model card for DETR

* fix: applied suggested changes

* fix: simplified pipeline and modified notes and resources

* Update detr.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-04 12:25:53 -07:00
0bd91cc822 Add support for ModernBertForMultipleChoice (#39232)
* implement ModernBertForMultipleChoice

* fixup, style, repo consistency

* generate modeling_modernbert

* add tests + docs

* fix test
2025-08-04 20:45:43 +02:00
801e869b67 send some feedback when manually building doc via comment (#39889)
* fix

* fix

* fix

* Update .github/workflows/pr_build_doc_with_comment.yml

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-04 18:20:48 +00:00
ee7eb2d0b1 Update cohere2 vision test (#39888)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-04 20:08:18 +02:00
3bafa128dc [DOCS] : Improved mimi model card (#39824)
* [DOCS] : Improved mimi model card

* Removed additional header

* Review: addressed feedback

* Update mimi.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-04 10:07:06 -07:00
192acc2d0f Fix link to models in README (#39880)
Update README.md
2025-08-04 09:34:41 -07:00
7dca2ff8cf [typing] better return type hint for AutoModelForCausalLM and AutoModelForImageTextToText (#39881)
* Better return type hint for AutoModelForCausalLM and AutoModelForImageTextToText

* fix imports

* fix
2025-08-04 15:03:53 +00:00
3edd14610e Set torch.backends.cudnn.allow_tf32 = False for CI (#39885)
* fix

* fix

* [test all]

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-04 16:55:16 +02:00
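For reference, these are the standard PyTorch switches involved; disabling TF32 keeps CI numerics reproducible against float32 baselines on Ampere+ GPUs:

```python
import torch

# TF32 speeds up matmuls/convolutions on Ampere+ GPUs at reduced precision;
# turning it off avoids flaky tolerance failures in numeric tests.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```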
e3505cd4dc Replace Tokenizer with PreTrainedTokenizerFast in ContinuousBatchProcessor (#39858)
Replace Tokenizer with PreTrainedTokenizerFast in ContinuousBatchProcessor
2025-08-04 16:39:19 +02:00
380b2a0317 Rework add-new-model-like with modular and make test filenames coherent (#39612)
* remove tf/flax

* fix

* style

* Update add_new_model_like.py

* work in progress

* continue

* more cleanup

* simplify and first final version

* fixes -> it works

* add linter checks

* Update add_new_model_like.py

* fix

* add modular conversion at the end

* Update add_new_model_like.py

* add video processor

* Update add_new_model_like.py

* Update add_new_model_like.py

* Update add_new_model_like.py

* fix

* Update image_processing_auto.py

* Update image_processing_auto.py

* fix post rebase

* start test filenames replacement

* rename all test_processor -> test_processing

* fix copied from

* add docstrings

* Update add_new_model_like.py

* fix regex

* improve wording

* Update add_new_model_like.py

* Update add_new_model_like.py

* Update add_new_model_like.py

* start adding test

* fix

* fix

* proper first test

* tests

* fix

* fix

* fix

* fix

* modular can be used from anywhere

* protect import

* fix

* Update add_new_model_like.py

* fix
2025-08-04 14:41:09 +02:00
5fb5b6cfaf Fix quant docker for fp-quant (#39641)
* fix quant docker

* Apply style fixes

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-04 11:57:08 +00:00
16d6faef9a [core] Fix attn_implementation setter with missing sub_configs (#39855)
* fix

* add sub_configs

* remove case for attention setter

* fix None

* Add test

* Fix sub-configs

* fix tests_config

* fix consistency

* fix fsmt

* fix
2025-08-04 11:35:09 +01:00
2a9febd632 Add support for including in-memory videos (not just files/urls) in apply_chat_template (#39494)
* added code for handling a video object, as a dictionary of frames and metadata, in the chat template

* added new test where videos are passed as objects (dict of frames, metadata) in the chat template

* modified the hardcoded video_len check that did not match the increased number of test cases.

* Modify hardcoded video_len check that fails with increased number of tests

* update documentation of multi-modal chat templating with extra information about including video object in chat template.

* add array handling in load_video()

* temporary test video included

* skip testing smolvlm with videos that are list of frames

* update documentation & make fixup

* Address review comments
2025-08-04 11:49:42 +02:00
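A sketch of what passing an in-memory video object could look like, with a hypothetical frames/metadata schema and a placeholder `processor`; the exact keys accepted are defined by the PR above:

```python
import numpy as np

# Hypothetical schema: a dict of decoded frames plus per-video metadata.
frames = np.zeros((8, 224, 224, 3), dtype=np.uint8)  # 8 dummy RGB frames
video = {"frames": frames, "metadata": {"fps": 2.0, "duration": 4.0}}

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": video},
        {"type": "text", "text": "Describe the clip."},
    ],
}]
# inputs = processor.apply_chat_template(messages, tokenize=True, return_tensors="pt")
```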
0d511f7a77 Use comment to build doc on PRs (#39846)
* try

* try

* try

* try

* try

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-04 10:24:45 +02:00
4819adbbaa Refactor label name handling for PEFT models in Trainer class (#39265)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-04 06:29:57 +00:00
166fcad3f8 Improve is_wandb_available function to verify WandB installation (#39875)
Improve `is_wandb_available` function to verify WandB installation by checking for a key attribute
2025-08-04 08:22:52 +02:00
6dfd561d9c remove dtensors, not explicit (#39840)
* remove dtensors, not explicit

Co-authored-by: 3outeille <3outeille@users.noreply.github.com>

* style

* fix test

* update

* as we broke saving try to fix

* output layouts should exit

* nit

* devicemesh exists if it was distributed

* use _device_mesh of self

* update

* lol

* fix

* nit

* update

* fix!

* this???

* grumble grumble

* ?

* fuck me

---------

Co-authored-by: 3outeille <3outeille@users.noreply.github.com>
2025-08-01 22:02:47 +02:00
b727c2b20e Allow TrackioCallback to work when pynvml is not installed (#39851)
Allow TrackioCallback to work when pynvml is not installed
2025-08-01 18:57:25 +02:00
1ec0feccdd [image-processing] deprecate plot_keypoint_matching, make visualize_keypoint_matching as a standard (#39830)
* fix: deprecate plot_keypoint_matching and make visualize_keypoint_matching for all Keypoint Matching models

* refactor: added copied from

* fix: make style

* fix: repo consistency

* fix: make style

* docs: added missing method in SuperGlue docs
2025-08-01 16:29:57 +00:00
7b4d9843ba Add fast image processor Janus, Deepseek VL, Deepseek VL hybrid (#39739)
* add fast image processor Janus, deepseek_vl, deepseek_vl_hybrid

* fix after review
2025-08-01 12:20:08 -04:00
88ead3f518 Fix responses add tests (#39848)
* Quick responses fix

* [serve] Fix responses API and add tests

* Remove typo

* Remove typo

* Tests
2025-08-01 18:06:08 +02:00
6ea646a03a Update ux cb (#39845)
* cleanup

* nits

* updates

* fix logging

* push updates?

* just pass exception

* update

* nits

* fix

* add tokencount

* style
2025-08-01 16:50:28 +02:00
3951d4ad5d Add MM Grounding DINO (#37925)
* first commit

Added a modular implementation for MM Grounding DINO from the starting point created by add-new-model-like. Added a conversion script from mmdetection to huggingface.

TODO: Some tests are failing and still need to be fixed.

* fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model

* cleaned up a hack in the conversion script

* Fixed the expected values in integration tests

However, cross-attention masking and CPU-GPU consistency tests are still failing.

* changes for make style and quality

* add documentation

* clean up contrastive embedding

* add mm grounding dino to loss mapping

* add model link to config docstring

* hack fix for mm grounding dino consistency tests

* add special cases for unused config attr check

* add all models and update docs

* update model doc to the new style

* Use super_kwargs for modular config

* Move init to the _init_weights function

* Add copied from for tests

* fixup

* update typehints

* Fix-copies for tests

* fix-copies

* Fix init test

* fix snippets in docs

* fix consistency

* fix consistency

* update conversion script

* fix nits in readme and remove old comments from conversion script

* add license

* remove unused config args

* remove unnecessary if/else in model init

* fix quality

* Update references

* fix test

* fixup

---------

Co-authored-by: qubvel <qubvel@gmail.com>
2025-08-01 15:43:23 +01:00
50145474b7 [typecheck] proper export of private symbols (#39729)
* Export private symbols

Signed-off-by: cyy <cyyever@outlook.com>

* Update src/transformers/__init__.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/__init__.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix format

Signed-off-by: cyy <cyyever@outlook.com>

* Add a comment for exported symbols

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-01 13:36:47 +01:00
c962f1515e [attn_implementation] remove recursive, allows custom kernels with wrappers (#39823)
* fix?

* fixme and style

* Update src/transformers/modeling_utils.py

* update

* update

* fix

* small fixees

* nit

* nits

* fix init check?

* fix

* fix default

* or fucks me

* nits

* include a small nit

* does this make it happy?

* fixup

* fix the remaining ones
2025-08-01 12:18:28 +02:00
d3b8627b56 [VLMs] split out "get placeholder mask" to helper (#39777)
* batch update all models

* update

* forgot about llava onevision

* update

* fix tests

* delete file

* typo

* fix emu3 once and forever

* update cohere2 vision as well
2025-08-01 08:01:06 +00:00
a115b67392 Fix tp cb (#39838)
* fixes

* one more
2025-08-01 09:59:04 +02:00
2c0af41ce5 Fix bad markdown links (#39819)
Fix bad markdown links.
2025-07-31 09:14:14 -07:00
4fcf455517 Fix broken links (#39809)
Replace links of the form `[text]((url))` with `[text](url)`, which is the
correct Markdown link format.
2025-07-31 13:23:04 +00:00
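The broken pattern is mechanical enough that a one-line regex rewrite covers it; a self-contained sketch:

```python
import re

BAD_LINK = re.compile(r"\[([^\]]+)\]\(\(([^()]+)\)\)")  # matches [text]((url))

def fix_links(markdown: str) -> str:
    return BAD_LINK.sub(r"[\1](\2)", markdown)

print(fix_links("See [the docs]((https://huggingface.co/docs))."))
# -> See [the docs](https://huggingface.co/docs).
```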
b937d47455 [cohere2 vision] move doc to multimodal section (#39820)
move doc to multimodal section
2025-07-31 15:13:02 +02:00
6ba8a1ff45 Update documentation for Cohere2Vision models (#39817)
* Update docs with pipeline example

* Add Cohere2Vision to list of vision models

* Sort models
2025-07-31 11:58:45 +00:00
e1688d28d3 [Model] Cohere2 Vision (#39810)
* Add cohere2_vision to support CohereLabs/command-a-vision-07-2025

* update and add modular file

* update processors and check with orig impl later

* delete unused files

* image processor reduce LOC and re-use GotOCR2

* update the config to use modular

* model tests pass

* processor fixes

* check model outputs decorator

* address one more comment

* Update tokens. Temp - need to read from tokenizer

* fix for multi-gpu

* Fix image token handling

* update image token expansion logic

* fix a few issues with remote code loading

* not related but modular forces us to change all files now

* Add overview and code sample to cohere vision docs

* add scripts. TMP.

* Update inference script

* Create script

* set dtype in export script

* TO revert: modular export fix

* Fix scripts

* Revert "TO revert: modular export fix"

This reverts commit bdb2f305b61027a05f0032ce70d6ca698879191c.

* Use modular weights

* Upload to hub

Removed OOD weights and script

* Updated docs

* fix import error

Update docs

Added pipeline test

* Updated docs

* Run modular script

remove modular for config

Added patch_size

Added docstrings in modular

Fix OOM

Add docs, fixup integration tests. 8-gpu passing

* tiny updates

* address comments + fixup

* add test for chat template

* check model outputs workaround

* aya vision fix check model inputs

* Revert "add test for chat template"

This reverts commit 42c756e397f588d76b449ff1f93292d8ee0202d8.

* revert more changes

* last revert

* skip and merge

* faulty copy from

---------

Co-authored-by: Julian Mack <julian.mack@cohere.com>
Co-authored-by: kyle-cohere <kyle@cohere.com>
2025-07-31 10:57:34 +00:00
6c3f27ba61 [docs] fix korean docs yet again (#39813)
fix korean docs yet again
2025-07-31 09:13:25 +00:00
cb289ad243 feat(tokenization): add encode_message to tokenize messages one by one (#39507)
* feat(tokenization): add encode_message to tokenize messages one by one

* Fix the `encode_message` method, remove the `add_generation_prompt` parameter and add the corresponding error handling. Update the document to reflect this change and verify the error handling in the test.

* Optimize the `encode_message` method, improve the processing logic of the empty dialogue history, and ensure that the chat template can be applied correctly when the dialogue history is empty. Update the document to reflect these changes.

* Delete the `_encode_message` method, simplify the message encoding logic, and keep the `encode_message` method fully functional. Update the documentation to reflect these changes.

* Docs fix

* Revert changes in docstring of pad()

* Revert changes in docstring

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Rename the `encode_message` method to `encode_message_with_chat_template` to support the chat template, and adjust the relevant test cases to reflect this change.

* Optimize the call format of the `apply_chat_template` method, and merge multi-line calls into a single line to improve code readability.

---------

Co-authored-by: pco111 <15262555+pco111@user.noreply.gitee.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-31 10:55:45 +02:00
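A usage sketch under stated assumptions: the method name comes from the PR above, but the exact signature (one message dict in, token ids out) is assumed here:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")

conversation = [
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]

# Tokenize messages one at a time instead of re-templating the full history.
ids = []
for message in conversation:
    ids += tok.encode_message_with_chat_template(message)  # signature assumed
```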
4f93cc9174 fix: providing a tensor to cache_position in model.generate kwargs always crashes because of boolean test (#39300)
* fix: cache_position: RuntimeError: Boolean value of Tensor with more than one value is ambiguous

* test cache_position

* move test

* propagate changes

---------

Co-authored-by: Masataro Asai <guicho2.71828@gmail.com>
2025-07-30 17:30:28 +00:00
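The underlying pitfall is easy to reproduce: truth-testing a multi-element tensor is ambiguous, so presence checks must be explicit.

```python
import torch

cache_position = torch.tensor([0, 1, 2])

# Buggy: `if cache_position:` raises
# "RuntimeError: Boolean value of Tensor with more than one value is ambiguous".

# Fixed: test for presence, not truthiness.
if cache_position is not None:
    print(cache_position.shape)  # torch.Size([3])
```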
9b3203f47b Add callback to monitor progress in whisper transcription (#37483)
* Add callback to monitor progress in whisper transcription

* Added `` around variables, rewording

* Add example of `monitor_progress`.

---------

Co-authored-by: Eric B <ebezzam@gmail.com>
2025-07-30 17:40:53 +02:00
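A hedged usage sketch; the `monitor_progress` keyword comes from the PR, but the callback's argument shape is an assumption here:

```python
from transformers import pipeline

def monitor_progress(progress):
    # Assumed: the callback receives some progress indication per chunk.
    print(f"transcription progress: {progress}")

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
# result = asr("audio.flac", monitor_progress=monitor_progress)
```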
7abb5d3992 Update mT5 model card (#39702)
* Update mt5 model card

* Fix casing of model title

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-30 08:35:04 -07:00
1019b00028 Update model card for Cohere2 (Command R7B) (#39604)
* Update model card for Cohere2 (Command R7B)

* fix: applied suggested changes
2025-07-30 08:34:26 -07:00
ecbb5ee194 standardized BARThez model card (#39701)
* standardized barthez model card according to template

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* suggested changes to barthez model card

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-30 08:33:13 -07:00
8e077a3e45 Fix re-compilations for cross attention cache (#39788)
fix recompilations for cross attn cache
2025-07-30 14:52:03 +02:00
1e0665a191 Simplify conditional code (#39781)
* Use !=

Signed-off-by: cyy <cyyever@outlook.com>

* Use get

Signed-off-by: cyy <cyyever@outlook.com>

* Format

* Simplify bool operations

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-30 12:32:10 +00:00
b94929eb49 Fix an invalid condition (#39762)
Fix an invalid judgement

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-30 12:19:17 +00:00
bb2ac66453 fix chameleonvision UT failure (#39646)
* fix chameleonvision UT failure

Signed-off-by: matrix.yao@intel.com <Yao Matrix>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: matrix.yao@intel.com <Yao Matrix>
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: root <Yao Matrix>
2025-07-30 12:09:26 +00:00
5348445dfa Super tiny update (#39727)
super tiny update
2025-07-30 12:21:41 +02:00
54cbea5615 more info in model_results.json (#39783)
more info

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-30 11:43:10 +02:00
01d5f94695 [ASR pipline] fix with datasets 4.0 (#39504)
* fix

* handle edge case

* make
2025-07-30 08:13:40 +00:00
8ab21be570 enable static cache on vision encoder decoder (#39773)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-30 08:10:46 +00:00
67cfe11528 Fix Evolla and xLSTM tests (#39769)
* fix all evolla

* xlstm
2025-07-30 09:51:55 +02:00
ec4033457e Don't set run_name when none (#39695)
* Don't set run_name when none

* revert

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-30 01:39:29 +00:00
551a89a4a3 Standardize CLAP model card format (#39738)
* Standardize CLAP model card format

* Apply review feedback

* Remove Resources section
2025-07-29 14:13:04 -07:00
da70b1389a docs: Update EfficientLoFTR documentation (#39620)
* docs: Update EfficientLoFTR documentation

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-29 13:54:44 -07:00
ddd2100767 Fix OmDet test after arg deprecation (#39766)
fix arg name
2025-07-29 22:10:36 +02:00
4abb053b6c Remove python3.7 reference from doc link (#39706) 2025-07-29 09:17:13 -07:00
33aa49df9d [docs] Ko doc fixes after toc update (#39660)
* update docs

* doc builder working

* make fixup
2025-07-29 17:05:26 +01:00
c4e2069898 Fix Cache.max_cache_len max value for Hybrid models (#39737)
* fix gemma

* fix min

* fix quant init issue

* fix gemma 3n

* skip quant cache test

* fix modular

* new test for Gemma

* include cyril change

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-29 17:12:50 +02:00
075dbbceaa fix(trainer): Correct loss scaling for incomplete gradient accumulation steps (#39659)
* Fix issue[#38837]: wrong loss scaled in last step of epoch

* chore: trigger CI

* Update src/transformers/trainer.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/modeling_flash_attention_utils.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: taihang <taihang@U-2RHYVWX7-2207.local>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-07-29 17:12:31 +02:00
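A toy illustration of the scaling pitfall being fixed (not the Trainer's actual code): when the last accumulation window is shorter, dividing by the fixed step count mis-weights it.

```python
losses = [1.0] * 10   # 10 micro-batch losses in one epoch
accum_steps = 4

total = 0.0
for start in range(0, len(losses), accum_steps):
    window = losses[start:start + accum_steps]  # final window has only 2 items
    # Buggy: sum(window) / accum_steps under-weights the shorter final window.
    # Fixed: divide by the number of micro-batches actually accumulated.
    total += sum(window) / len(window)

print(total)  # 3.0 -- each window contributes its true mean loss
```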
1d061536cf 🌐 [i18n-KO] Translated how_to_hack_models.md to Korean (#39536)
* docs: ko: how_to_hack_models.md

* feat: nmt draft

* fix: manual edits
2025-07-29 08:09:16 -07:00
43fe41c0a8 🌐 [i18n-KO] Translated perf_train_gpu_one.md to Korean (#39552)
* docs: ko: perf_train_gpu_one.md

* feat: nmt draft

* fix: manual edits

* fix: Manually added missing backticks

* Update docs/source/ko/perf_train_gpu_one.md

fix: remove space between heading and GPU anchor

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/perf_train_gpu_one.md

fix: clarify table headers to indicate training speed boost and memory savings

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/perf_train_gpu_one.md

fix: improve readability

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/perf_train_gpu_one.md

fix: rephrase explanation of data preloading to improve readability

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2025-07-29 08:08:57 -07:00
9f38763731 🌐 [i18n-KO] Translated pipeline_gradio.md to Korean (#39520)
* docs: ko: pipeline_gradio.md

* feat: nmt draft

* fix: manual edits

* docs: ko: pipeline_gradio.md
2025-07-29 08:04:30 -07:00
f72311796b 🌐 [i18n-KO] Translated tokenizer.md to Korean (#39532)
* docs: ko: tokenizer.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Yijun Lee <yijun-lee@users.noreply.github.com>

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* fix: resolve suggestions

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

---------

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2025-07-29 08:04:14 -07:00
d346d46752 🌐 [i18n-KO] Translated tvp.md to Korean (#39578)
* docs: ko: tvp.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

---------

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
2025-07-29 08:04:00 -07:00
2f59c15b33 🌐 [i18n-KO] Translated albert.md to Korean (#39524)
* docs: ko: albert.md

* feat: nmt draft

* fix: manual edits
2025-07-29 08:03:40 -07:00
98386dcee9 🌐 [i18n-KO] Translated main_classes/peft.md (#39515)
* docs: ko: main_classes/peft.md

* feat: nmt draft

* docs: add missing TOC to documentation for `PeftAdapterMixin` section

Added a table of contents (TOC) to the documentation, specifically for the `transformers.integrations.PeftAdapterMixin` section, following the structure and content outlined in [this link](https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin).

* fix: Improve naturalness of purpose expression in Korean

Changed '관리하기 위한' to '관리할 수 있도록' for more natural Korean expression when describing the purpose of providing functions.

* fix: Simplify plural form and make expression more concise

Changed '~할 수 없기 때문에' to '~할 수 없어' for more concise expression while maintaining clarity.

* fix: Replace technical term '주입' with more natural '적용'

Changed '주입할 수 없어' to '적용할 수 없어' for better readability.
Considered alternatives:

'삽입': Too literal translation of 'inject'
'입력': Could be misunderstood as data input
'통합': Implies merging two systems
'추가': Simple but less precise

'적용' was chosen as it's the most natural and widely used term in Korean technical documentation for this context.

* fix: update toctree path for PEFT to lowercase

Changed the toctree path from 'PEFT' (uppercase) to 'peft' (lowercase) to match the correct directory naming convention and prevent broken links.

* docs: update as per reviewer feedback after rebase
2025-07-29 08:03:17 -07:00
1ad216bd7d [modernbert] fix regression (#39750)
* fix regression

* add FA2 test
2025-07-29 16:58:59 +02:00
379209b603 add libcst to extras["testing"] in setup.py (#39761)
add

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 16:58:51 +02:00
abf101af1f Fix version issue in modeling_utils.py (#39759)
fix version issue
2025-07-29 16:15:30 +02:00
8db4d79161 Enable xpu allocator on caching_allocator_warmup (#39654)
* add xpu allocator

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix variable name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm useless default value

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-29 16:06:52 +02:00
fb141e2c90 Support loading Qwen3 MoE GGUF (#39638)
* support loading qwen3 gguf

* qwen3moe test cases

* fix whitespaces

* fix ggml tests
2025-07-29 13:44:44 +00:00
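Loading a GGUF checkpoint goes through the usual `from_pretrained` path with a `gguf_file` argument; the repo id and filename below are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "Qwen/Qwen3-30B-A3B-GGUF"      # placeholder repo id
gguf = "qwen3-30b-a3b-q4_k_m.gguf"    # placeholder quantized file

tokenizer = AutoTokenizer.from_pretrained(repo, gguf_file=gguf)
model = AutoModelForCausalLM.from_pretrained(repo, gguf_file=gguf)
```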
ccb2e0e03b Fix GPT2 with cross attention (#39754)
* fix

* use new mask API

* style

* fix copies and attention tests

* fix head pruning tests
2025-07-29 15:40:31 +02:00
dfd616e658 Avoid OOM when other tests are failing (#39758)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 15:35:44 +02:00
65df73aa88 AMD disable torchcodec (#39757)
Temporarily disable torchcodec installation because of bizarre segfault
2025-07-29 13:07:25 +00:00
63b3200779 Use --gpus all in workflow files (#39752)
gpu all

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 14:53:33 +02:00
95faabf0a6 Apply several ruff SIM rules (#37283)
* Apply ruff SIM118 fix

Signed-off-by: cyy <cyyever@outlook.com>

* Apply ruff SIM910 fix

Signed-off-by: cyy <cyyever@outlook.com>

* Apply ruff SIM101 fix

Signed-off-by: cyy <cyyever@outlook.com>

* Format code

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-29 11:40:34 +00:00
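The three rules named above map to small mechanical rewrites:

```python
d = {"a": 1}

# SIM118: membership tests go through the dict itself, not .keys().
"a" in d                       # instead of: "a" in d.keys()

# SIM910: .get() already defaults to None.
d.get("b")                     # instead of: d.get("b", None)

# SIM101: merge repeated isinstance checks into one call.
x = 3.0
isinstance(x, (int, float))    # instead of: isinstance(x, int) or isinstance(x, float)
```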
cf97f6cfd1 Fix mamba regression (#39728)
* fix mamba regression

* fix compile test
2025-07-29 12:44:28 +02:00
66984ed4f6 Update IMPORTANT_MODELS list (#39734) 2025-07-29 12:34:57 +02:00
de8d0cec30 update GemmaIntegrationTest::test_model_2b_bf16_dola again (#39731)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 11:42:55 +02:00
85d5aeb324 Fix: add back base model plan (#39733)
* Fix: add back base model plan

* Fix: typo

* fixup

* remove unused import

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-07-29 11:37:33 +02:00
2a90193dd8 [Fix] import two missing typos in models/__init__.py for typo checking (#39745)
* [Fix] import lost gemma3n for type checking in vscode

* [Fix] import missing qwen2_5_omni typo

* [Refactor] sort by ascii order
2025-07-29 11:35:22 +02:00
f2aca3eccc fix cache inheritance (#39748)
* fix cache inheritance

* style
2025-07-29 11:24:44 +02:00
f3598a95c7 extend more trainer test cases to XPU, all pass (#39652)
extend more trainer test cases to XPU

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-07-29 10:51:00 +02:00
75794792ad BLIPs clean-up (#35560)
* blips clean up

* update processor

* readability

* fix processor length

* fix copies

* tmp

* update and fix copies

* why keep these, delete?

* fix test fetcher

* irrelevant comment

* fix tests

* fix tests

* fix copies
2025-07-29 10:03:06 +02:00
4f8f51be4e Add Fast Segformer Processor (#37024)
* Add Fast Segformer Processor

* Modified the params according to segformer model

* modified test_image_processing_Segformer_fast args

- removed redundant params like do_center_crop,center_crop which aren't present in the original segformer class

* added segmentation_maps processing logic form the slow segformer processing module with references from beitimageprocessing fast

* fixed code_quality

* added recommended fixes and tests to make sure everything processess smoothly

* Fixed SegmentationMapsLogic

- modified the preprocessing of segmentation maps to use tensors
- added batch support

* fixed some mismatched files

* modified the tolerance for tests

* use modular

* fix ci

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-28 19:22:32 +00:00
c353f2bb5e Superpoint fast image processor (#37804)
* feat: superpoint fast image processor

* fix: reran fast cli command to generate fast config

* feat: updated test cases

* fix: removed old model add

* fix: format fix

* Update src/transformers/models/superpoint/image_processing_superpoint_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* fix: ported to torch and made requested changes

* fix: removed changes to init

* fix: init fix

* fix: init format fix

* fixed testcases and ported to torch

* fix: format fixes

* failed test case fix

* fix superpoint fast

* fix docstring

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-28 18:15:06 +00:00
14adcbd937 Fix AMD dockerfile for audio models (#39669) 2025-07-28 19:05:41 +02:00
1c6b47451d Fix cache-related tests (#39676)
* fix

* fix kyutai at last

* fix unrelated tests and copies

* update musicgen as well

* revert tensor

* fix old test failures

* why it wasn't added?
2025-07-28 17:30:11 +02:00
fc2bd1eac0 Fix Layer device placement in Caches (#39732)
* fix device placement

* style

* typo in comment
2025-07-28 16:37:11 +02:00
7623aa3e5f Fix Qwen2AudioForConditionalGeneration.forward() and test_flash_attn_kernels_inference_equivalence (#39503)
* Add missing cache_position argument.

* Pass cache_position to language model.

* Overwrite prepare_inputs_for_generation.

* Set model to half precision for Flash Attention test.

* Cast model to bfloat16.
2025-07-28 16:35:08 +02:00
28f2619868 skip Glm4MoeModelTest::test_torch_compile_for_training (#39670)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-28 16:30:40 +02:00
88aed92b59 Update QAPipelineTests::test_large_model_course after #39193 (#39666)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-28 16:26:49 +02:00
da823fc04e mllama outputs refactor (#39643)
* mllama outputs refactor

* forgot kwargs

* fix output

* add can_record_outputs

* correct @check_model_inputs placement

* ruff and copies

* rebase

* feedback

* only return hidden_states

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
2025-07-28 15:59:20 +02:00
686bb3b098 Remove all expired deprecation cycles (#39725)
* remove all deprecation cycles

* style

* fix

* remove

* remove

* fix

* Update modular_dpt.py

* back

* typo

* typo

* final fix

* remove all args
2025-07-28 15:43:41 +02:00
a0fa500a3d [CI] Add Eric to comment slow ci (#39601)
add to ci
2025-07-28 13:24:00 +00:00
4c7da9fedf PATCH: add back n-dim device-mesh + fix tp trainer saving (#39693)
* Feat: something

* Feat: initial changes

* tmp changes to unblock

* Refactor

* remove todo

* Feat: docstring

* Fix: saving of distributed model in trainer

* Fix: distributed saving with trainer

* Feat: add pure tp saving

* Only require tp dim if ndim > 1

* Fix: default to None

* Fix: better comments/errors

* Fix: properly check tp_size attribute

* Fix: properly check for None in tp_size

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-28 12:29:58 +00:00
cbede2969b Add self-hosted runner scale set workflow for mi325 CI (#39651) 2025-07-28 13:32:25 +02:00
b56d721397 [configuration] remove redundant classmethod (#38812)
* remove redundant classmethod

* warning message, add space between words

* fix tests

* fix copies
2025-07-28 10:38:48 +00:00
02ea23cbde update ernie model card (#39657)
* update ernie model doc

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

* address ruff format error reported by ci

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

* address check_repository_consistency error reported by ci

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

---------

Signed-off-by: Zhang Jun <jzhang533@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-28 10:21:18 +00:00
8b237b8639 [processors] add tests for helper fn (#39629)
* add tests for helpers

* duplicate test for each model

* why llava next video has no helper

* oops must have been in the commit

* fix test after rebase

* add copy from
2025-07-28 09:41:58 +00:00
6638b3642d xpu optimization for generation case (#39573)
* xpu optimization for generation case

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix ci failure

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-28 11:34:58 +02:00
5c15eb55d2 fix(tokenization): check token.content for trie (#39587)
fix: check token.content for trie
2025-07-28 11:28:56 +02:00
6a61e16626 Fix missing initialization of FastSpeech2Conformer (#39689)
* fix missing initialization of FastSpeech2Conformer

* switch order and reactivate tests

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-28 10:47:39 +02:00
a6393e7d28 fix missing model._tp_size from ep refactor (#39688)
* fix missing model._tp_size from ep refactor

* restore setting device_mesh too
2025-07-26 12:26:36 +02:00
18a7c29ff8 More robust tied weight test (#39681)
* Update test_modeling_common.py

* remove old ones

* Update test_modeling_common.py

* Update test_modeling_common.py

* add

* Update test_modeling_musicgen_melody.py
2025-07-25 22:03:21 +02:00
c3401d6fad dev version 4.55 2025-07-25 21:11:20 +02:00
97f8c71f52 Add padding-free to Granite hybrid moe models (#39677)
* start fixing kwarg handling

* fmt

* updates padding free tests

* docs

* add missing kwargs modeling_granitemoe.py

* run modular util

* rm unrelated changes from modular util
2025-07-25 20:10:50 +02:00
d6e9f71a6e Fix tied weight test (#39680)
Update test_modeling_common.py
2025-07-25 20:09:33 +02:00
5da6ad2731 fix break for ckpt without _tp_plan (#39658)
* fix break for ckpt without _tp_plan

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

---------

Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-25 20:03:48 +02:00
c06d4cd6ce Add EXAONE 4.0 model (#39129)
* Add EXAONE 4.0 model

* Refactor EXAONE 4.0 modeling code

* Fix cache slicing on SWA + FA2

* Fix cache slicing on FA2 + HybridCache

* Update EXAONE 4.0 modeling code for main branch

* Update o_proj for asymmetric projection

* Address PR feedback

* Add EXAONE 4.0 docs

* Update EXAONE 4.0 modeling code for main branch

* update

* fix updates

* updates

* fix

* fix

* fix

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-25 19:58:28 +02:00
3e4d584a5b Support typing.Literal as type of tool parameters or return value (#39633)
* support `typing.Literal` as type of tool parameters

* validate the `args` of `typing.Literal` roughly

* add test to get json schema for `typing.Literal` type hint

* fix: add `"type"` attribute to the parsed result of `typing.Literal`

* test: add argument `booleanish` to test multi-type literal

* style: auto fixup
2025-07-25 17:51:28 +00:00
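A sketch of what the feature enables, using `transformers.utils.get_json_schema`; the exact schema emitted for the `Literal` is defined by this PR, so treat the comment as approximate:

```python
from typing import Literal
from transformers.utils import get_json_schema

def get_weather(city: str, unit: Literal["celsius", "fahrenheit"]) -> str:
    """
    Get the current weather for a city.

    Args:
        city: Name of the city.
        unit: Temperature unit to report.
    """
    return f"20 degrees {unit} in {city}"

schema = get_json_schema(get_weather)
# With Literal support, `unit` should parse to an enum-style entry,
# roughly {"type": "string", "enum": ["celsius", "fahrenheit"]}.
print(schema)
```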
300d42a43e Add ep (#39501)
* EP + updates

Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>

* remove unrelated change

* not working yet but let's see where it goes!

* update the api a bit

* update

* where I am at for now

* fix ep

* refactor the API

* yups

* fix

* fixup

* clean modeling

* just support llama4 for now!

* properly avoid

* fix

* nits

* Update src/transformers/models/llama4/modeling_llama4.py

* Update src/transformers/integrations/tensor_parallel.py

* style

* ,,,,

* update

---------

Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
2025-07-25 19:46:17 +02:00
abaa043d60 bad_words_ids no longer slow on mps (#39556)
* fix: bad_words_ids no longer slow on mps

* fix: SequenceBiasLogitsProcessor slow `_prepare_bias_variables` method

* fix: re-adding a deleted comment

* fix: bug in no_bad_words_logits

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-25 19:45:41 +02:00
6630c5b714 Add xlstm model (#39665)
* Add xLSTM cleanly with optimizations.

* Fix style.

* Fix modeling test.

* Make xLSTM package optional.

* Fix: Update torch version check.

* Fix: Bad variable naming in test.

* Fix: Import structure cleaning with Ruff.

* Fix: Update docstrings.

* Fix: Mitigate unused config attr tests by explicit usage.

* Fix: Skip tests, if xlstm library is not installed.

* Feat: Enable longer context window for inference by chunking.

* Fix: Make training test pass by lowering target accuracy.

* Chore: Increase test verbosity for failing generation test.

* Update docs/source/en/model_doc/xlstm.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix: Make xlstm available even without CUDA.

* Chore: Remove unnecessary import.

* Fix: Remove BOS insertion.

* Chore: Improve xLSTMCache documentation.

* Integrate basic xLSTM fallback code.

* Chore: Remove unnecessary import.

* Chore: Remove duplicate LayerNorm.

* chore: update copyright, minor reformatting

* fix: refactor mLSTMStateType due to missing torch import

* fix: add missing import

* Chore: Replace einops.

* fix: apply ruff formatting

* fix: run `make fix-copies` to re-generate dummy_pt_objects.py

* fix: make type hints Python 3.9 compatible

* fix: remove obsolete import

* fix: remove obsolete method from docs

* chore: remove obsolete `force_bos_token_insert` from config

* Chore: Remove duplicated xLSTMCache class.

* Fix: Formatting of modeling_xlstm.py

* Chore: Remove xlstm package requirement from test. Re-add update_rnn_state.

* Fix: Update xLSTMCache docstring.

* Feat: Add proper initialization of xLSTM.

* Chore: Re-format files.

* Chore: Adapt format.

* Fix: xLSTMCache import restructuring.

* Fix: Add __all__ lists to modeling and configuration files.

* Chore: Reformat.

* Fix: Remove unnecessary update_rnn_state function.

* Fix: Undo test accuracy quickfix.

* Fix: Update copyright year, remove config copy.

* Chore: Flatten all internal configs to xLSTMConfig.

* Fix: Unused config variables check.

* Chore: Remove unnecessary imports.

* Fix: Unify xlstm cache argument from batch_size to max_batch_size.

* Chore: Remove bad default arg value for xLSTMCache.

* Chore: Rename core configuration arguments to HF default in xLSTM.

* Chore: Fix formatting.

* Fix: xLSTM Cache config access.

* Fix: Update xlstm tests for config update.

* Feat: Re-add embbeding_dim, num_blocks config options for compat with xLSTM-7B.

* Fix: Configuration xLSTM python3.9 syntax.

* Fix: Difference to main in test_utils.py assertion.

* Fix: Bad syntax in xlstm config for python3.9.

* Fix: xLSTMConfig docstring.

* Fix: xLSTMConfig docstring.

* Fix typing issues in xLSTM and BeiT, Paligemma.

* Fix: Exclude xLSTM from test cache utils.

* Chore: Fix style.

* Chore: Fix format.

* Chore: Remove unnecessary LayerNorm, NormLayer layer abstractions.

* Chore: Remove asserts and replace with ValueErrors.

* Chore: Update __init__.py structure of xLSTM.

* Chore: Clean xLSTM initialization of weights.

* Fix index names in modeling_xlstm.py

* Update xlstm model test typing annotations.

* Fix: Remove all asserts.

* Revert changes to the main __init__.py

* Fix: Move xLSTMCache to modeling_xlstm.py

* Fix: Remove xLSTMForCausalLM mapping from modeling_auto.py

* Remove xLSTMCache from dummy_pt_objects.py

* Fix: Remove extended torchdynamo compilation check integrating cuda graph captures.

* Revert test_cache_utils.py xLSTM change.

* Fix: Move xLSTM init functions before init call.

* Remove xLSTMCache from generation utils.

* Fix: Clean xLSTM init functionality for recursive calls.

* Fix: Move xLSTMCache before its first call.

* Fix formatting.

* Add partial docstring for xLSTMModel forward.

* Fix xLSTMCache docstring in xLSTMModel.

* Remove xLSTMCache from public documentation. Update auto_docstring.

* Remove all aggressive shape comments

* style

* Fix names

* simplify

* remove output_hidden_states

* Update modeling_xlstm.py

* Update modeling_xlstm.py

* Update test_modeling_xlstm.py

* Update modeling_xlstm.py

* Update modeling_xlstm.py

* fix

* fix

* style

* style

---------

Co-authored-by: Korbinian Poeppel <korbinian.poeppel@nx-ai.com>
Co-authored-by: Korbinian Pöppel <37810656+kpoeppel@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sebastian Böck <sebastian.boeck@nx-ai.com>
Co-authored-by: Korbinian Poeppel <poeppel@ml.jku.at>
2025-07-25 19:39:17 +02:00
ed9a96bc6d Use auto_docstring for perception_lm fast image processor (#39679) 2025-07-25 17:32:48 +00:00
d913b39ef3 fix: HWIO to OIHW (#39200)
* fix: HWIO to OIHW

* Bug in attention type

* Conversion script docstring

* style

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-07-25 19:23:15 +02:00
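The layout fix amounts to a single axis permutation when porting Flax/JAX conv weights to PyTorch; a minimal sketch:

```python
import torch

# Flax/JAX convolutions store kernels as (H, W, I, O); PyTorch expects (O, I, H, W).
w_hwio = torch.randn(3, 3, 64, 128)               # 3x3 kernel, 64 in, 128 out
w_oihw = w_hwio.permute(3, 2, 0, 1).contiguous()
print(w_oihw.shape)                               # torch.Size([128, 64, 3, 3])
```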
a26f0fabb8 Fix auto_docstring crashing when dependencies are missing (#39564)
* add try except to not crash auto_docstring when some dependency are missing

* safeguard None value in placeholder dict
2025-07-25 19:19:23 +02:00
69cff312f5 Add support for DeepseekAI's DeepseekVL (#36248)
* upload initial code

* update deepseek-vl adaptor

* update hierarchy of vision model classes

* update aligner model

* add text model

* Added Image Processor

* Added Image Processor

* Added Image Processor

* apply masks

* remove projection; add aligner

* remove interpolate_pos_encoding

* remove unused params in config

* cleaning

* Add the __init__ file

* added processing deepseek_vl class

* modified the deepseek-vl processor

* modified the deepseek-vl processor

* update __init__

* Update the image processor class name

* Added Deepseek to src/transformers/__init__.py file

* Added Deepseek to image_processing_auto.py

* update the __init__ file

* update deepseek_vl image processor

* Update Deepseek Processor

* upload fast image processor

* Revert "upload fast image processor"

This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4.

* update image processor

* flatten hierarchy

* remove DeepseekVLModel

* major update (complete modeling)

* auto modeling and other files

* formatting

* fix quality

* replace torchvision in modeling

* set default do_normalize to False

* add fast image processor template using tool

* update image processors

* add fast image processor to other files

* update license

* Added deepseek image testcases

* update image test

* update processor

* write CHAT_TEMPLATE

* update model for processor

* fix processor

* minor fixes and formatting

* fix image processing and tests

* fix interpolation in sam

* fix output_attentions in DeepseekVLModel

* upload test_modeling

* fix tests because of vocab size

* set use_high_res_vision=False in tests

* fix all modeling tests

* fix styling

* remove explicit background_color from image processors

* added test_processor

* added test_processor

* fix processor tests

* update docs

* update docs

* update docs

* update conversion script

* Fixed typos

* minor fixes from review

- remove model_id comments in examples
- remove from pre-trained auto mapping
- move to image-text-to-text from vision-to-seq in auto mapping
- add image_token_index to __init__ for config
- remove outdated temporary config in conversion script
- update example to use chat_template in docstring example
- update license 2021->2025

* fix type in config docstring

Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>

* update get_image_features

* fix config

* improve DeepseekVLImageProcessor.preprocess

* return image_hidden_states

* use AutoTokenizer and AutoImageProcessor in Processor

* fix model outputs

* make num_image_tokens configurable

* fix docstring of processor

* move system prompt to chat template

* fix repo consistency

* fix return_dict

* replace SamVisionEncoder with SamVisionModel

* update to remove deepcopy

* 🛠️  Major Architectural Changes (Adds DeepseekVLHybrid)

* fix quality checks

* add missing hybrid in auto modeling

* run make style

* update sam_hq

* update high_res_size in test

* update docs following #36979

* update code with auto_docstring

* update conversion scripts

* fix style

* fix failing test because of tuple

* set weights_only=True in conversion script

* use safetensors.torch.load_file instead of torch.load in conversion script

* make output_dir optional in conversion script

* fix code snippets in docs (now the examples work fine)

* integration tests for DeepseekVL

* update expected texts

* make style

* integration tests for DeepseekVLHybrid

* fix class name

* update expected texts for hybrid

* run "make style"

* update since changes in main

* run make-style

* nits since changes in main

* undo changes in sam

* fix tests

* fix tests; update with main

* update with main: output_attention/output_hidden_states

* fix copied part in deepseek_vl

* run fix-copies

* fix output_hidden_states

* sam: fix _init_weigths

* use modular for DeepseekVL

* make image processor more modular

* modular: use JanusPreTrainedModel

* janus: provide kwargs in loss

* update processors in conversion script

* Revert "sam: fix _init_weigths"

This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27.

* run fix-copies

---------

Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
2025-07-25 19:18:50 +02:00
a98bbc294c Add missing flag for CacheLayer (#39678)
* fix

* Update cache_utils.py
2025-07-25 19:12:13 +02:00
45c7bfb157 Add evolla rebase main (#36232)
* add evolla

* adding protein encoder part

* add initial processing test

* save processor

* add docstring

* add evolla processor

* add two test

* change vision to protein

* change resampler to sequence_compressor

* change vision to protein

* initial update for llama

* add initial update for llamaForCausalLM

* add `test_processor`, `test_saprot_output`, `test_protein_encoder_output`

* change evolla, but still working on it

* add test_single_forward

* pass test_attention_outputs

* pass test_hidden_states_output

* pass test_save_load and test_from_pretrained_no_checkpoint

* pass test_cpu_offload

* skip some tests

* update new progress

* skip test_model_is_small

* pass test_model_weights_reload_no_missing_tied_weights

* pass test_model_get_set_embeddings

* pass test_cpu_offload

* skip test_resize_embeddings

* add pipeline_model_mapping

* remove old setUp

* pass processor save_pretrained and load_pretrained

* remove pooling layer

* pass test_inputs_embeds_matches_input_ids

* pass test_model_is_small

* pass test_attention_outputs

* pass test_initialization

* pass test_model_get_set_embeddings

* pass test_single_forward

* skip test_disk_offload_bin and test_disk_offload_safetensors

* fix most tests

* pass test_protein_encoder_output

* remove useless code

* add EvollaForProteinText2Text

* pass test_saprot_output

* pass all EvollaModelTest test and remove processor test

* add processor test to its own file

* skip is_training since esm skipped it and the saprot code causes an error when setting is_training to True

* pass processor tests

* solve all except config

* pass most cases

* change init

* add doc to `configuration_evolla.py`

* remove image_processing test

* remove extra processor test

* remove extra modules

* remove extra modules

* change all configs into one config

* pass all evolla test

* pass `make fixup`

* update short summary

* update Evolla-10B-hf

* pass check_dummies.py and check_code_quality

* fix  `tests/models/auto/test_tokenization_auto.py::AutoTokenizerTest::test_model_name_edge_cases_in_mappings`

* remove dummy codes

* change format

* fix llava issue

* update format

* update to solve llama3 access issue

* update to make forward right

* solve processor save load problem from instructblip solution

* remove unexpected file

* skip `test_generation_tester_mixin_inheritance`

* add `test_single_forward_correct` and `test_inference_natural_language_protein_reasoning`

* add `modular_evolla.py`

* solved issue #36362

* run `make fixup`

* update modular

* solve float32 training

* add fix

* solve `utils/check_docstrings.py`

* update

* update

* update

* remove other files and replace sequential and einsum

* add use case in document

* update the models

* update model

* change some wrong code

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix issues mentioned in PR

* update style and rearrange the placement

* fix return_dict argument issue

* solve SaProtConfig issue

* Solve EvollaSaProtRotaryEmbedding issue

* solve attention_mask issue

* solve almosst all issues

* make style

* update config

* remove unrelated pickle file

* delete pickle files

* fix config

* simplify a lot

* remove past k-v from encoder

* continue work

* style

* skip it from init

* fix init

* fix init

* simplify more

* fill in docstrings

* change test for generation

* skip test

* fix style

---------

Co-authored-by: Chenchen Han <13980209828@163.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-25 19:11:57 +02:00
2670da66ce update expected outputs for whisper after #38778 (#39304)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-25 16:48:10 +00:00
4b125e2993 fix kyutai tests (#39416)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-07-25 18:42:04 +02:00
4f17bf0572 Fixes the BC (#39636)
* fix

* update

* Update src/transformers/utils/generic.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* fixup

* fixes

* fix more models

* fix fix fix

* add embedding to more models

* update

* update

* fix

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-07-25 18:41:21 +02:00
ddb0546d14 Delete bad rebasing functions (#39672)
* remove outdated stuff

* remove comment

* use register

* remove finally clause (to allow further check if fallback to sdpa)

* general exception

* add wrapper

* revert check

* typo
2025-07-25 18:28:09 +02:00
a91653561e [Ernie 4.5] Post merge adaptations (#39664)
* ernie 4.5 fixes

* Apply style fixes

* fix

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-25 17:36:18 +02:00
5d0ba3e479 [CI] revert device in test_export_static_cache (#39662)
* revert device

* add todo
2025-07-25 15:36:12 +00:00
850bdeaa95 Fix ModernBERT Decoder model (#39671)
fix
2025-07-25 16:20:12 +01:00
17f02102c5 🚨[Fast Image Processor] Force Fast Image Processor for Qwen2_VL/2_5_VL + Refactor (#39591)
* init

* Force qwen2VL image proc to fast

* refactor qwen2 vl fast

* fix copies

* Update after PR review and update tests to use return_tensors="pt"

* fix processor tests

* add BC for min pixels/max pixels
2025-07-25 11:11:28 -04:00
f90de364c2 Rename huggingface_cli to hf (#39630)
* Rename huggingface_cli to hf

* hfh
2025-07-25 14:10:04 +02:00
3b3f9c0c46 fix(voxtral): correct typo in apply_transcription_request (#39572)
* fix(voxtral): correct typo in apply_transcription_request

* temporary wrapper: apply_transcrition_request

* Update processing_voxtral.py

* style: sort imports in processing_voxtral.py

* docs(voxtral): fix typo in voxtral.md

* make style

* doc update

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-07-25 12:09:44 +00:00
2a82cf06ad make fixup (#39661) 2025-07-25 11:27:45 +00:00
e3760501b0 [docs] fix ko cache docs (#39644)
fix ko docs
2025-07-25 10:06:03 +01:00
91f591f7bc Make pytorch examples UV-compatible (#39635)
* update release.py

* add uv headers in some pytorch examples

* rest of pytorch examples

* style
2025-07-25 10:46:22 +02:00
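UV compatibility here means a PEP 723 inline-metadata header at the top of each example, so `uv run example.py` can resolve dependencies on the fly; a minimal sketch (dependency list illustrative):

```python
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "transformers",
#     "torch",
# ]
# ///

# `uv run example.py` reads the header above, creates an ephemeral
# environment with those packages, then runs the script.
from transformers import pipeline

print(pipeline("sentiment-analysis")("uv makes examples self-contained")[0])
```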
c46c17db57 revert change to cu_seqlen_k and max_k when preparing from position_ids (#39653) 2025-07-25 10:28:22 +02:00
4600c27c4f Fix: explicit not none check for tensors in flash attention (#39639)
fix: explicit not none check for tensors
2025-07-25 10:09:14 +02:00
c392d47c9b [attention] fix test for packed padfree masking (#39582)
* fix most tests

* skip a few more tests

* address comments

* fix chameleon tests

* forgot to uncomment

* qwen has its own tests with images, rename it as well
2025-07-25 07:44:52 +00:00
565c035a2e Add owlv2 fast processor (#39041)
* add owlv2 fast image processor

* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class

* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class

* change references from owlVit to owlv2 in docstrings for post-process methods

* change type hints from List, Dict, Tuple to list, dict, tuple

* remove unused typing imports

* add disable grouping argument to group images by shape

* run make quality and repo-consistency

* use modular

* fix auto_docstring

---------

Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-25 02:40:11 +00:00
5a81d7e0b3 revert behavior of _prepare_from_posids (#39622)
* revert behavior of _prepare_from_posids

* add back cu_seqlens_k and max_k for inference
2025-07-24 20:31:00 +02:00
ad6fd2da0e [Voxtral] values for A10 runners (#39605)
* values for A10 runners

* make

* as for Llava

* does not apply to Voxtral
2025-07-24 18:52:35 +02:00
4741e1f1b7 [timm] new timm pin (#39640) 2025-07-24 16:01:59 +00:00
12b612830d [efficientloftr] fix model_id in tests (#39621)
fix: wrong EfficientLoFTR model id in tests
2025-07-24 10:41:06 +01:00
947a37e8f5 Update recent processors for vLLM backend (#39583)
* update recent models and make sure it runs with vLLM

* delete!
2025-07-24 10:29:27 +02:00
7b897fe583 [Docs] Translate audio_classification.md from English to Spanish (#39513)
* Docs: translate audio_classification to Spanish

* Update audio_classification.md

* Remove space
* Normalize backticks

* Update audio_classification.md

* Apply corrections recommended by aaronjimv

* Update _toctree.yml

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-23 15:55:13 -07:00
9b7244f189 standardized YOLOS model card according to template in #36979 (#39528)
* standardized YOLOS model card according to template in #36979

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* standardized YOLOS model card according to template in #36979

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* replaced YOLOS architecture image, deleted quantization and AttentionMaskVisualizer sections

* removed cli section

* Update yolos.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-23 11:00:25 -07:00
ec8a09a5fe Feature/standardize opt model card (#39568)
* docs: Standardize OPT model card with enhanced details

* Remove incorrect link from OPT model card

* Address review feedback on OPT model card

* Update opt.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-23 10:57:48 -07:00
c5a80dd6c4 🔴 Fix EnCodec internals and integration tests (#39431)
* EnCodec fixes and update integration tests.

* Apply padding mask when normalize is False.

* Update comment of copied function.

* Fix padding mask within modeling.

* Revert padding function.

* Simplify handling of padding_mask.

* Address variable codebook size.

* Add output for padding for consistency with original model, fix docstrings.

* last_frame_pad_length as int

* Update example code.

* Improve docstring/comments.

* Shorten expected output.

* Consistent docstring.

* Parameterize tests.

* Properties for derived variables.

* Update expected outputs from GitHub runner.

* Consistent outputs with runner GPUs.
2025-07-23 19:39:27 +02:00
7a4e2e7868 Fix DAC integration tests and checkpoint conversion. (#39313)
* Fix DAC (slow) integration tests.

* Fix DAC conversion.

* Address comments

* Sync with main, uncomment nn.utils.parametrizations.weight_norm.

* Update DAC integration tests with expected outputs.

* Added info about encoder/decoder error and longer decoder outputs.

* Parameterize tests.

* Set expected values to GitHub runners.
2025-07-23 19:21:26 +02:00
596a75f6e9 Move openai import (#39613) 2025-07-23 19:05:39 +02:00
a0e5a7d34b Transformers serve VLM (#39454)
* Add support for VLMs in Transformers Serve

* Raushan comments

* Update src/transformers/commands/serving.py

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

* Quick fix

* CPU -> Auto

* Update src/transformers/commands/serving.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixup

---------

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-23 17:03:18 +02:00
ea56eb6bed Fix important models CI (#39576)
* relax test boundaries and fix from config

* eager is always supported.
2025-07-23 16:24:29 +02:00
0fe03afeb8 Fix typos and grammar issues in documentation and code (#39598)
- Fix Cyrillic 'Р' to Latin 'P' in Portuguese language link (README.md)
- Fix 'meanginful' to 'meaningful' in training documentation
- Fix duplicate 'Cohere' reference in modular transformers documentation
- Fix duplicate 'the the' in trainer and chat command comments

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-07-23 12:43:11 +00:00
82603b6cc2 Allow device_mesh have multiple dim (#38949)
* Feat: something

* Feat: initial changes

* tmp changes to unblock

* Refactor

* remove todo

* Feat: docstring

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-23 12:27:36 +00:00
10c990f7e2 enable triton backend on awq xpu (#39443)
* enable triton backend on awq xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update src/transformers/quantizers/quantizer_awq.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* fix dtype check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-23 12:10:38 +00:00
e7e6efcbbd [idefics3] fix for vLLM (#39470)
* fix idefics3 for vllm tests

* fix copies
2025-07-23 14:00:43 +02:00
a62f65a989 fix moe routing_weights (#39581)
* fix moe routing_weights

* fix ernie4_5_moe routing_weights

* fix integration test

---------

Co-authored-by: llbdyiu66 <llbdyiu66@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-07-23 11:20:23 +00:00
623ab01039 FP-Quant support (#38696)
* quartet

* quartet qat -> quartet

* format

* bf16 backward

* interfaces

* forward_method

* quartet -> fp_quant

* style

* List -> list

* list typing

* fixed format and annotations

* test_fp_quant

* docstrings and default dtypes

* better docstring and removed noop checks

* docs

* pseudoquantization support to test on non-blackwell

* pseudoquant

* Pseudoquant docs

* Update docs/source/en/quantization/fp_quant.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/quantization/fp_quant.md

* Update docs/source/en/quantization/fp_quant.md

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* small test fixes

* dockerfile update

* spec link

* removed `_process_model_after_weight_loading`

* toctree

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-07-23 11:41:10 +02:00
eb1a007f7f Rename supports_static_cache to can_compile_fullgraph (#39505)
* update all

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* apply suggestions

* fix copies

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-23 09:35:18 +00:00
b357cbb19d [Trackio] Allow single-gpu training and monitor power (#39595)
Allow non-distributed training and monitor power
2025-07-23 11:22:50 +02:00
019b74977d Generic task-specific base classes (#39584)
* first shot

* Update modeling_layers.py

* fix mro order

* finalize llama

* all modular and copied from from llama

* fix
2025-07-23 10:49:47 +02:00
5dba4bc7b2 Fix DynamicCache and simplify Cache classes a bit (#39590)
* fix

* use kwargs

* simplify

* Update cache_utils.py

* Update cache_utils.py

* Update test_cache_utils.py

* fix

* style
2025-07-23 10:13:45 +02:00
d9b35c635e Mask2former & Maskformer Fast Image Processor (#35685)
* add maskformerfast

* test

* revert do_reduce_labels and add testing

* make style & fix-copies

* add mask2former and make fix-copies
TO DO:
	add test for mask2former

* make fix-copies

* fill docstring

* enable mask2former fast processor

* python utils/custom_init_isort.py

* make fix-copies

* fix PR's comments

* modular file update

* add license

* make style

* modular file

* make fix-copies

* merge

* temp commit

* finish up maskformer mask2former

* remove zero shot examples

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-23 02:47:47 +00:00
6e9972962f 🎯 Trackio integration (#38814)
* First attempt

* fix

* fix

* Enhance TrackioCallback to log GPU memory usage and allocation

* Enhance Trackio integration in callbacks and training arguments documentation

* re order

* remove unused lines

* fix torch optional
2025-07-22 14:50:20 -07:00
c6d0500d15 [WIP] Add OneformerFastImageProcessor (#38343)
* [WIP] OneformerFastImageProcessor

* update init

* Fully working oneformer image processor fast

* change Nearest to NearestExact interpolation where needed

* fix doc

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-22 20:41:39 +00:00
4884b6bf41 Fix link in "Inference server backends" doc (#39589)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-22 16:44:08 +00:00
075a65657a Torchdec RuntimeError catch (#39580)
* fix

* fix

* maybe better

* style
2025-07-22 18:35:03 +02:00
2936902a76 [Paged-Attention] Handle continuous batching for repetition penalty (#39457)
* Handle continuous batching for repetition penalty

* fix last scores and with token mask creation

* add test

* Update src/transformers/generation/continuous_batching.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/logits_process.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix formatting

* remove unneeded cast

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-22 18:13:40 +02:00
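The change above has to apply the penalty per request rather than across the whole batch. A minimal sketch of the classic repetition-penalty rule for a single request (illustrative only, not the continuous-batching implementation):

```python
import torch

def apply_repetition_penalty(scores: torch.Tensor, prev_ids: torch.Tensor, penalty: float = 1.2) -> torch.Tensor:
    # Classic CTRL-style penalty for one request: logits of tokens already
    # generated are divided by `penalty` when positive and multiplied when
    # negative, making repeats less likely.
    seen = torch.unique(prev_ids)
    vals = scores[seen]
    scores[seen] = torch.where(vals > 0, vals / penalty, vals * penalty)
    return scores

# Under continuous batching, rows of the logits matrix belong to different
# requests, so the "previously seen tokens" mask must be built per row.
logits = torch.randn(50)
print(apply_repetition_penalty(logits, torch.tensor([3, 7, 3])))
```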
cbcb8e6c1f updated mistral3 model card (#39531)
* updated mistral3 model card (#1)

* updated mistral3 model card

* applying suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* made all changes to mistral3.md

* adding space between paragraphs in docs/source/en/model_doc/mistral3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* removing duplicate in mistral3.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* adding 4 backticks to preserve formatting

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 09:01:55 -07:00
601260fd96 Update docs/source/ko/_toctree.yml (#39516)
docs: update `docs/source/ko/_toctree.yml`
2025-07-22 09:00:42 -07:00
c338fd43b0 [cache refactor] Move all the caching logic to a per-layer approach (#39106)
* Squash for refactor: Replace monolithic cache classes with modular LayeredCache (#38077)

- Introduces CacheLayer and Cache base classes
- Ports Static, Dynamic, Offloaded, Quantized, Hybrid, etc. to use layers
- Implements method/attr dispatch across layers to reduce boilerplate
- Adds CacheProcessor hooks for offloading, quantization, etc.
- Updates and passes tests

* fix quantized, add tests

* remove CacheProcessorList

* raushan review, arthur review

* joao review: minor things

* remove cache configs, make CacheLayer a mixin (joaos review)

* back to storage inside Cache()

* remove cachebase for decorator

* no more __getattr__

* fix tests

* joaos review except docs

* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`

More verbose exceptions in `fix_docstring` on docstring formatting issues.

* Revert "back to storage inside Cache()"

This reverts commit 27916bc2737806bf849ce2148cb1e66d59573913.

* cyril review

* simplify cache export

* fix lfm2 cache

* HybridChunked to layer

* BC proxy object for cache.key_cache[i]=...

* reorder classes

* bfff come on LFM2

* better tests for hybrid and hybridChunked

* complete coverage for hybrid chunked caches (prefill chunking)

* reimplementing HybridChunked

* cyril review

* fix ci

* docs for cache refactor

* docs

* oopsie

* oopsie

* fix after merge

* cyril review

* arthur review

* opsie

* fix lfm2

* opsie2
2025-07-22 16:10:25 +02:00
b16688e96a General weight initialization scheme (#39579)
* general + modulars from llama

* all modular models

* style and fix musicgen

* fix

* Update configuration_musicgen.py

* Update modeling_utils.py
2025-07-22 16:04:20 +02:00
015b62bf3e Add AMD GPU expectations for LLaVA tests (#39486)
* Add AMD GPU expectation to llava tests

* FMT

* Remove debug print

* Address review  comments
2025-07-22 14:01:54 +00:00
efceeaf267 Kernels flash attn (#39474)
* use partial to wrap around `transformers` utils!

* try to refactor?

* revert one wrong change

* just a nit

* push

* revert whatever was wrong!

* some nits

* fixes when there is no attention mask

* bring the licence back

* some fixes

* nit

* style

* remove prints

* correct dtype

* fa flags for testing

* update

* use paged attention if requested!

* updates

* a clone was needed, not sure why

* automatically create cu seq lens when input is flash, this at least makes sure layers don't re-compute

* simplify and improve?

* flash attention is somewhat broken on recent CUDA versions, so allow the option to use something else

* fix!

* protect kernels import

* update

* properly parse generation config being passed

* revert and update

* add two tests

* some fixes

* fix test FA2

* takes comment into account

* fixup

* revert changes

* revert the clone, it is only needed because the metal kernel is not doing it?

* [docs] update attention implementation and cache docs (#39547)

* update docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix mps on our side for now

* Update src/transformers/integrations/flash_paged.py

* no qa

---------

Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:41:06 +02:00
b62557e712 Add AMD expectations to Mistral3 tests (#39481)
Add AMD expectations to mistral3 tests
2025-07-22 15:40:16 +02:00
1806583390 [docs] Create page on inference servers with transformers backend (#39550)
* draft docs on inference servers

* Update docs/source/en/_toctree.yml

Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* update

* doc build failed

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply last suggestions

---------

Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:31:10 +02:00
cd98c1fee3 [docs] update attention implementation and cache docs (#39547)
* update docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:06:43 +02:00
ef99537f37 Add AMD test expectations to DETR model (#39539)
* Add AMD test expectations to DETR model

* Fix baseline expectation

* Address review comments

* Make formatting a bit more consistent
2025-07-22 12:07:10 +00:00
30567c28e8 [timm_wrapper] add support for gradient checkpointing (#39287)
* feat: add support for gradient checkpointing in TimmWrapperModel and TimmWrapperForImageClassification

* ruff fix

* refactor + add test for not supported model

* ruff

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-07-22 11:07:52 +00:00
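For context, gradient checkpointing trades compute for memory by recomputing activations in the backward pass instead of storing them, and the timm wrapper now supports the standard transformers toggle. A minimal usage sketch (the checkpoint name is only an example):

```python
from transformers import TimmWrapperModel

# Load a timm backbone through the transformers wrapper, then enable the
# standard transformers gradient-checkpointing switch added by this PR.
model = TimmWrapperModel.from_pretrained("timm/resnet18.a1_in1k")
model.gradient_checkpointing_enable()
model.train()
```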
a44dcbe513 Fixes needed for n-d parallelism and TP (#39562)
Handle non-DTensors cases in TP Layers

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-22 10:24:59 +00:00
0cae633ce1 Bump AMD container for 2.7.1 PyTorch (#39458)
* Bump AMD container for 2.7.1 PyTorch

* Forgot to update pinned packages
2025-07-22 12:11:38 +02:00
a88ea9cbc8 Add EfficientLoFTR model (#36355)
* initial commit

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: various typos, typehints, refactors from suggestions

* fix: fine_matching method

* Added EfficientLoFTRModel and AutoModelForKeypointMatching class

* fix: got rid of compilation breaking instructions

* docs: added todo for plot

* fix: used correct hub repo

* docs: added comments

* fix: run modular

* doc: added PyTorch badge

* fix: model repo typo in config

* fix: make modular

* fix: removed mask values from outputs

* feat: added plot_keypoint_matching to EfficientLoFTRImageProcessor

* feat: added SuperGlueForKeypointMatching to AutoModelForKeypointMatching list

* fix: reformat

* refactor: renamed aggregation_sizes config parameter into q, kv aggregation kernel size and stride

* doc: added q, kv aggregation kernel size and stride doc to config

* refactor: converted efficientloftr implementation from modular to copied from mechanism

* tests: overwrote batching_equivalence for "keypoints" specific tests

* fix: changed EfficientLoFTRConfig import in test_modeling_rope_utils

* fix: make fix-copies

* fix: make style

* fix: update rope function to make meta tests pass

* fix: rename plot_keypoint_matching to visualize_output for clarity

* refactor: optimize image pair processing by removing redundant target size calculations

* feat: add EfficientLoFTRImageProcessor to image processor mapping

* refactor: removed logger and updated attention forward

* refactor: added auto_docstring and can_return_tuple decorators

* refactor: update type imports

* refactor: update type hints from List/Dict to list/dict for consistency

* refactor: update MODEL_MAPPING_NAMES and __all__ to include LightGlue and AutoModelForKeypointMatching

* fix: change type hint for size parameter in EfficientLoFTRImageProcessor to Optional[dict]

* fix typing

* fix some typing issues

* nit

* a few more typehint fixes

* Remove output_attentions and output_hidden_states from modeling code

* else -> elif to support efficientloftr

* nit

* tests: added EfficientLoFTR image processor tests

* refactor: reorder functions

* chore: update copyright year in EfficientLoFTR test file

* Use default rope

* Add docs

* Update visualization method

* fix doc order

* remove 2d rope test

* Update src/transformers/models/efficientloftr/modeling_efficientloftr.py

* fix docs

* Update src/transformers/models/efficientloftr/image_processing_efficientloftr.py

* update gradient

* refactor: removed unused codepath

* Add motivation to keep postprocessing in modeling code

* refactor: removed unnecessary variable declarations

* docs: use load_image from image_utils

* refactor: moved stage in and out channels computation to configuration

* refactor: set an intermediate_size parameter to be more explicit

* refactor: removed all mentions of attention masks as they are not used

* refactor: moved position_embeddings to be computed once in the model instead of every layer

* refactor: removed unnecessary hidden expansion parameter from config

* refactor: removed completely hidden expansions

* refactor: removed position embeddings slice function

* tests: fixed broken tests because of previous commit

* fix is_grayscale typehint

* not refactoring

* not renaming

* move h/w to embeddings class

* Precompute embeddings in init

* fix: replaced cuda device in convert script to accelerate device

* fix: replaced stevenbucaille repo to zju-community

* Remove accelerator.device from conversion script

* refactor: moved parameter computation in configuration instead of figuring it out when instantiating a Module

* fix: removed unused attributes in configuration

* fix: missing self

* fix: refactoring and tests

* fix: make style

---------

Co-authored-by: steven <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-07-22 10:53:16 +01:00
3bc726b381 [gemma3] fix bidirectional image mask (#39396)
* fix gemma3 mask

* make compile happy, and use only torch ops

* no full attention between images

* update tests

* fix tests

* add a fast test
2025-07-22 10:04:56 +02:00
fbeaf96f9e Update OLMoE model card (#39344)
* Update OLMoE model card

* Checks Test

* Add license and code

* Update docs/source/en/model_doc/olmoe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update olmoe.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-21 16:41:01 -07:00
641aaed7c0 Update modernbertdecoder docs (#39453)
* update docs with paper and real model

* nit

* Apply suggestions from code review

Thanks to @stevhliu!

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Remove usage examples, add quantization

---------

Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-21 16:40:22 -07:00
049a674e68 [CI] Fix post merge ernie 4.5 (#39561)
fix repo consistency
2025-07-21 20:56:24 +02:00
b3ebc761e2 [Fast image processors] Improve handling of image-like inputs other than images (segmentation_maps) (#39489)
* improve handlike of other image-like inputs in fast image processors

* fix issues with _prepare_images_structure

* update sam image processor fast

* use dict update
2025-07-21 14:12:14 -04:00
b4115a426e [Ernie 4.5] Add ernie text models (#39228)
* init

* copied from remote

* add proper structure and llama like structure

* fixup

* revert to state that works

* get closer to llama

* slow and steady

* some removal

* masks work

* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm

* nice

* getting closer

* closer to transformers style

* let's simplify this, batching works now

* simplified

* working version with modular

* it is indeed the rotation per weights, make it complete llama style

* cleanup conversion, next to look at -> tokenizer

* remove llama artefacts

* fix modeling tests (common ones)

* style

* integration test + first look into tokenization (will need more work, focussing on modeling other models first)

* style

* working moe version, based on remote

* lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)

* more cleanup

* refactor namings and remove addition forXXX classes

* our MoE won't cut it, it seems; correction bias seems to be missing in the remote code version

* tokenization change (remote)

* our moe version works when adding normalization :D

* cleanup moe

* nits

* cleanup modeling -> let's get to modular next

* style

* modular v1

* minor things + attempt at conversion (which doesn't work)

* no conversion follow glm, fixup modular and other nits

* modular cleanup

* fixes

* tests, tests, tests + some moe dtype forcing

* simplify modular, fix fatal fa2 bug, remaining tests

* fix import issue?

* some initial docs, fix faulty bnb behavior --> need to fix some tests because the gate needs to be float

* fix sdpa test, load on init dtype only

* fixup post merge

* style

* fix doc links

* tokenization cleanup beginnings

* simplify tokenizer by a lot as its basically llama

* tokenizer is full llama with different defaults + extra special tokens

* sync og special tokens of ernie

* fix decoding with numbers (also done in remote, what timing), beginning of tokenizer tests

* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)

* nits

* docs

* my daily post merge it is

* check

* tokenization update with explanations and conversion script

* review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio)

* post merge fixes

* fixup tokenization, llama fast is the way to go

* more fixups

* check

* import fixes

* correction bias following the paddle code

* fix

* fix TP plan, fix correction bias sharding during forward

* style

* whoops

* fix tied weights

* docs and last nit

* license

* flasky tests

* move repo id, update when merged on the hub
2025-07-21 19:51:49 +02:00
69b158260f Refactor embedding input/output getter/setter (#39339)
* simplify common get/set

* remove some noise

* change some 5 years old modeling utils

* update examples

* fix copies

* revert some changes

* fixes, gah

* format

* move to Mixin

* remove smolvlm specific require grad

* skip

* force defaults

* remodularise some stuff

* remodularise more stuff

* add safety for audio models

* style

* have a correct fallback, you daft donkey

* remove this argh

* change heuristic for audio models

* fixup

* revert

* this works

* revert again

* 🧠

* aaah ESM has two modelings aaah

* add informative but short comment

* add `input_embed_layer` mixin attribute

* style

* walrus has low precedence

* modular fix

* this was breaking parser
2025-07-21 18:18:14 +02:00
2da97f0943 🌐 [i18n-KO] Translated perf_infer_gpu_multi.md to Korean (#39441)
* docs: ko: perf_infer_gpu_many.md

* feat: nmt draft

* docs: refine KO translation and enhance naturalness

* docs: add missing TOC to documentation

* Align toctree and filename with original: perf_infer_gpu_multi

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Refine Korean translation

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2025-07-21 09:14:15 -07:00
82807e56b1 [Fast image processor] refactor fast image processor glm4v (#39490)
refactor fast image processor glm4v
2025-07-21 11:18:46 -04:00
4b4f04fcca fix ndim check of device_mesh for TP (#39538) 2025-07-21 13:09:33 +00:00
1aa7256f01 Refactor MambaCache to modeling_mamba.py (#38086)
* Refactor MambaCache to modeling_mamba.py (parity with Zamba)

* ruff

* fix dummies

* update

* update

* remove mamba ref in cache tests

* remove cache_implementation from tests

* update

* ruff

* ruff

* sneaky regression

* model consistency

* fix test_multi_gpu_data_parallel_forward

* fix falcon slow tests

* ruff

* ruff

* add sample false

* try to fix slow tests

* Revert "fix test_multi_gpu_data_parallel_forward"

This reverts commit 66b7162c7c5c5ce8a73ccf48cffc8a96343ebb33.

* fix tests on nvidia t4, remove dataparallel tests from mamba

* ruff

* remove DDP tests from mamba and falcon_mamba

* add explicit error for MambaCache

* mamba2 also needs to init cache in prepare_inputs_for_generation

* ruff

* ruff

* move MambaCache to its own file

* ruff

* unprotected import fix

* another attempt to fix unprotected imports

* Revert "another attempt to fix unprotected imports"

This reverts commit 2338354fcab630de5899321f5daced5fb312c2a2.

* fixing unprotected import, attempt 3

* Update src/transformers/cache_utils.py

* ruff's fault

* fix arthur review

* modular falcon mamba

* found a hack

* fix config docs

* fix docs

* add export info

* merge modular falcon branch

* oopsie

* fix fast path failing

* new approach

* oopsie

* fix types

* Revert new pragma in modular

This reverts commit 80b1cf160ee251536f07c40b8a0857d499e70db6.

* trying another modular workaround

* review & fix ci

* oopsie

* clear prepare_inputs on mamba/mamba2/falcon_mamba
2025-07-21 14:59:36 +02:00
a419a40234 Fix Docstring of BarkProcessor (#39546)
* Fix Docstring of BarkProcessor

* Fix typo

* Add type hint of return value for BarkProcessor.__call__
2025-07-21 12:56:44 +00:00
9323d0873c use the enable_gqa param in torch.nn.functional.scaled_dot_product_at… (#39412)
* use the enable_gqa param in torch.nn.functional.scaled_dot_product_attention

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* ci failure fix

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add check

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix ci failure

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine code, extend to cuda

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine code

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix review comments

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine the PR

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 14:46:43 +02:00
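For reference, `enable_gqa` lets `scaled_dot_product_attention` broadcast a smaller set of key/value heads across the query heads itself, so callers no longer need to repeat K/V manually. A minimal sketch (assumes PyTorch 2.5+, where the flag exists):

```python
import torch
import torch.nn.functional as F

# Grouped-query attention: 8 query heads share 2 key/value heads.
# With enable_gqa=True, SDPA broadcasts the KV heads internally,
# so no repeat_interleave/expand of K and V is needed beforehand.
q = torch.randn(1, 8, 16, 64)   # (batch, q_heads, seq, head_dim)
k = torch.randn(1, 2, 16, 64)   # (batch, kv_heads, seq, head_dim)
v = torch.randn(1, 2, 16, 64)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True, enable_gqa=True)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```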
6b3a1f2f51 Fix missing initializations for models created in 2023 (#39239)
* fix SwiftFormer

* fix Kosmos2

* fix Owlv2

* fix Sam

* fix Vits

* fix Pvt

* fix MobileViTV2

* fix PatchTST

* fix Bros

* fix Informer

* fix BridgeTower

* fix Mra and Yoso

* fix Rwkv

* fix EfficientNet

* fix NllbMoe

* fix Tvp

* fix Clap

* fix Autoformer

* fix SwiftFormer

* fix Mgpstr

* fix Align

* fix VitMatte

* fix SpeechT5

* add conditional check for parameters

* fix SpeechT5

* fix TimmBackbone and Clvp

* fix SwiftFormer

* fix SeamlessM4T and SeamlessM4Tv2

* fix Align

* fix Owlv2 and OwlViT

* add reviewed changes

* add reviewed changes

* fix typo

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 14:43:52 +02:00
970d9a75ce Raise TypeError instead of ValueError for invalid types (#38660)
* Raise TypeError instead of ValueError for invalid types.

* Removed un-necessary changes.

* Resolved conflicts

* Code quality

* Fix failing tests.

* Fix failing tests.
2025-07-21 12:42:00 +00:00
822c5e45b2 Fix pylint warnings (#39477)
* Fix pylint warnings

Signed-off-by: cyy <cyyever@outlook.com>

* Fix variable names

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-21 12:38:05 +00:00
dc017cd763 Fix Qwen Omni integration test (#39553)
fix
2025-07-21 14:11:46 +02:00
fdc0566e15 🚨🚨🚨 [Trainer] Enable average_tokens_across_devices by default in TrainingArguments (#39395)
Enable average_tokens_across_devices by default in TrainingArguments

Fixes #39392

This change improves loss calculation correctness for multi-GPU training by enabling proper token averaging across devices by default.

Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-21 12:11:20 +00:00
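A usage sketch of the new default; the flag itself predates this PR, only its default value changed:

```python
from transformers import TrainingArguments

# With token averaging enabled, per-device loss sums are divided by the
# global (all-reduced) count of non-padding label tokens rather than a
# per-device count, so GPUs that happen to get shorter sequences no
# longer contribute over-weighted gradients.
args = TrainingArguments(output_dir="out", average_tokens_across_devices=True)
print(args.average_tokens_across_devices)
```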
8c102e2eb1 Rename _supports_flash_attn_2 in examples and tests (#39471)
* delete `_supports_flash_attn_2` from examples and tests

* simplify docs
2025-07-21 14:02:57 +02:00
3a152e3a5c Fix the check in flex test (#39548)
* fix the check

* fix flags

* flags
2025-07-21 13:29:44 +02:00
78fb2d2760 Fix bad tensor shape in failing Hubert test. (#39502)
Fix bad tensor shape in Hubert test.
2025-07-21 12:25:52 +01:00
39ba5f3cc2 GLM-4 Update (#39393)
* one commit with full

* Create glm4_moe.md

* Update check_config_docstrings.py

* Update __init__.py

* update

* argue

* argue: router problem

* 1

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update modular_glm4_moe.py

* update

* use dsv3 pretrainmodel in modular

* update for test

* update new modular

* use LlamaAttention and avoid CohereAttention because it repeats the norm

* update the modular

* update attn modular

* update

* Update modular_glm4_moe.py

* MTP layer needs to be ignored

* fix gradient error using the dots_1 method

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 13:24:34 +02:00
344012b3a6 [qwen2 vl] fix packing with all attentions (#39447)
* fix qwen2 vl packing in FA2

* why? delete!

* qwen2-5-vl seems to work now

* update

* fix tests

* start by adapting FA2 tests

* add similar tests for sdpa/eager

* address comments

* why is this even in conditional model and not base model?
2025-07-21 12:19:15 +02:00
e42681b48b [gemma3] support sequence classification task (#39465)
* add seq clf class

* fix docs and add in auto-map

* skip tests

* optional pixels
2025-07-21 11:03:20 +02:00
34133d0a79 Fix placeholders replacement logic in auto_docstring (#39433)
Fix and simplify placeholders replacement logic
2025-07-18 22:56:23 +00:00
433d2a23d7 Update SAM/SAM HQ attention implementation + fix Cuda sync issues (#39386)
* update attention implementation and improve inference speed

* modular sam_hq + fix integration tests on A10

* fixup

* fix after review

* softmax in correct place

* return attn_weights in sam/sam_hq
2025-07-18 18:46:27 -04:00
541bed22d6 Improve @auto_docstring doc and rename args_doc.py to auto_docstring.py (#39439)
* rename `args_doc.py` to `auto_docstring.py` and improve doc

* modifs after review
2025-07-18 18:00:34 +00:00
de0dd3139d Add fast image processor SAM (#39385)
* add fast image processor sam

* nits
2025-07-18 17:27:16 +00:00
561a79a2f4 Fix BatchEncoding.to() for nested elements (#38985) 2025-07-18 14:14:45 +01:00
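The gist of that fix is recursing into nested containers when moving tensors to a device. A hedged, generic sketch of the pattern (not the actual `BatchEncoding` code):

```python
import torch

def recursive_to(obj, device):
    # Move tensors nested inside dicts/lists/tuples, leaving other
    # values (ints, strings, None) untouched.
    if isinstance(obj, torch.Tensor):
        return obj.to(device)
    if isinstance(obj, dict):
        return {k: recursive_to(v, device) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(recursive_to(v, device) for v in obj)
    return obj

batch = {"input_ids": torch.zeros(2, 4), "images": [torch.zeros(3, 8, 8)]}
moved = recursive_to(batch, "cpu")
print(moved["images"][0].device)
```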
f4d076561f [gemma3] Fix do_convert_rgb in image processors. (#39438)
* [gemma3] Fix do_convert_rgb in image processors.

* [gemma3] Fix do_convert_rgb in image processors.
2025-07-18 12:33:00 +00:00
bcc0091937 [chat template] return assistant mask in processors (#38545)
* messed up the git history, squash commits

* raise error if slow and refine tests

* index was off by one

* fix the test
2025-07-18 12:23:20 +00:00
328ca9cf1d [dependencies] Update datasets pin (#39500)
* pyarrow pin

* make fixup

* test?

* like this?

* like this?

* like this?

* datasets pin

* comment
2025-07-18 12:05:28 +00:00
fb58377700 Slack CI bot: set default result for non-existing artifacts (#39499)
* Set default result for non-existing artifacts

* FMT

* Address review comments
2025-07-18 11:45:47 +00:00
4ded9a4113 🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling (#39423)
* first try

* Update modeling_utils.py

* Update modeling_utils.py

* big refactor

* Update modeling_utils.py

* style

* docstrings and simplify inner workings of configs

* remove all trace of _internal

* Update modeling_utils.py

* fix logic error

* Update modeling_utils.py

* recursive on config

* Update configuration_utils.py

* fix

* Update configuration_dpt.py

* Update configuration_utils.py

* Update configuration_utils.py

* Update modeling_idefics.py

* Update modeling_utils.py

* fix for old models

* more old models fixup

* Update modeling_utils.py

* Update configuration_utils.py

* Remove outdated test

* remove the deepcopy!! 🥵🥵

* Update test_modeling_gpt_bigcode.py

* fix qwen dispatch

* restrict to only models supporting it

* style

* switch name

* Update modeling_utils.py

* Update modeling_utils.py

* add tests!

* fix

* typo

* remove bad copies

* fix

* Update modeling_utils.py

* additional check

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* fix

* skip
2025-07-18 13:41:54 +02:00
2b819ba4e3 [dependencies] temporary pyarrow pin (#39496)
* pyarrow pin

* make fixup

* test?

* like this?

* like this?

* like this?
2025-07-18 10:05:40 +00:00
967045082f Add voxtral (#39429)
* draft

* draft update (conversion working)

* mend

* draft update

* draft update: working generate

* refactor

* VoxtralProcessor draft

* processor update

* update convert_tekken_tokenizer

* refactor processor

* update convert

* make style

* better handle prefil

* make style

* add tests

* add mistral_common audio loading

* processor update

* revert changes

* audio utils update

* add audio to apply chat template mistral update

* voxtral processor update

* fix

* update conversion script

* make Mistral tokenizer from_pretrained work from a local dir

* fix updates

* add integration tests

* add batched version

* processor docstring

* make style

* revert convert_tekken_tokenizer changes

* revert processing_qwen2.5 changes

* add multi-turn test

* processor improvements

* address review changes

* Update src/transformers/tokenization_mistral_common.py

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>

* update audio utils

* nits

* integration test update

* correct _support

* update tests

* test update

* update integration tests

* fix

* fix

* fix

* add test_apply_chat_template_with_audio

* add model doc

* model doc

* nit

* doc update

* nit

* processor improvement

* ensure default is 3B

* nits

* make

* make

* convert modular

* update checkpoint

* fix test

* make

* make

* autos

* make

* make

* nit

* nit

* nit

---------

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-18 00:02:04 +00:00
73869f2e81 Fix typing order (#39467)
* fix type order

* change all Union[str, dict] to Union[dict, str]

* add hf_parser test && fix test order

* add deepspeed dependency

* replace deepspeed with accelerator
2025-07-17 15:47:31 +00:00
bda75b4011 Add unified logits_to_keep support to LLMClass (#39472)
* add supports for logits_to_keep for qwen25vl and glm4v

* Update relevant modular files
2025-07-17 17:07:12 +02:00
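A usage sketch of `logits_to_keep`; the model id is only an example of a decoder whose forward accepts the argument:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# `logits_to_keep=1` asks the forward pass to materialize logits only for
# the last position, avoiding a full (batch, seq_len, vocab_size) tensor
# when only the next token is needed.
model_id = "Qwen/Qwen2.5-0.5B"  # assumed example checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
inputs = tok("Hello there", return_tensors="pt")
out = model(**inputs, logits_to_keep=1)
print(out.logits.shape)  # torch.Size([1, 1, vocab_size])
```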
bf6c997685 [serve] Add speech to text (/v1/audio/transcriptions) (#39434)
* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* use openai

* validate request, including detecting unused fields

* dict indexing

* dict var access

* tmp commit (tests failing)

* add slow

* use oai output type in completions

* (little rebase errors)

* working spec?

* guard type hint

* type hints. fix state (CB can now load different models)

* type hints; fn names; error type

* add docstrings

* responses + kv cache

* metadata support; fix kv cache; error event

* add output_index and content_index

* docstrings

* add test_build_response_event

* docs/comments

* gate test requirements; terminate cb manager on model switch

* nasty type hints

* more type hints

* disable validation by default; enable force models

* todo

* experiment: base model from typed dict

* audio working

* fix bad rebase

* load audio with librosa

* implement timed models

* almost working

* make fixup

* fix tests

* transcription request type

* tokenizer -> processor

* add example in docs

---------

Co-authored-by: Lysandre <hi@lysand.re>
2025-07-17 14:29:57 +00:00
8b3de61a65 Update integration_utils.py (#39469)
* Update integration_utils.py

sanitize mlflow upload metric

* Update integration_utils.py

change import order to pass CI

* Update integration_utils.py

add comments

* Update integration_utils.py

Remove whitespace from blank line
2025-07-17 13:57:49 +00:00
7fd60047c8 fix: ImageTextToTextPipeline handles user-defined generation_config (#39374)
fix: ImageTextToTextPipeline handles user-defined generation_config passed to the pipeline

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-07-17 13:23:29 +00:00
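A usage sketch of the fixed behavior; the model id and image URL are placeholders:

```python
from transformers import GenerationConfig, pipeline

# After the fix, a user-supplied GenerationConfig is honored by the
# image-text-to-text pipeline instead of being dropped.
pipe = pipeline("image-text-to-text", model="llava-hf/llava-interleave-qwen-0.5b-hf")
gen_cfg = GenerationConfig(max_new_tokens=20, do_sample=False)
messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
    {"type": "text", "text": "Describe this image."},
]}]
out = pipe(text=messages, generation_config=gen_cfg)
print(out)
```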
60b5471da3 Enable some ruff checks for performance and readability (#39383)
* Fix inefficient sequence tests

Signed-off-by: cyy <cyyever@outlook.com>

* Enable PERF102

Signed-off-by: cyy <cyyever@outlook.com>

* Enable PLC1802

Signed-off-by: cyy <cyyever@outlook.com>

* Enable PLC0208

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-17 13:21:59 +00:00
fc700c2a26 Fix convert_and_export_with_cache failures for GPU models (#38976)
* Add the `device` option for `generate()`

* Add device for default tensors to avoid tensor mismatch

* [test] Enable test_static_cache_exportability for torch_device

* infer device from the prompt_token_ids

* Add device for generated tensor

* [Test] Make `test_export_static_cache` tests to run on devices rather than only CPU

* fix format

* infer device from the model
2025-07-17 13:12:32 +00:00
54680d75c9 Update GemmaIntegrationTest::test_model_2b_bf16_dola (#39362)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-17 14:06:23 +01:00
322400af58 fix a comment typo in utils.py (#39459) 2025-07-17 13:06:04 +00:00
43f07018cf Use newer typing notation (#38934)
Signed-off-by: cyy <cyyever@outlook.com>
2025-07-17 13:05:21 +00:00
565dd0bad7 Fix tests due to breaking change in accelerate (#39451)
* update values

* fix
2025-07-17 13:51:50 +01:00
26fed50460 fix max_length calculating using cu_seq_lens (#39341) 2025-07-17 10:54:23 +02:00
cdfe6164b3 fix(pipelines): QA pipeline returns fewer than top_k results in batch mode (#39193)
* fixing the bug

* Try a simpler approach

* make fixup

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2025-07-17 10:24:30 +02:00
b85ed49e0a Corrections to PR #38642 and enhancements to Wav2Vec2Processor __call__ and pad docstrings (#38822)
* Correcting PR #38642.  The PR removed references to the deprecated method "as_target_processor()" in the
__call__ and pad method docstrings, which is correct, but also removed all references to PreTrainedTokenizer,
which is incorrect.  This commit adds back the reference to PreTrainedTokenizer and also takes the
opportunity to enhance the docstrings with the invocation procedure post removal of "as_target_processor()"
and adds information on return values.

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: René Tio <tor@Jammer.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-16 14:13:07 -07:00
787a0128a9 create ijepa modelcard (ref : PR #36979 ). (#39354)
* wip: adding first version of the IJEPA model card.

* refactor based on the @stevhliu feedbacks

* refactor:
- revert the accidental removal of the autodoc API description and the image reference architecture

- general context update.

* - change of model for the example quantization.
- merging the quantization content.
2025-07-16 12:40:22 -07:00
48f2233cdf Improve grammar and clarity in perf_hardware.md (#39428) 2025-07-16 12:15:15 -07:00
e68ebb695f fix cached file error when repo type is dataset (#36909)
* fix cached file

* Update hub.py
2025-07-16 18:02:26 +02:00
35a416c400 Fix indentation bug in SmolVLM image processor causing KeyError (#39452)
Fix indentation bug in Idefics3 image processor

- Fix KeyError when do_image_splitting=False
- Move split_images_grouped assignment inside loop
- Ensures all image shapes are stored, not just the last one
- This fixes the bug in both Idefics3 and generated SmolVLM processors

cc @yonigozlan

Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>
2025-07-16 11:59:28 -04:00
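The bug pattern is plain Python: an assignment dedented out of its loop keeps only the last iteration's value. A minimal sketch with illustrative names (not the actual processor code):

```python
images = [["a"], ["b", "c"], ["d"]]

# Buggy: the assignment outside the loop runs once, after the loop,
# so only the final batch's shapes are recorded.
shapes_buggy = {}
for i, batch in enumerate(images):
    value = [len(x) for x in batch]
shapes_buggy[i] = value

# Fixed: keep the assignment inside the loop so every batch is stored.
shapes_fixed = {}
for i, batch in enumerate(images):
    shapes_fixed[i] = [len(x) for x in batch]

print(shapes_buggy)  # {2: [...]} -> later lookups of keys 0 and 1 raise KeyError
print(shapes_fixed)  # {0: [...], 1: [...], 2: [...]}
```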
2c58705dc2 Updated Megatron conversion script for gpt2 checkpoints (#38969)
* update script to support new megatron gpt format

* fixed quality failures

---------

Co-authored-by: Luke Friedrichs <LckyLke>
2025-07-16 15:54:29 +00:00
26be7f717e [CI] Fix partially red CI (#39448)
fix
2025-07-16 15:53:43 +02:00
0a88751940 Fixes #39204: add fallback if get_base_model missing (#39226)
* Fixes #39204: add fallback if get_base_model missing

* Inline try_get_base_model logic as suggested in PR review

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-16 15:51:30 +02:00
ba506f87db make the loss context manager easier to extend (#39321) 2025-07-16 15:47:24 +02:00
9f1ac6f185 Remove something that should have never been there (#38254)
* what the hell

* update

* style

* style

* typing

* fix init issue

* fix granite moe hybrid as well
2025-07-16 15:22:44 +02:00
a7ca5b5d67 Fix processor tests (#39450)
fix
2025-07-16 15:01:35 +02:00
71818f570b [Bugfix] [Quantization] Remove unused init arg (#39324)
remove unused arg from ct config init

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-16 14:57:42 +02:00
cc24b0378e Better typing for model.config (#39132)
* Apply to all models config annotation

* Update modular to preserve order

* Apply modular

* fix define docstring

* fix dinov2 consistency (docs<->modular)

* fix InstructBlipVideoForConditionalGeneration docs<->modular consistency

* fixup

* remove duplicate code

* Delete config_class attribute from the modeling code

* Add config_class attribute in base model

* Update init sub class

* Deprecated models update

* Update new models

* Fix remote code BC issue

* fixup

* fixing more corner cases

* fix new models

* add test

* modular docs update

* fix comment a bit

* fix for py3.9
2025-07-16 14:50:35 +02:00
4b258454a7 Fix typo in generation configuration for Janus model weight conversion (#39432)
* Fix typo in generation configuration for Janus model weight conversion

* Fix typo

* Update Janus model generation configuration

* Update Janus model to use generation_kwargs
2025-07-16 14:28:02 +02:00
de5ca373ac Responses API in transformers serve (#39155)
* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* Responses API (to be merged into #39155) (#39338)

* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* use openai

* validate request, including detecting unused fields

* dict indexing

* dict var access

* tmp commit (tests failing)

* add slow

* use oai output type in completions

* (little rebase errors)

* working spec?

* guard type hint

* type hints. fix state (CB can now load different models)

* type hints; fn names; error type

* add docstrings

* responses + kv cache

* metadata support; fix kv cache; error event

* add output_index and content_index

* docstrings

* add test_build_response_event

* docs/comments

* gate test requirements; terminate cb manager on model switch

* nasty type hints

* more type hints

* disable validation by default; enable force models

* todo

---------

Co-authored-by: Lysandre <hi@lysand.re>

* Slight bugfixes

* PR comments from #39338

* make fixup

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2025-07-16 14:16:16 +02:00
c8524aeb07 [cache] make all classes cache compatible finally (#38635)
* dump

* push other models

* fix simple greedy generation

* xmod

* add fsmt and clean up some mentions of the old cache format

* gpt-bigcode now follows standards

* delete tuple cache reference in generation

* fix some models

* fix some models

* fix mambas and support cache in tapas

* fix some more tests

* fix copies

* delete `_reorder_cache`

* another fix copies

* fix typos and delete unnecessary test

* fix rag generate, needs special cache reordering

* fix tapas and superglue

* reformer create special cache

* recurrent gemma `reorder_cache` was a no-op, delete

* fix-copies

* fix blip and musicgen pipeline tests

* fix reformer

* fix reformer, again...

* delete `_supports_cache_class`

* delete `supports_quantized_cache`

* fix failing tests

* fix copies

* some minor clean up

* style

* style

* fix copies

* fix tests

* fix copies

* create causal mask now needs positions?

* fix copies

* style

* Update tests/test_modeling_common.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* clean-up of non-generative model after merging main

* check `is_decoder` for cache

* delete transpose for scores

* remove tuple cache from docs everywhere

* fix tests

* fix copies

* fix copies once more

* properly deprecate `encoder_attention_mask` in Bert-like models

* import `deprecate_kwarg` where needed

* fix copies again

* fix copies

* delete `next_decoder_cache`

* fix-copies asks to update for PLM

* fix copies

* rebasing had a few new models, fix them and merge asap!

* fix copies once more

* fix slow tests

* fix tests and update PLM checkpoint

* add read token and revert accidentally removed line

* oh come on, style

* just skip it, read token has no access to PLM yet

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-16 14:00:17 +02:00
6cb43defd0 docs: add missing numpy import to minimal example (#39444)
docs: add numpy import to minimal example
2025-07-16 11:57:13 +00:00
61163099f1 Remove runtime conditions for type checking (#37340)
Remove dynamic conditions for type checking

Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-16 13:36:48 +02:00
bfc9ddf5c6 Add StableAdamW Optimizer (#39446)
* Added StableAdamW as an optimizer option for Trainer. Also wrote tests to verify its behaviour.

* Fixed issue with

* Added docs for StableAdamW. Also fixed a typo in schedule free optimizers

---------

Co-authored-by: Gautham Krithiwas <gauthamkrithiwas2003@gmail.com>
2025-07-16 13:35:53 +02:00
b9ee528246 add test scanner (#39419)
* add test scanner

* add doc + license

* refactor for only 1 tree traversal

* add back test of only one method

* document single method scan

* format

* fixup generate tests

* minor fix

* fixup

* fixup doc
2025-07-16 12:45:46 +02:00
79941c61ce Fix missing definition of diff_file_url in notification service (#39445)
Fix missing definition of diff_file_url
2025-07-16 12:09:18 +02:00
e048d48bd0 Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer (#31870)
* add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in trainer

* Update src/transformers/optimization.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update optimization.py

fix the error of the unclosed "("

* Update optimization.py

remove whitespace in line 402 in order to pass the quality test

* Update src/transformers/optimization.py

* Update src/transformers/optimization.py

* Apply style fixes

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-16 12:01:08 +02:00
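The schedule above decays to a floor instead of zero. A minimal sketch of the idea (illustrative, not the Trainer's exact implementation):

```python
import math

def cosine_with_min_lr(step, *, warmup_steps, total_steps, min_lr_rate=0.1):
    # Linear warmup to the peak LR, then cosine decay that bottoms out at
    # `min_lr_rate` * peak instead of zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr_rate + (1.0 - min_lr_rate) * cosine

# Multiply these factors into the base learning rate, e.g. via
# torch.optim.lr_scheduler.LambdaLR.
print([round(cosine_with_min_lr(s, warmup_steps=10, total_steps=100), 3)
       for s in (0, 10, 55, 100)])  # [0.0, 1.0, 0.55, 0.1]
```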
0cf08e90dd Change log level from warning to info for scheduled request logging in ContinuousBatchProcessor (#39372)
Change log level from warning to info for scheduled request logging in ContinuousBatchProcessor
2025-07-16 11:54:20 +02:00
ae4e306a40 Defaults to adamw_torch_fused for Pytorch>=2.8 (#37358)
* Defaults to adamw_torch_fused for latest Pytorch

Signed-off-by: cyy <cyyever@outlook.com>

* Fix test

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-16 09:52:33 +00:00
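A sketch of the version-gated default selection (illustrative; the real dispatch lives in `TrainingArguments`):

```python
import torch
from packaging import version

# Pick the fused AdamW implementation when the installed PyTorch is new
# enough, otherwise fall back to the plain torch AdamW.
torch_version = version.parse(torch.__version__.split("+")[0])
default_optim = "adamw_torch_fused" if torch_version >= version.parse("2.8.0") else "adamw_torch"
print(default_optim)
```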
4524a68c66 Fix L270 - hasattr("moe_args") returning False error (#38715)
* Fix L270 - hasattr("moe_args") returning False error

* Update src/transformers/models/llama4/convert_llama4_weights_to_hf.py

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-16 09:45:58 +00:00
d33a1c389f [chat template] add a testcase for kwargs (#39415)
add a testcase
2025-07-16 11:31:35 +02:00
99c9763398 Fixed a bug calculating cross entropy loss in JetMoeForCausalLM (#37830)
fix: 🐛 Fixed a bug in calculating Cross Entropy loss in JetMoeForCausalLM

In the original code, we shift the logits and pass shift_logits into self.loss_function, but self.loss_function shifts them again, so we were actually doing "next-next-token prediction", which is incorrect. I have removed the logits shifting before calling self.loss_function (see the sketch below).

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-16 11:22:00 +02:00
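
The double-shift bug in this entry is easy to reproduce. A minimal sketch (generic causal-LM loss, not the actual JetMoe code):

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits, labels):
    # Standard next-token objective: the loss function itself shifts by one.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)
    )

logits, labels = torch.randn(2, 8, 32), torch.randint(0, 32, (2, 8))
# Buggy call site: shifting *before* the call shifts twice -> "next-next-token".
loss_wrong = causal_lm_loss(logits[..., :-1, :], labels[..., 1:])
# Fixed call site: pass unshifted tensors; the loss applies the single shift.
loss_right = causal_lm_loss(logits, labels)
```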
667ad02374 Remove double soft-max in load-balancing loss. Fixes #39055 . (#39056)
Remove double soft-max in load-balancing loss. Fixes #39055
2025-07-16 09:20:23 +00:00
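
For reference, a sketch of a top-1 load-balancing auxiliary loss with the softmax applied exactly once (illustrative only, not the exact transformers implementation):

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, num_experts: int):
    # Router probabilities: a single softmax over the expert dimension.
    routing_weights = torch.softmax(router_logits, dim=-1)
    # The bug was a second softmax here: softmax of a probability vector
    # flattens it toward uniform, silently weakening the auxiliary loss.
    prob_per_expert = routing_weights.mean(dim=0)          # routing mass per expert
    tokens_per_expert = F.one_hot(
        routing_weights.argmax(dim=-1), num_classes=num_experts
    ).float().mean(dim=0)                                  # token share per expert
    return num_experts * (prob_per_expert * tokens_per_expert).sum()

print(load_balancing_loss(torch.randn(16, 4), num_experts=4))
```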
31d81943c9 [Core] [Offloading] Fix saving offloaded submodules (#39280)
* fix counting meta tensors, fix onloading meta tensors

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated fix

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated change

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add clarifying comment

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add test_save_offloaded_model_with_direct_params

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* fix merge conflict, add decorators

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-16 08:44:40 +00:00
add43c4d09 [autodocstring] add video and audio inputs (#39420)
* add  video and audio inputs in auto docstring

* fix copies
2025-07-16 09:41:50 +02:00
0dc2df5dda CI workflow for performed test regressions (#39198)
* WIP script to compare test runs for models

* Update line normalization logic

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-07-16 04:20:02 +02:00
1bc9ac5107 docs: update LightGlue docs (#39407)
* docs: update LightGlue docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-15 12:40:50 -07:00
d9574f2fe3 docs: update SuperGlue docs (#39406)
* docs: update SuperGlue docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-15 12:40:26 -07:00
9f41f67135 [vlm] fix loading of retrieval VLMs (#39242)
* fix vlm with retrieval

* we can't use AutoModel because new ColQwen was released after refactor

* no need for colqwen

* tied weight keys are necessary if using ImageTextToText

* need to apply renaming in tied weights, only for ColPali

* overwrite tied keys in ColPali

* fix copies, modular can't handle if-statements
2025-07-15 17:23:54 +02:00
b1d14086e4 handle training summary when creating modelcard but offline mode is set (#37095)
* handle training summary when creating modelcard but offline mode is set

* chore: lint
2025-07-15 17:21:15 +02:00
67f42928f0 Remove residual quantization attribute from dequantized models (#39373)
* fix: removing quantization trace attribute from dequantized model

Fixes #39295

* add: test `to(dtype=torch.float16)` after dequantization
2025-07-15 17:16:10 +02:00
30c508dbcb Remove deprecated audio utils functions (#39330)
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-15 14:02:25 +00:00
d8e05951b8 Fix bugs in pytorch example run_clm when streaming is enabled (#39286) 2025-07-15 15:37:28 +02:00
a989bf8d84 Fix bugs from pipeline preprocessor overhaul (#39425)
* Correct load classes for VideoClassificationPipeline

* Correct load classes for the ASR pipeline
2025-07-15 14:28:59 +01:00
53c9dcd6fd refactor: remove set_tracer_provider and set_meter_provider calls (#39422) 2025-07-15 14:22:12 +02:00
f03b384149 Fix invalid property (#39384)
Signed-off-by: cyy <cyyever@outlook.com>
2025-07-15 12:11:37 +00:00
c4d41567fa set document_question_answering pipeline _load_tokenizer to True (#39411)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-15 12:05:49 +00:00
f56b49f48f Ignore extra position embeddings weights for ESM (#39063)
* Ignore extra position embeddings weights

* Slight name fix
2025-07-15 11:57:32 +00:00
2b79f14375 support loading qwen3 gguf (#38645)
* support loading qwen3 gguf

* Add qwen3 into GGUF_TO_FAST_CONVERTERS for tokenizer conversion

* Add testcase

* Fix formatting
2025-07-15 09:53:41 +00:00
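
With this change, a GGUF-quantized qwen3 checkpoint can be loaded through the existing `gguf_file` entry point. A sketch (the repo and file names below are placeholders, not taken from the PR):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B-GGUF"          # hypothetical repo name
gguf_file = "qwen3-0.6b-q4_k_m.gguf"       # hypothetical file name

# gguf_file triggers on-the-fly dequantization into a regular transformers model.
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=gguf_file)
```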
0e4b7938d0 Add ModernBERT Decoder Models - ModernBERT, but trained with CLM! (#38967)
* working locally; need to style and test

* added docs and initial tests; need to debug and flesh out

* fixed tests

* working long context; batches

* working fa2 and eager

* update tests

* add missing configs

* remove default autoset

* fix spacing

* fix most tests

* fixed tests

* fix to init

* refactor to match new transformers updates

* remove static cache option

* fa2 fix

* fix docs

* in progress

* working on tests

* fixed issue with attn outputs

* remove debug

* fix local config attr

* update doc string

* fix docstring

* add docs to toc

* correct typo in toc

* add new updates from main w.r.t. ModernBERT RoPE

* fix local param

---------

Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l07.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@n02.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l08.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l01.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l02.mgmt.ai.cluster>
2025-07-15 10:40:41 +02:00
0b724114cf Fix typo in /v1/models output payload (#39414) 2025-07-15 08:59:25 +01:00
8d6259b0b8 [refactor] set attention implementation (#38974)
* update

* fix some tests

* init from config, changes it in-place, add deepcopy in tests

* fix modernbert

* don't delete this config attr

* update

* style and copies

* skip tests in generation

* fix style

* accidentally removed flash-attn-3, revert

* docs

* forgot about flags set to False

* fix copies

* address a few comments

* fix copies

* custom code BC
2025-07-15 09:34:06 +02:00
6017f5e8ed [siglip] fix pooling comment (#39378)
* feat(siglip2): add forward pass with pooled output logic in Siglip2TextModel

* test(siglip2): add test_text_model.py to verify pooled output behavior

* style(siglip2): fix formatting in test_text_model.py using Ruff

* fix(siglip2): remove misleading 'sticky EOS' comment and sync modular-classic files

* fix(siglip2): remove misleading 'sticky EOS' comment and sync modular-classic files

* chore(siglip2): regenerate classic model after modular change

* Update
2025-07-14 17:47:19 +00:00
8d40ca5749 Update phi4_multimodal.md (#38830)
* Update phi4_multimodal.md

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update phi4_multimodal.md

* Update phi4_multimodal.md

* Update phi4_multimodal.md

* Update phi4_multimodal.md

* Update phi4_multimodal.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-14 10:35:17 -07:00
3635415af2 [Docs] Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic (#39391)
Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic
2025-07-14 09:25:06 -07:00
3a48e9534c Use np.pad instead of np.lib.pad. (#39346)
* Use np.pad instead of np.lib.pad.

* Update audio_utils.py

Formatting
2025-07-14 16:05:28 +00:00
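
The replacement is a drop-in rename; `np.pad` takes the same arguments:

```python
import numpy as np

audio = np.ones(5, dtype=np.float32)
# np.lib.pad was only an alias; np.pad is the stable public entry point.
padded = np.pad(audio, (2, 3), mode="constant", constant_values=0.0)
print(padded.shape)  # (10,)
```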
3d8be20cd2 Totally rewrite how pipelines load preprocessors (#38947)
* Totally rewrite how pipelines load preprocessors

* Delete more mappings

* Fix conditionals, thanks Cyril!
2025-07-14 16:40:04 +01:00
903944a411 [examples] fix do_reduce_labels argument for run_semantic_segmentation_no_trainer (#39322)
* don't use the do_reduce_labels argument in the model

* use do_reduce_labels in AutoImageProcessor
2025-07-14 10:16:49 +00:00
8165c703ab Fix Lfm2 and common tests (#39398)
* fix

* better fix

* typo
2025-07-14 12:02:59 +02:00
878d60a3cb Deprecate AutoModelForVision2Seq (#38900)
deprecate vision2seq
2025-07-14 11:42:06 +02:00
ad333d4852 [Qwen2.5-VL] Fix torch.finfo() TypeError for integer attention_mask_tensor (#39333)
* Update modeling_qwen2_5_vl.py

### 🐛 Bug Description

When using Unsloth’s Qwen2.5-VL vision models (both 3B and 7B) with the latest HuggingFace Transformers (commit: 520b9dcb42cef21662c304583368ff6645116a45), the model crashes due to a type mismatch in the attention mask handling.

---

### 🔥 Error Traceback

* Fix dtype compatibility in attention mask processing

Replace hardcoded torch.finfo() usage with dtype-aware function selection to handle both integer and floating-point attention mask tensors.
Technical Details:

Problem: Line 1292 assumes floating-point dtype for attention_mask_tensor
Solution: Add dtype check to use torch.iinfo() for integer types and torch.finfo() for float types
Files Modified: transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

* Update modeling_qwen2_5_vl.py

* Update modeling_qwen2_5_vl.py

* Fix: Cast to float before applying torch.finfo

* # Fix: Use appropriate function based on dtype

* Update modular_qwen2_5_vl.py

* Fix: Cast to float before applying torch.finfo

* Fix: Use appropriate function based on dtype

* Fix: Use appropriate function based on dtype

* Updated modeling_glm4v.py

* Only apply conversion for floating point tensors (inverted masks)

* corrected the format issue

reformatted modeling_glm4v.py

All done!  🍰 
1 file reformatted

* Fix: Cast to float before applying torch.finfo

Corrected the format issue

* Fix torch.finfo() for integer attention mask

#39333

* Run make fix-copies and make style for CI compliance

- Updated dependency versions table
- Fixed code formatting and style issues
- Sorted auto mappings
- Updated documentation TOC

* Fix torch.finfo() TypeError for

Fix torch.finfo() TypeError for integer attention_mask_tensor #39333

* Fix torch.finfo() TypeError for integer
2025-07-14 07:47:39 +00:00
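
The essence of the fix, as a standalone sketch (the helper name is ours, not the code's):

```python
import torch

def mask_min_value(attention_mask: torch.Tensor):
    # torch.finfo only accepts floating dtypes, so integer masks need iinfo.
    if torch.is_floating_point(attention_mask):
        return torch.finfo(attention_mask.dtype).min
    return torch.iinfo(attention_mask.dtype).min

print(mask_min_value(torch.zeros(2, 2, dtype=torch.int64)))    # iinfo path
print(mask_min_value(torch.zeros(2, 2, dtype=torch.float16)))  # finfo path
```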
c30af65521 [BLIP] remove cache from Qformer (#39335)
* remove cache from Qformer

* fix

* this was never correct...
2025-07-14 09:20:01 +02:00
66cd995618 [shieldgemma] fix checkpoint loading (#39348)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-14 08:34:58 +02:00
a1ad9197c5 Fix overriding Fast Image/Video Processors instance attributes affecting other instances (#39363)
* fix and add tests

* nit
2025-07-12 23:39:06 +00:00
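
An illustrative sketch of the bug class behind this fix (the attribute names are assumptions; the commit itself does not show them):

```python
class FastProcessor:
    _default_size = {"height": 224, "width": 224}  # one shared mutable dict

    def __init__(self, size=None):
        # Buggy: self.size = size or self._default_size  # aliases the class dict
        self.size = dict(size or self._default_size)     # fixed: per-instance copy

a, b = FastProcessor(), FastProcessor()
a.size["height"] = 384
print(b.size["height"])  # 224 with the copy; would be 384 with the aliasing bug
```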
dc98fb3e5e update docker file to use latest timm (for perception_lm) (#39380)
update docker file for timm

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-12 23:19:37 +02:00
5c30f7e390 Update Model Card for Encoder Decoder Model (#39272)
* update model card.

* add back the model contributors for mamba and mamba2.

* update the model card.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update batches with correct alignment.

* update examples and remove quantization example.

* update the examples.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update example.

* correct the example.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-11 11:23:08 -07:00
0d7efe3e4b fix gpt2 usage doc (#39351)
fix typo of gpt2 doc usage
2025-07-11 10:59:41 -07:00
a646fd55fd Updated CamemBERT model card to new standardized format (#39227)
* Updated CamemBERT model card to new standardized format

* Applied review suggestions for CamemBERT: restored API refs, added examples, badges, and attribution

* Updated CamemBERT usage examples, quantization, badges, and format

* Updated CamemBERT badges

* Fixed CLI Section
2025-07-11 10:59:09 -07:00
af74ec65a7 Update Readme to Run Multiple Choice Script from Example Directory (#39323)
* Update Readme to run in current place

* Update Readme files to execute PyTorch examples from their respective folders
2025-07-11 10:58:26 -07:00
70e57e4710 Add mistral common support (#38906)
* wip: correct docstrings

* Add mistral-common support.

* quality

* wip: add requested methods

* wip: fix tests

* wip: add internally some methods not being supported in mistral-common

* wip

* wip: add opencv dependency and update test list

* wip: add mistral-common to testing dependencies

* wip: revert some test changes

* wip: ci

* wip: ci

* clean

* check

* check

* check

* wip: add hf image format to apply_chat_template and return pixel_values

* wip: make mistral-common non-installed safe

* wip: clean zip

* fix: from_pretrained

* fix: path and base64

* fix: path and import root

* wip: add docs

* clean

* clean

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-07-11 16:26:58 +00:00
665418dacc Remove device check in HQQ quantizer (#39299)
* Remove device check in HQQ quantizer

Fix https://github.com/huggingface/transformers/issues/38439

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-11 14:59:51 +00:00
601bea2c4e Verbose error in fix mode for utils/check_docstrings.py (#38915)
* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`

More verbose exceptions in `fix_docstring` on docstring formatting issues.
2025-07-11 14:36:10 +00:00
24f771a043 fix failing test_sdpa_can_dispatch_on_flash (#39259)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-11 16:30:56 +02:00
ee74397d20 update cb TP (#39361)
* update cb TP

* safety
2025-07-11 15:54:25 +02:00
9bc675b3b6 Fix link for testpypi (#39360)
fix link
2025-07-11 15:34:01 +02:00
bf607f6d3b PerceptionLM (#37878)
* plm template

* A working plm with fixed image features

* hacked processor

* First version that reproduced PLM output using PE from timm.

* Simplify and fix tie_word_embeddings

* Use PIL resize. Simplify conversion.

* First version that works with video input.

* simplified image preprocessing (not batched)

* Minor fixes after rebasing on main.

* Video processor based on new API.

* Revert to use _preprocess for image processor.

* refactor with modular

* fix tie_word_embedding

* Testing with timm PE

* check in missed conversion from modular to model.py

* First working version of PLM with Eva PE. PLM-1B and 3B outputs are exactly the same as before. PLM-8B output has some differences.

* address review comments

* Fixed batching if video and image examples mixed.

* Simplify PE configuration.

* Enable AutoModel for PerceptionEncoder.

* Update PE config style.

* update all headers

* Minor fixes.

* Move lm_head to PerceptionLMForConditionalGeneration.
Fix vit_G model specification.

* Fix for testing_modeling_perception_lm.py

* Image processing refactoring to use more common parts.

* Fix processor test.

* update tests to use model from hub

* More test fixes.

* integration test GT update after rebasing; probably due to video preprocessing

* update test media path to hub

* Stop tracking local scripts

* address some review comments

* refactor image processing.

* small fixes

* update documentation and minor fixes

* remove scripts

* Minor fix for CI

* Fix image processing

* CI and doc fix

* CI formatting fix

* ruff fix

* ruff formatting

* ran utils/sort_auto_mappings.py

* update docstring

* more docstring updates

* add vision_input_type default fallback for image processing

* more verbose variable naming

* test update

* Remove PE and PEConfig use AutoModel(TimmWrapper) instead

* Minor cleanup.

* Minor Fix: remove any ref to PE. Ruff format and check.

* fix docstring

* Fix modular/model consistency. Improve docstring.

* Fix PerceptionLMForConditionalGenerationModelTest

* ruff fix

* fix for check_repo

* minor formatting

* dummy size arg to fix for processor test.

* Update docstring for PerceptionLMConfig

* Minor fixes from review feedback.

* Revert some minor changes per reviewer feedback.

* update base_model_prefix

* address reviewer feedback

* fix comment in modeling file

* address reviewer feedback

* ruff format

* Pre-merge test update.

* reapply modular and fix checkpoint name

* processor test path

* use modular a bit more

* remove dead code

* add token decorator

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-11 11:07:32 +02:00
4b47b2b8ea Updated Switch Transformers model card with standardized format (Issue #36979) (#39305)
* Updated Switch Transformers model card with standardized format (Issue #36979)

* Apply reviewer suggestions to the new standardised Switch Transformer's model card

* Update switch_transformers.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-10 15:34:10 -07:00
fe1a5b73e6 [modular] speedup check_modular_conversion with multiprocessing (#37456)
* Change topological sort to return level-based output (lists of lists)

* Update main for modular converter

* Update test

* update check_modular_conversion

* Update gitignore

* Fix missing conversion for glm4

* Update

* Fix error msg

* Fixup

* fix docstring

* update docs

* Add comment

* delete qwen3_moe
2025-07-10 19:07:59 +01:00
571a8c2131 Add a default value for position_ids in masking_utils (#39310)
* set default

* Update masking_utils.py

* add small test
2025-07-10 18:53:40 +02:00
bdc8028cb3 [Core] [Offloading] Enable saving offloaded models with multiple shared tensor groups (#39263)
* fix counting meta tensors, fix onloading meta tensors

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated fix

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add test

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-10 18:33:30 +02:00
df49b399dc [tests] tag serve tests as slow (#39343)
* maybe they need more cpu resources?

* add todo
2025-07-10 15:40:08 +00:00
36e80a18da [modeling][lfm2] LFM2: Remove deprecated seen_tokens (#39342)
* [modeling][lfm2] remove deprecated seen_tokens

* [modular][lfm2] remove deprecated seen_tokens from modular file
2025-07-10 17:27:55 +02:00
9682d07f92 LFM2 (#39340)
* [modeling][lfm2] LFM2 model on 4.53.0 interface

* [configuration] hook in LFM2 keys

* [modeling][lfm2] update modeling interface for 4.53.1

* [modeling][lfm2] apply mask to hidden conv states

* [misc] ruff format/lint

* [modeling][lfm2] minor: NotImplemented legacy cache conversion

* Create lfm2.md

* create nice modular

* style

* Update modeling_auto.py

* clean and start adding tests

* style

* Update test_modeling_lfm2.py

* Update __init__.py

* small test model size

* config

* small fix

* fix

* remove useless config attrs -> block_dim and conv_dim are hidden_size

* fix prepare inputs

* fix config

* test

* typo

* skip tests accordingly

* config docstrings

* add doc to .md

* skip config docstring check

---------

Co-authored-by: Maxime Labonne <81252890+mlabonne@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-10 16:07:33 +02:00
38c3931362 [server] add tests and fix passing a custom generation_config (#39230)
* add tests; fix passing a custom generation_config

* tool integration test

* add install step

* add accelerate as dep to serving

* add todo
2025-07-10 13:41:38 +00:00
6b09c8eab0 Handle DAC conversion when using weight_norm with newer PyTorch versions (#36393)
* Update convert_dac_checkpoint.py

* Update convert_dac_checkpoint.py

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-07-10 10:36:58 +00:00
92043bde29 fix phi3 tests (#39312)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-10 11:51:55 +02:00
520b9dcb42 fix Glm4v batch videos forward (#39172)
* changes for video

* update modular

* change get_video_features

* update video token replacement

* update modular

* add test and fix typo

* lint

* fix order

* lint

* fix

* remove dependency

* lint

* lint

* remove todo

* resize video for test

* lint..

* fix test

* new a processor for video_test

* fix test
2025-07-10 10:44:28 +02:00
bc161d5d06 Delete deprecated stuff (#38838)
* delete deprecated stuff

* fix copies

* remove unused tests

* fix modernbert and fuyu

* Update src/transformers/cache_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* bye bye `seen_tokens`

* address comments

* update typings

* encoder-decoder models follow the same pattern as whisper

* fix copies

* why is it set to False?

* fix switch transformers

* fix encoder decoder models shared weight

* fix copies and RAG

* remove `next_cache`

* fix gptj/git

* fix copies

* fix copies

* style...

* another forgotten docstring

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-10 05:18:44 +00:00
c6ee0b1da8 Fix broken SAM after #39120 (#39289)
fix
2025-07-09 17:46:22 -04:00
aff7df8436 enable static cache on TP model (#39164)
* enable static cache on TP model

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check tp size before init kv cache

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix docstring

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add tp tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix other cache head size

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-09 21:14:45 +00:00
2ef59646b8 Fix max_length_q and max_length_k types to flash_attn_varlen_func (#37206)
Also add notes asking users to set `TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1`
or call `torch._dynamo.config.capture_scalar_outputs = True`, as currently
this will cause a graph break.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-07-09 23:12:39 +02:00
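
The two workarounds mentioned in the note, spelled out:

```python
import os

# Option 1: environment variable, set before any torch.compile run.
os.environ["TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS"] = "1"

# Option 2: the equivalent in-code switch.
import torch
torch._dynamo.config.capture_scalar_outputs = True
```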
2d600a4363 Granite speech speedups (#39197)
* ensure the query is updated during training

avoid unused parameters that DDP does not like

* avoid a crash when `kwargs` contain `padding=True`

trainers often pass this argument automatically

* minor

* Remove mel_spec lazy init, and rename to mel_filters.
This ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)

* minor - most feature extractors have a `sampling_rate` property

* speedup relative position embeddings

* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model - a finetuned version should point to the model save dir.
- fixing model weights names, that are changed by adding an adapter.

* minor

* minor

* minor

* fixing a crash without peft active

* add todo to replace einsum

* granite speech speedups:
1. register attention_dist to avoid cpu-to-gpu transfer every layer.
2. pad_sequence is much faster than per-sample-padding + concat.
3. avoid returning audio back to cpu when using a compute device.

* support audio.shape=(1,L)
2025-07-09 23:09:50 +02:00
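
Speedup (2) in a nutshell, as a sketch with hypothetical feature shapes:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

features = [torch.randn(n, 80) for n in (93, 120, 57)]  # variable-length samples
# One vectorized call instead of padding each sample and concatenating.
batch = pad_sequence(features, batch_first=True)
print(batch.shape)  # torch.Size([3, 120, 80])
```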
5111c8ea2f Fix typo: langauge -> language (#39317) 2025-07-09 12:06:46 -07:00
2781ad092d docs: update LLaVA-NeXT model card (#38894)
* docs: update LLaVA-NeXT model card

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [docs] Updated llava_next model card

* Update docs/source/en/model_doc/llava_next.md remove image sources

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [fix] Change Flash Attention to SDPA badge

* [doc] fixed quantization example

* docs: updated contribution details and badges

* Update llava_next.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-09 11:32:40 -07:00
16dd7f48d0 skip files in src/ for doctest (for now) (#39316)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 19:36:48 +02:00
d61c0d087c Updated the Model docs - for the MARIAN model (#39138)
* Update marian.md

This update improves the Marian model card to follow the Hugging Face standardized model card format. The changes include:

- Added a clear description of MarianMT, its architecture, and how it differs from other models.
- Provided usage examples for Pipeline and AutoModel.
- Added a quantization example for optimizing model inference.
- Included instructions and examples for multilingual translation with language codes.
- Added an Attention Mask Visualizer example.
- Added a Resources section with relevant links to papers, the Marian framework, language codes, tokenizer guides, and quantization documentation.
- Fixed formatting issues in the code blocks for correct rendering.

This update improves the readability, usability, and consistency of the Marian model documentation for users.

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update marian.md

* Update marian.md

* Update marian.md

* Update marian.md

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update marian.md

* Update marian.md

* Update marian.md

* Update marian.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-09 10:23:03 -07:00
161cf3415e add stevhliu to the list in self-comment-ci.yml (#39315)
add

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 19:07:44 +02:00
3be10c6d19 Fix consistency and a few docstrings warnings (#39314)
* Update modeling_deepseek_v2.py

* fix docstrings

* fix

* fix
2025-07-09 18:40:37 +02:00
4652677c89 🌐 [i18n-KO] Translated quark.md to Korean (#39268)
* initial translation

* removed english parts

* maintain consistency

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* add toctree

* fixed indentation

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
2025-07-09 09:29:51 -07:00
c980904204 Add DeepSeek V2 Model into Transformers (#36400)
* add initial structure

* doc fixes, add model base logic

* update init files

* some fixes to config and modular

* some improvements for attention

* format

* remove unused attn

* some fixes for moe layer and for decoder

* adapt _compute_yarn_parameters for deepseek

* format

* small fix

* fix for decoder forward

* add tests, small refactoring

* fix dummies

* fix init

* fix doc

* fix config docs

* add sequence doc, fix init for gate

* fix issues in tests

* fix config doc

* remove unused args

* some fixes and refactoring after review

* fix doc for config

* small fixes for config args

* revert config refactoring

* small refactoring

* minor fixes after rebase

* small fix after merge

* fix modular

* remove rotaryembd from public init

* small test fix

* some rotary pos calculation improvement

* fix format

* some improvements and fixes

* fix config

* some refactoring

* adjust some unit tests

* skip test

* small fixes and tests adjustment

* reapply modular

* fix all tests except Integration

* fix integration tests

* cleanup BC stuff

* rope

* fix integrations tests based on a10

* style

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-09 17:04:28 +02:00
accbd8e0fe [sliding window] revert and deprecate (#39301)
* bring back and deprecate

* oops

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-09 16:10:38 +02:00
1cefb5d788 [modular] Allow method with the same name in case of @property decorator (#39308)
* fix

* add example

* fix

* Update modular_model_converter.py
2025-07-09 15:46:53 +02:00
4798c05c64 skip test_torchscript_* for now until the majority of the community ask for it (#39307)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 15:35:48 +02:00
fe5f3c85d2 fix aria tests (#39277)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 13:49:33 +02:00
0687d481e2 [flash attn 3] bring back flags (#39294)
* flash attn 3 flag

* fix copies
2025-07-09 09:45:01 +02:00
25343aafee Fix SDPA attention precision issue in Qwen2.5-VL (#37363)
* solve conflicts and remove  redundant attention_mask in qwenvit

* update decoded text check

* remove trailing whitespace
2025-07-09 07:03:44 +02:00
0e1c281745 [Tests] Update model_id in AIMv2 Tests (#39281)
* Update model_id in tests

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 21:46:32 +02:00
7ef592c96c Update T5gemma (#39210)
* bug fix: add vocab_size to t5gemmaconfig for pipeline.

* Update checkpoint placeholder

* minor change

* minor change

* minor change: update example.

* fix: add vocab_size as an explicit arg.

* bug fix:

remove vocab_size verification; instead, re-set encoder/decoder vocab size.

Note: in t5gemma, the vocab size of encoder/decoder should always be the same.

* add `add_generation_prompt` for message preprocessing.
2025-07-08 19:08:48 +02:00
1ecd52e50a Add torchcodec in docstrings/tests for datasets 4.0 (#39156)
* fix dataset run_object_detection

* bump version

* keep same dataset actually

* torchcodec in docstrings and testing utils

* torchcodec in dockerfiles and requirements

* remove duplicate

* add torchcodec to all the remaining docker files

* fix tests

* support torchcodec in audio classification and ASR

* [commit to revert] build ci-dev images

* [commit to revert] trigger circleci

* [commit to revert] build ci-dev images

* fix

* fix modeling_hubert

* backward compatible run_object_detection

* revert ci trigger commits

* fix mono conversion and support torch tensor as input

* revert map_to_array docs + fix it

* revert mono

* nit in docstring

* style

* fix modular

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 17:06:12 +02:00
1255480fd2 [lightglue] add support for remote code DISK keypoint detector (#39253)
* feat: add trust_remote_code in LightGlueConfig

* fix: made sure trust_remote_code is provided only when necessary

* fix: make style

* docs: added missing trust_remote_code docstring

* refactor: refactored LightGlue config init

* fix: removed unnecessary argument
2025-07-08 15:03:04 +00:00
838a0268b8 fix flaky test_generate_compile_model_forward (#39276)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 15:36:05 +02:00
29d0030e23 Refactor PretrainedConfig.__init__ method to make it more explicit (#39158)
* cleanup

* fix no `__init__` test

* fix missing inits
2025-07-08 14:24:39 +01:00
1580f64653 [smollm3] add tokenizer mapping for smollm3 (#39271)
add tok mapping to smollm3
2025-07-08 10:44:01 +00:00
db05e4ff33 [pagged-attention] fix off-by-1 error in pagged attention generation (#39258)
* fix off-by-1 error in pagged attention generation

* formatting

* use update_with_token
2025-07-08 12:34:22 +02:00
6f1a43896c [CI] fix docs (#39273)
* fix docs

* add ko gloassary file to toctree
2025-07-08 11:31:03 +01:00
fbdaa7b099 Add Aimv2 model (#36625)
* Model skelton

* changes

* temp push

* changes

* Added support for aimv2-native

* More changes

* More changes

* Stupid mistake correction

* Added config and refactor

* Added vision model

* update

* Refactor for lit variant

* Added Text Model

* Minor fixes

* nits

* update

* Preliminary tests

* More fixes

* Updated tests 🤗

* Refactor

* Updated testcase

* Updated config

* make fixup

* more fixes

* Bug fix and updates

* deadcode

* Fixes

* nit

* up

* Happy CI 

* Reduce LOC

* nit

* nit

* make style

* return_dict refactor

* bug fix

* fix

* doc update

* nit

* make fixup

* Minor update

* _init_weights modification

* update tests

* Minor fixes post review

* Update w.r.t GradientCheckpointingLayer

* docs update

* update

* nit

* Use more Modular 😉

* Change name from AIMv2 to Aimv2

* Nit

* make style

* Add model doc pointer

* make style

* Update model doc section

* updates

* Modify attn mask and interface

* update test

* Final change

* Utilize flash and flex attn

* keep attn mask

* camelcase model name in test file

* Fix docstring

* Fix config warning finally and create_causal_mask

* disable torchscript

* remove unused arg

* remove from tests

* balance model size for tests

* fix device

* tests

* tests

* flaky test

* fix import

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-08 11:53:21 +02:00
d8590b4b0c Add Doge model (#35891)
* Add Doge Model

* Fix code quality

* Rollback an error commit

* Fix config for open-source weights

* Revert "Fix config for open-source weights"

This reverts commit 229cdcac10a6a4274d1dd13b729bc14c98eb0c76.

* Add modular_doge

* Update Doge inherits from Llama

* Fix import bug

* [docs] Add usage of doge model

* Fix Doge import pretrainedconfig from modeling_utils to configuration_utils

* [docs] remove trust remote code from doge

* Fix dynamo bug in doge model

* Update docstrings

* Import apply_rotary_pos_emb and repeat_kv from Llama

* Fix all nits

* Fix code quality

* Fix some bugs

* Fix code quality

* Remove inherited `_update_causal_mask` from Llama
This leads to incorrect weight initialization.

* Fix the wrong tensor orderings in DogeCDMoE

* Fix attention mask bug
We have to provide attention_mask for dynamic mask computation

* Modify most implementations to inherit from Llama
But there are two problems:
1. `flex_attention_forward` is not updated properly
2. `Example` error in the forward method of DogeForCausalLM

* Modify CDMoE for batch efficient implementation

* Uniform MoE configuration names, just like QwenMoE

* Fix code quality

* Fix code quality

* Fix code quality

* Add tp plan of CDMoE Module

* Hybrid DMA with sliding window

* Update valid tokens greater than window size

* Fix code quality

* Add `convert_doge_weights_to_hf`

* Fix STATE_DICT_MAPPING in convert_doge_weights_to_hf.py

* Fix nits in modular_doge

* Fix code quality

* Fix all nits

* Fix all nits

* Make sure the attention function is updated inside the class

* Fix code quality issues in the Doge model and add a test for it

* Fix `test_generate`

* Fix code quality

* Fix nits following suggestions

* Fix code quality

* Fix code quality issues

* Fix nits

* Fix code quality nits

* Fix the missing parameters in the configuration.

* Fix the missing parameters in the configuration.

* Fix nits

* Add initialization of attention

* Fix last nits

* Simplify dynamic mask generation logic

* Rename router_logits to gate_logits for matching latest changes of MixtralModel

* Rename typings for matching latest changes of MixtralModel

* Fixes typo in comment

* Update src/transformers/models/doge/modular_doge.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fix code quality issues to match other modular

* Fix code quality issues to match other modular

* Fix the static compilation errors

* Update model weights link

* Fix code quality issues to match other modular

* reapply modular and support for new outputs

* style

* simplify a lot

* fix import location

* reapply modular

* fix

* fix integration test

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-08 11:44:29 +02:00
d370bc64c6 Fix errors when using verl to train the GLM4.1v model (#39199)
* Fix errors when using verl to train the GLM4.1v model

* Support glm4v load from AutoModelForVision2Seq
* Set glm4v model _checkpoint_conversion_mapping attr from None to {}

* Update modeling_auto.py
2025-07-08 09:39:31 +00:00
5fb8bb3e1a fix recompiles due to instance key, and deepcopy issues (#39270)
* fix recompiles due to instance key, and deepcopy issues

* dict
2025-07-08 11:38:11 +02:00
356fd68109 fix(generation): stop beam search per-instance when heuristic satisfied (#38778)
* fix(decoding): stop beam search per-instance when heuristic satisfied

Previously, when early_stopping was set to `False`, the early-stopping heuristic only halted generation when **all** batch instances reached the criterion. This caused instances that the heuristic deemed impossible to improve to keep generating, leading to inconsistent and overly long outputs across the batch.

Now we apply the heuristic **per-instance**: once an instance of the batch has all its beams deemed impossible to improve, we mark that instance finished while letting the others continue. This restores the expected behavior and ensures consistency in batched generation.

* Add test case GenerationIntegrationTests.test_beam_search_early_stop_heuristic

* Update naming improvement_possibility -> is_early_stop_heuristic_unsatisfied

* Add comments for early stop heuristic

* Update src/transformers/generation/utils.py

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-08 08:59:37 +00:00
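
A minimal sketch of the per-instance bookkeeping described above (tensor names are ours; this is not the actual `generation/utils.py` code):

```python
import torch

def mark_finished(done, best_possible, worst_finished):
    # Old behavior: halt only once the heuristic held for ALL instances:
    #   all_done = bool((best_possible <= worst_finished).all())
    # New behavior: mark each instance done independently, so hopeless
    # instances stop while the rest of the batch keeps generating.
    return done | (best_possible <= worst_finished)

done = torch.tensor([False, False])
print(mark_finished(done, torch.tensor([0.1, 0.9]), torch.tensor([0.5, 0.5])))
# tensor([ True, False])
```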
0b0ede8b2b remove broken block (#39255)
* remove broken block

* fixup
2025-07-08 10:41:44 +02:00
a21557fa3e Skip test_eager_matches sdpa generate and update an integration test for blip-like models (#39248)
* skip

* skip

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 10:38:25 +02:00
ea3c2c0277 Fix license text, duplicate assignment, and typo in constant names (#39250)
- Complete Apache License text in Italian documentation
- Remove duplicate variable assignment in Perceiver converter
- Fix typo in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES constant
2025-07-08 10:20:52 +02:00
b2816da802 fix xpu failures on PT 2.7 and 2.8 w/o IPEX and enable hqq cases on XPU (#39187)
* chameleon xpu bnb groundtruth update on bnb triton backend since we are
deprecating ipex backend

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* enable hqq uts on XPU, all passed

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix comment

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-07-08 10:18:26 +02:00
17b3c96c00 Glm 4 doc (#39247)
* update the glm4 model readme

* update test

* update GLM-4.1V model

* update as format

* update

* fix some tests

* fix the rest

* fix on a10, not t4

* nit: dummy import

---------

Co-authored-by: raushan <raushan@huggingface.co>
2025-07-08 08:22:04 +02:00
bbca9782ca Update LED model card (#39233)
* Update LED model card

* Remove extra arguments

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-07 15:56:57 -07:00
41e865bb8d fix some flaky tests in tests/generation/test_utils.py (#39254)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 19:49:41 +02:00
93747d89ea Simplify Mixtral and its modular children (#39252)
* simplify mixtral a lot

* fix

* other moes

* mixtral

* qwen3

* back

* Update modular_qwen3_moe.py
2025-07-07 19:40:41 +02:00
3993ee1e98 Add segmentation_maps support to MobileNetV2ImageProcessor (#37312)
* Add `segmentation_maps` support to mobilenet_v2 image processor and `reduce_labels` to mobilevit

* Changed mobilenetv2 tests to support fastimageprocessor

* added `segmentation_maps` support to fast image processor

* reverted to upstream/main

* Add optional

* Use autodocstring

* Changed docs

* Docs fix

* Changed fp to match beit fp

* Change typing imports

* Fixed repo inconsistency

* Added fast-slow equivalence tests

* Removed unnecessary call

* Add `reduce_labels` to Mobilevit fast processor

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-07 13:34:59 -04:00
b96f213fcf Clarify per_device_train_batch_size scaling in TrainingArguments (#38… (#38857)
Clarify global batch size calculation in TrainingArguments (#38484)
2025-07-07 16:57:42 +00:00
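
The clarified relationship, as a worked example (values are hypothetical):

```python
per_device_train_batch_size = 8
gradient_accumulation_steps = 4
world_size = 2  # number of devices/processes

global_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * world_size
)
print(global_batch_size)  # 64 samples per optimizer step
```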
9698052560 Add Korean translation for glossary.md (#38804)
* Add Korean translation for glossary.md

* Update docs/source/ko/glossary.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Joosun40 <77312900+Joosun40@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2025-07-07 09:12:55 -07:00
bf203aa9da Update tiny-agents example (#39245) 2025-07-07 15:58:36 +02:00
c4e39ee59c adjust input and output texts for test_modeling_recurrent_gemma.py (#39190)
* adjust input and output texts for test_modeling_recurrent_gemma.py

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix bug

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* adjust

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update Expectation match

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 15:13:25 +02:00
14cba7ad33 enable xpu on kv-cache and hqq doc (#39246)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-07 13:12:02 +00:00
32db48db73 Fix patch helper (#39216)
remove -1
2025-07-07 15:11:48 +02:00
a3618d485a RotaryEmbeddings change is not None -> isinstance(..., dict) (#39145)
is None -> isinstance dict
2025-07-07 14:05:28 +01:00
9b09fe479f fix fastspeech2_conformer tests (#39229)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 15:04:26 +02:00
00e9efceab [bugfix] fix flash attention 2 unavailable error on Ascend NPU (#39166)
[bugfix] fix flash attention 2 error on Ascend NPU
2025-07-07 13:03:39 +00:00
056fa73fae [modular] Simplify logic and docstring handling (#39185)
* simplify a lot

* Update modular_model_converter.py

* finalize

* remove outdated functions

* apply it

* and examples
2025-07-07 14:52:57 +02:00
f16fbfb89a Make _compute_dynamic_ntk_parameters exportable (#39171)
* Make _compute_dynamic_ntk_parameters exportable

* add unit test
2025-07-07 14:48:31 +02:00
4243bb844d fix bug where using FSDP V1 leads to the model device not being set properly (#39177)
* fix bug where using FSDP V1 leads to the model device not being set properly

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update the code

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-07-07 14:47:04 +02:00
34c16167eb Don't send new comment if the previous one is less than 30 minutes (unless the content is changed) (#39170)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 14:43:50 +02:00
b8f397e456 fix typo in Gemma3n notes (#39196) 2025-07-07 14:41:33 +02:00
5348fbc005 [modular] Follow global indexing and attribute setting, and their dependencies (#39180)
* export global indexing statements

* add example

* style

* examples
2025-07-07 14:36:43 +02:00
8570bc29f3 Fix missing fast tokenizer/image_processor in whisper/qwen2.5-omni processor (#39244)
* fix missing fast tokenizer in whisper processor

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix processor test

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix qwen2.5 omni processor

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-07 13:54:18 +02:00
b283d52f7f [vjepa2] replace einsum with unsqueeze (#39234) 2025-07-07 11:14:08 +01:00
a325409a50 Expectations re-order and corrected FA3 skip (#39195)
* Fix Expectations and a FA3 skip

* Fixed docstring

* Added context for Default expectation
2025-07-07 11:42:33 +02:00
b0a8e0b8d7 [video processors] Support float fps for precise frame sampling (#39134)
* [video processors] Support float fps for precise frame sampling

Enable fractional fps values (e.g., 1.5, 29.97) in video processors
for more precise frame sampling control.

- Change fps type from int to float across all video processors
- Maintain backward compatibility with integer values

Extends: #38105

* [video processors] Refine fps typing to Union[int, float]

Change fps type from Optional[float] to Optional[Union[int, float]]
for more explicit type information about supporting both integer
and floating-point frame rates.

- Update type hints and docstrings across 8 files
- Maintain backward compatibility
- Clarify support for both int and float values

Extends: #38105

* Revert "[video processors] Support float fps for precise frame sampling"

This reverts commit 7360d6e661b413ca0239e5ef61f9b1abbeab8e65.
2025-07-07 03:43:43 +00:00
ca7e1a3756 Refactor the way we handle outputs for new llamas and new models (#39120)
* just update 2 files

* update other models as well just making fix-copies

* also add the changes needed to modeling utils

* put this on the pretrained model instead

* nits and fixes

* update generic, fix to use config value

* update other modelings

* use transformers kwargs instead

* update

* update

* update other models

* update

* updates

* update

* update

* update

* fix

* finally

* very small nits

* this fixes more tests

* fix other models as well!

* update modularqwen2

* update models based on qwen2

* update

* update

* remove the **flash stuff in favor of normal kwargs

* update

* propagate gemma?

* remove output attentions

* propagate

* support cross attention edge case

* same

* test this

* fixes

* more fix

* update

* update

* fix conflicts

* update

* fix emu3

* fix emu3

* move the fix a bit

* what a nightmare

* some fixes, loss_kwargs should never had been

* finish fixing gemma3n

* fix small lm3

* fix another one

* fix csm now

* fix csm and mistral

* fix mistral now

* small fixes

* fix janus

* only for some models

* fixup

* phix phi3

* more fixes?

* does this fix it?

* update

* holy shit it was just graph breaks

* protect torch

* updates

* fix samhq?

* fix moonshine

* more moonshine fixes, 3 failures left!

* nits

* generic needs to support more

* more fixes to moonshine!

* fix cross attention outputs!

* fix csm!

* nits

* fix stupid kosmos2

* current updates

* fixes

* use output recorder?

* nicer!

* a little bit of magic

* update

* fix protect

* fix

* small fixes

* protect import

* fix a bunch of more models

* fix fixups

* fix some of the last ones

* nit

* partly fix phi

* update

* fix import path

* make something that is fullgraph compatible just to be sure

* typing was wrong on llama so the rest was wrong as well

* fucking ugly but at least it is still exportable

* style

* supposed to fix moonshine, it still breaks

* fix some default

* fix the last bits of sam

* update samhq

* more fixes to sam hq

* nit

* fix all output+hidden states and output_attentions!

* fix?

* fix diffllama

* updates to fix initialization on the sam pips

* ups there was a bug

* fix the last sam hq test

* fix gotocr

* fix gotocr2!

* fixes

* skip stupid tests

* there was one left :)

* fixup

* fix fix copies issues with this test file

* fix copies for sam_hq

* rm some comments

* skip 2 more failing tests

* fix

* fix everything

* Apply suggestions from code review

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* add more doc!

* fix public init

* fix modular qwen3

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2025-07-05 11:34:28 +02:00
e6a8063ef1 Update expected values (after switching to A10) - part 8 - Final (#39220)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-04 13:35:53 +02:00
cd8a041a4f Update expected values (after switching to A10) - part 7 (#39218)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-04 12:48:10 +02:00
0cf27916f0 Add packed tensor format support for flex/sdpa/eager through the mask! (#39194)
* Add the necesary logic to mask_utils

* add it everywhere

* Update masking_utils.py

* style

* Update masking_utils.py

* Update modeling_mimi.py

* Update masking_utils.py

* add support for more than batch size 1

* Update masking_utils.py

* add test

* style

* Update test_masking_utils.py

* Update masking_utils.py

* add require_token

* fix tests

* fix
2025-07-04 09:01:56 +02:00
037755ed54 Update expected values (after switching to A10) - part 6 (#39207)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-03 22:45:30 +02:00
1168f57abf Update expected values (after switching to A10) - part 5 (#39205)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-03 19:56:02 +02:00
7d9e52f376 Fix continuous batching in transformers serve (#39149)
* Fix CB

* Nit

* Update src/transformers/commands/serving.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add todos

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-03 18:15:31 +02:00
85d93cc6e3 [serve] Cursor support, move docs into separate page, add more examples (#39133)
* jan docs

* rm

* [cursor] tmp commit

* Cursor working :D

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/commands/serving.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* cursor docs

* try to fix agents/tools docs?

* try to fix agents/tools docs?

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* add transformers chat example with transformers serve

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2025-07-03 17:04:16 +01:00
e15b06d8dc [typing] better return typehints for from_pretrained (#39184)
* config

* processor

* feature-extractor

* jukebox

* fixup

* update other methods in config

* remove "PretrainedConfig" annotations
2025-07-03 14:22:47 +00:00
a25fc3592e Update expected values (after switching to A10) - part 4 (#39189)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-03 15:13:06 +02:00
b31e9d19a6 [Dia] Change ckpt path in docs (#39181)
fix ckpt path
2025-07-03 10:02:58 +00:00
18e0cae207 Fix many HPU failures in the CI (#39066)
* more torch.hpu patches

* increase top_k because it results in flaky behavior when Temperature, TopP and TopK are used together, which ends up killing beams early.

* remove temporal fix

* fix scatter operation when input and src are the same

* trigger

* fix and reduce

* skip finding batch size as it makes the hpu go loco

* fix fsdp (yay all are passing)

* fix checking equal nan values

* style

* remove models list

* order

* rename to cuda_extensions

* Update src/transformers/trainer.py
2025-07-03 11:17:27 +02:00
bff964c429 Decouple device_map='auto' and tp_plan='auto' (#38942)
* dissociate

* better place

* fix
2025-07-03 11:07:11 +02:00
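The two now-decoupled loading modes, sketched under assumptions (model ids are placeholders; `tp_plan="auto"` expects a multi-process `torchrun` launch, so it is left commented out):

```python
# device_map="auto" shards the model across available devices by memory,
# while tp_plan="auto" applies tensor parallelism; after this PR they are
# independent options rather than entangled ones.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2", device_map="auto")
# Under `torchrun --nproc-per-node 2 ...`:
# model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B", tp_plan="auto")
```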
8178c43112 when delaying optimizer creation only prepare the model (#39152) 2025-07-03 09:04:16 +02:00
91221da2f1 [glm4v] fix video inference (#39174)
fix video inference
2025-07-03 05:20:41 +00:00
ebfbcd42da Test fixes for Aria (and some Expectation for llava_next_video) (#39131)
* Expectations for llava_next_video

* Updated image src in aria

* Fix test_small_model_integration_test

* Fix small model integration llama

* Fix a bunch of tests

* Style

* Shortened generation in test from 900 to 90
2025-07-02 23:41:14 +02:00
37a239ca50 Update expected values (after switching to A10) - part 3 (#39179)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-02 22:48:30 +02:00
9326fc332d Update expected values (after switching to A10) - part 2 (#39165)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* empty

* [skip ci]

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-02 22:47:55 +02:00
25cd65ac43 Random serve fixes (#39176)
* Fix index out of bounds exception on wrong kv reuse

* Prevent loading same model twice

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2025-07-02 22:09:58 +02:00
548794b886 [serve] Model name or path should be required (#39178)
* Model name or path should be required

* Fix + add tests

* Change print to log so it doesn't display in transformers chat
2025-07-02 22:06:47 +02:00
2d561713f8 [generate] document non-canonical beam search default behavior (#39000) 2025-07-02 18:29:16 +01:00
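For reference, a minimal beam-search call whose defaults the PR documents (a sketch, not code from the PR; the tiny model id is a placeholder):

```python
# Beam search keeps `num_beams` candidate sequences and returns the highest
# scoring one; the PR above documents where transformers' defaults deviate
# from the canonical algorithm.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2")  # placeholder tiny model
model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")
inputs = tok("The quick brown fox", return_tensors="pt")
out = model.generate(**inputs, num_beams=4, max_new_tokens=10, early_stopping=True)
print(tok.decode(out[0], skip_special_tokens=True))
```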
df12d87d18 [docs] ViTPose (#38630)
* vitpose

* fix?

* fix?

* feedback

* fix

* feedback

* feedback

* update sample image
2025-07-02 07:56:29 -07:00
2b4a12b5bf Reduce Glm4v model test size significantly (#39173)
* fix test size

* Update test_modeling_glm4v.py
2025-07-02 15:55:05 +02:00
e355c0a11c Fix missing initializations for models created in 2024 (#38987)
* fix GroundingDino

* fix SuperGlue

* fix GroundingDino

* fix MambaModel

* fix OmDetTurbo

* fix SegGpt

* fix Qwen2Audio

* fix Mamba2

* fix DabDetr

* fix Dac

* fix FalconMamba

* skip timm initialization

* fix Encodec and MusicgenMelody

* fix Musicgen

* skip timm initialization test

* fix OmDetTurbo

* clean the code

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* add reviewed changes

* add back timm

* style

* better check for parametrizations

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-02 15:03:57 +02:00
1125513a8d Blip2 fixes (#39080)
* Fixed some devices errors

* Fixed other device issues and more expectations

* Reverted support flags

* style

* More granular support

* Fixed some rebase stuff

* add a not None check before .to
2025-07-02 14:39:39 +02:00
28df7f854a Fix multimodal processor get duplicate arguments when receive kwargs for initialization (#39125)
* fix processor tokenizer override

Signed-off-by: Isotr0py <2037008807@qq.com>

* code format

Signed-off-by: Isotr0py <2037008807@qq.com>

* add regression test

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

* check image processor same

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-02 19:57:15 +08:00
b61023a1b7 🚨🚨🚨 [eomt] make EoMT compatible with pipeline (#39122)
* Make EoMT compatible with pipeline

* Implicit patch offsets

* remove patch offsets from arg

* Modify tests

* Update example

* fix proc testcase

* Add few more args

* add pipeline test suite

* fix

* docstring fixes

* add pipeline test

* changes w.r.t review

* 🙈 MB

* should fix device mismatch

* debug

* Fixes device mismatch

* use decorator

* we can split mlp

* expected values update

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2025-07-02 12:25:26 +01:00
4d5822e65d [smolvlm] fix video inference (#39147)
* fix smolvlm

* better to do as before: set sampling params in the overridden `apply_chat_template`

* style

* update with `setdefault`
2025-07-02 12:05:10 +02:00
9b2f5b66d8 fix default value of config to match checkpoints in LLaVa-OV models (#39163) 2025-07-02 09:45:50 +00:00
e8e0c76162 Add activation sparsity reference in gemma3n doc (#39160)
Add activation sparsity reference in the description of gemma3n
2025-07-02 04:11:03 +02:00
8e87adc45f fix llama tests (#39161)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-01 23:27:22 +02:00
4c1715b610 Update expected values (after switching to A10) (#39157)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* empty

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-01 20:54:31 +02:00
ab59cc27fe Suggest jobs to use in run-slow (#39100)
* pr

* pr

* pr

* pr

* pr

* pr

* pr

* pr

* pr

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-01 20:19:06 +02:00
db2f535443 update bnb ground truth (#39117)
* update bnb result

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* set seed to avoid sampling different results

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix int8 tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add comments

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-01 20:06:37 +02:00
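The determinism fix above boils down to seeding before sampling; a minimal sketch using the helper that transformers ships:

```python
# set_seed seeds python's random, numpy and torch (incl. CUDA) in one call,
# so sampled generations are reproducible across runs.
from transformers import set_seed

set_seed(42)
# ... any generate(..., do_sample=True) call after this point is reproducible
```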
260846efad fix: remove undefined variable (#39146) 2025-07-01 19:10:29 +02:00
cdfe49a4d0 Change @lru_cache() to @lru_cache to match styles from #38883. (#39093)
Match styles in #38883
2025-07-01 18:29:16 +02:00
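The style change in one self-contained sketch (since Python 3.8, `functools.lru_cache` can be applied directly as a decorator, so the empty-argument call is redundant):

```python
from functools import lru_cache

@lru_cache()          # old style: decorator factory called with no arguments
def fib_old(n: int) -> int:
    return n if n < 2 else fib_old(n - 1) + fib_old(n - 2)

@lru_cache            # new style: equivalent, one fewer pair of parentheses
def fib_new(n: int) -> int:
    return n if n < 2 else fib_new(n - 1) + fib_new(n - 2)

assert fib_old(20) == fib_new(20) == 6765
```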
f46798193e Fix: Ensure wandb logs config in offline mode (#38992)
* Fix: Ensure wandb logs config in offline mode

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-07-01 16:17:58 +00:00
fe838d6631 Fix missing fsdp & trainer jobs in daily CI (#39153)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-01 18:10:30 +02:00
1283877571 [superglue] fix wrong concatenation which made batching results wrong (#38850) 2025-07-01 12:14:44 +00:00
f8b88866f5 [VLMs] support passing embeds along with pixels (#38467)
* VLMs can work with embeds now

* update more models

* fix tests

* fix copies

* fixup

* fix

* style

* unskip tests

* fix copies

* fix tests

* style

* omni modality models

* qwen models had extra indentation

* fix some other tests

* fix copies

* fix test last time

* unrelated changes revert

* we can't rely only on embeds

* delete file

* de-flake mistral3

* fix qwen models

* fix style

* fix tests

* fix copies

* deflake the test

* modular reverted by fixes, fix again

* flaky test, overwritten

* fix copies

* style
2025-07-01 11:33:20 +00:00
20901f1d68 [typing] LlamaAttention return typehint (#38998)
* helo llama

* helo llama

* helo llama

* apply modular

* fix dia

---------

Co-authored-by: qubvel <qubvel@gmail.com>
2025-07-01 11:29:52 +01:00
7a25f8dfdb [qwen2-vl] fix FA2 inference (#39121)
* fix FA2

* update is causal flag and remove mask for FA2

* update for FA2 with varlen path

* how were the tests passing with different devices?

* add comment and ref to the PR

* move mask preparation to base pretrained model

* seq len is the first dim, not second

* fix copies to fix GLM4V
2025-07-01 10:18:37 +00:00
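A hedged sketch of enabling the FA2 code path this fix targets (requires the `flash-attn` package and a supported CUDA GPU; the checkpoint id is the public Qwen2-VL one, but any Qwen2-VL checkpoint should behave the same):

```python
# Select the flash-attention-2 attention backend at load time.
import torch
from transformers import Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```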
def9663239 feat: support indivisible shards for TP model loading and TPlizing. (#37220)
* feat: support uneven loading and sharding
resolve merge conflicts
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: allow for empty tensor computations

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* test: add llama1b test case

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* due to q_proj colwise it has to be a multiple of 2

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* refactor: use slice API

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* refactor: use slice API

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* refactor: use slice API

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* refactor: use slice API

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2025-07-01 10:03:22 +00:00
06c4a4d499 fix caching_allocator_warmup with tie weights (#39070)
* fix caching_allocator_warmup with tie weights

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-01 11:32:20 +02:00
e435574721 🚨 Don't use cache in non-generative models (#38751)
* deprecate for 1 version

* style

* fix some tests

* fix esm

* skip for now, GC requires positional args but we have keyword args

* remove transpose for scores in modified models only

* skip fx trace tests
2025-07-01 09:08:21 +00:00
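The `use_cache` switch at the heart of this change, shown on a causal LM for brevity (a sketch with a placeholder tiny model; the PR itself is about non-generative models, where the cache brings no benefit at all):

```python
# Disabling the KV cache on a plain forward pass: no past_key_values are
# allocated or returned.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2")  # placeholder tiny model
model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")
inputs = tok("hello", return_tensors="pt")
out = model(**inputs, use_cache=False)
assert out.past_key_values is None  # no cache was allocated
```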
dbc98328da Several fixes for Gemma3n (#39135)
* remove the skips

* fix the epsilon to a small value (does not make sense otherwise)

* safeguard

* overload test_eager_matches_sdpa

* Update test_modeling_common.py

* skip appropriate tests

* correct no_split_layer

* fix all devices issue

* fix backward

* fix
2025-07-01 10:34:53 +02:00
d53518c5f2 Fix key mapping for VLMs (#39029)
* fix key mapping for VLMs

* use __mro__ instead

* update key mapping in save_pretrained
2025-07-01 09:47:53 +02:00
3457e8e73e [Whisper] update token timestamps tests (#39126)
* fixes

* update comment

* update for A10

* all a10

* all a10

* all a10

* all a10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-30 21:55:36 +02:00
fe35eca7bd Update BigBirdPegasus model card (#39104)
* Update bigbird_pegasus.md

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-30 10:42:56 -07:00
29a3f5ed8c switch default xpu tp backend to pytorch built-in XCCL from pytorch 2.8 (#39024)
* switch default xpu tp backend to pytorch built-in XCCL from pytorch 2.8

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update docs/source/en/perf_infer_gpu_multi.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update perf_infer_gpu_multi.md

* Update perf_infer_gpu_multi.md

* Update perf_infer_gpu_multi.md

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-30 08:54:05 -07:00
9e0c865b8b docs: correct two typos in awesome-transformers.md (#39102)
* docs(awesome-projects): fix typo “Itt leverages” → “It leverages” (#39101)

closes #39101

* docs(awesome-projects): fix grammar “We provides” → “We provide” (#39101)

closes #39101
2025-06-30 08:53:43 -07:00
03db2700ab Enable XPU doc (#38929)
* fix example with dataset

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update torchao doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update torchao doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device type

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert torchao change

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix torchao doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert torchao change

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update xpu torchao doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update chat_templating_multimodal.md

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* use full name for int8

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert int8 title

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-06-30 07:56:55 -07:00
ea0ea392e5 Fix chat (#39128) 2025-06-30 13:47:48 +00:00
ed36f8490e Licenses (#39127)
* Licenses

* Licenses
2025-06-30 15:25:36 +02:00
e8f90b5397 Split transformers chat and transformers serve (#38443)
* Next token

* Split chat and serve

* Support both generation methods

* Style

* Generation Config

* temp

* temp

* Finalize serving.py

Co-authored-by: célina <hanouticelina@gmail.com>

* Finalize chat.py

* Update src/transformers/commands/serving.py

Co-authored-by: célina <hanouticelina@gmail.com>

* Lucain's comments

Co-authored-by: Lucain <lucain@huggingface.co>

* Update

* Last comments on PR

* Better error handling

* Better error handling

* CI errors

* CI errors

* Add tests

* Fix tests

* Fix tests

* [chat] Split chat/serve (built on top of lysandre's PR) (#39031)

* Next token

* Split chat and serve

* Support both generation methods

* Style

* Generation Config

* temp

* temp

* Finalize serving.py

Co-authored-by: célina <hanouticelina@gmail.com>

* Finalize chat.py

* Update src/transformers/commands/serving.py

Co-authored-by: célina <hanouticelina@gmail.com>

* Lucain's comments

Co-authored-by: Lucain <lucain@huggingface.co>

* Update

* Last comments on PR

* Better error handling

* Better error handling

* CI errors

* CI errors

* Add tests

* Fix tests

* Fix tests

* streaming tool call

* abstract tool state; set tool start as eos

* todos

* server working on models without tools

* rm chat's deprecated flags

* chat defaults

* kv cache persists across calls

* add server docs

* link

* Update src/transformers/commands/serving.py

* Apply suggestions from code review

* i love merge conflicts

* solve multi turn with tiny-agents

* On the fly switching of the models

* Remove required positional arg

---------

Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: célina <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>

* Protect names

* Fix tests

---------

Co-authored-by: célina <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-06-30 15:10:53 +02:00
539c6c2fa8 All CI jobs with A10 (#39119)
all a10

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-30 14:23:27 +02:00
ed9f252608 docs: Gemma 3n audio encoder (#39087)
Updating Gemma 3n docs and docstrings to clarify the relationship
between the newly trained audio encoder used in Gemma 3n and the USM
model from the original paper.
2025-06-30 14:10:51 +02:00
4a79bf947d Fix some bug for finetune and batch infer For GLM-4.1V (#39090)
* update

* 1
2025-06-30 12:16:22 +02:00
2100ee6545 fix UT failures on XPU w/ stock PyTorch 2.7 & 2.8 (#39116)
* fix UT failures on XPU w/ stock PyTorch 2.7 & 2.8

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* zamba2

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* xx

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* internvl

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* tp cases

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-30 11:49:03 +02:00
ccf2ca162e skip some test_sdpa_can_dispatch_on_flash (#39092)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-27 23:08:14 +02:00
a11f692895 Fixes the failing test test_is_split_into_words in test_pipelines_token_classification.py (#39079)
* Fix test pipelines token classification for is_split_into_words

* Fix incorrect import format
2025-06-27 19:25:32 +01:00
18143c76bf Sandeepyadav1478/2025 06 19 deberta v2 model card update (#38895)
* [docs]: update deberta-v2.md model card

* chore: req updates

* chore: address code review feedback and update docs

* chore: review feedback and updates

* chore: model selection updates

* chores: quantizations review updates
2025-06-27 10:35:30 -07:00
02a769b058 [fix] Add FastSpeech2ConformerWithHifiGan (#38207)
* add to mapping

* oops

* oops

* add to config_mapping_names

* revert

* fix?

* config-mapping-names

* fix?

* fix?
2025-06-27 09:38:21 -07:00
c2dc72bb5f TST Fix PEFT integration test bitsandbytes config (#39082)
TST Fix PEFT integration test bitsandbytes config

The PEFT integration tests still used load_in_{4,8}_bit, which is
deprecated, moving to properly setting BitsAndBytesConfig. For 4bit,
also ensure that nf4 is being used to prevent

> RuntimeError: quant_type must be nf4 on CPU, got fp4
2025-06-27 18:33:11 +02:00
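The migration described above, as a hedged sketch (requires the `bitsandbytes` package; the model id is a placeholder, not the one used in the PEFT tests):

```python
# An explicit BitsAndBytesConfig replaces the deprecated load_in_4bit /
# load_in_8bit kwargs; selecting nf4 avoids the CPU error quoted above
# ("quant_type must be nf4 on CPU, got fp4").
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",  # fp4 is the default; nf4 also works on CPU
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=quant_config,
)
```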
c8064bea9a Fix: unprotected import of tp plugin (#39083) 2025-06-27 17:28:05 +02:00
dd7dc4a4a2 Add Fast Image Processor for Chameleon (#37140)
* Add Fast Image Processor for Chameleon

* add warning to resize and move blend_rgba to convert_to_rgb

* Remove unrelated files

* Update image_processing_chameleon_fast to use auto_docstring

* fix equivalence test

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-06-27 15:26:57 +00:00
6d773fc3bc fix dots1 tests (#39088)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-27 16:54:11 +02:00
c8764ab935 guard torch distributed check (#39057)
* guard torch distributed check

* Update src/transformers/pipelines/base.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-06-27 14:49:47 +00:00
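A common shape for the guard added here (a sketch, not the PR's exact code): check availability before asking about initialization, since torch builds without distributed support cannot answer the second question.

```python
import torch.distributed as dist

def distributed_is_ready() -> bool:
    # is_available() must come first: is_initialized() is only meaningful
    # on builds compiled with distributed support.
    return dist.is_available() and dist.is_initialized()
```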
49d9fd49bd Add Fast Image Processor for mobileViT (#37143)
* Add image_processing_mobilevit_fast.py

* Fix copies

* update _preprocess for channel_flip

* Update for batched image processing

* Resolve merge conflicts with main

* Fix import order and remove trailing whitespace (ruff clean-up)

* Fix copy inconsistencies

* Add NotImplementedError for post_process_semantic_segmentation to satisfy repo checks

* Add auto_docstring

* Adjust style

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/mobilevit/image_processing_mobilevit_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/mobilevit/image_processing_mobilevit_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Delete not used function

* test: add missing tests for  and

* Add post_process_semantic_segmentation to mobilevit_fast.py

* Add preprocess function to image_processing_mobilevit_fast.py

* ruff check for formatting

* fix: modify preprocess method to handle BatchFeature correctly

* Remove logic for default value assignment

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove normalization and RGB conversion logic not used in slow processor

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Simplify return_tensors logic using one-liner conditional expression

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove unused normalization and format parameters

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* add **kwargs and remove default values in _preprocess

* add slow_fast equivalence tests for segmentation

* style: autoformat code with ruff

* Fix slow_fast equivalence test

* merge + remove skipped test

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-06-27 14:40:24 +00:00
4336ecd1ea add fast image processor nougat (#37661)
* add fast image processor nougat

* test fixes

* docstring white space

* last fixes

* docstring_type

* tolerance unit test

* fix tolerance

* fix rtol

* remove trailing white space

* remove white space

* note for tolerance unit test

* fix tests

* remove print

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-06-27 14:39:43 +00:00
0c35280e58 TST PEFT integration tests with pipeline generate (#39086)
Some PEFT integration tests involving text generation pipelines were
failing since #38129 because the base model is too small to generate
longer sequences. Setting max_new_tokens fixes this.
2025-06-27 15:58:10 +02:00
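The fix in one line, sketched (model id is a placeholder tiny checkpoint): cap generation length explicitly so small test models are not pushed past what they can produce.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="sshleifer/tiny-gpt2")
out = generator("Hello there", max_new_tokens=20)  # explicit cap fixes the failure mode
print(out[0]["generated_text"])
```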
993665a5ff fixed typo for docstring in prepare_inputs method (#39071) 2025-06-27 13:57:56 +00:00
839893c86b fix mistral3 tests (#38989)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-27 15:44:10 +02:00
2b85b6ce19 [Whisper] 🚨 Fix pipeline word timestamp: timestamp token is end of token time !!! (#36632)
* timestamp token is end of token time !!!

* ensure correct alignment between tokens and timestamp tokens

* ignore input tokens for DTW computation

* use num_frames to avoid token timestamp hallucinations

* token timestamps test updates !

* num_frames: deprecate and use attention_mask instead

* avoid breaking change

* fix the pipeline usage for chunk approach

* make style

* better logging

* better logging

* make style

* update tests with correct values
2025-06-27 12:51:43 +00:00
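A hedged usage sketch of the word-timestamp path this fix corrects (the audio path is a placeholder; with the fix, word-level `(start, end)` spans are aligned as the PR describes, treating the timestamp token as the end of the token's audio):

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
result = asr("sample.wav", return_timestamps="word")
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])  # (start, end) in seconds
```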
9c8d3a70b8 Pipeline: fix unnecessary warnings (#35753)
* return attention mask

* use correct model input name

* fix

* make
2025-06-27 14:32:03 +02:00
1750c518dd Add EoMT Model || 🚨 Fix Mask2Former loss calculation (#37610)
* Initial Commit

* up

* More changes

* up

* Only mask_logits mismatch

* close enough logits debug later

* fixes

* format

* Add dummy loss

* Close enough processing for semantic seg

* nit

* Added panoptic postprocessor

* refactor

* refactor

* finally fixed panoptic postprocessor

* temp update

* Refactor ForUniversalSegmentation class

* nits and config update

* Few fixes and inference matches

* change mapping

* Added training support but loss slightly off 🥲

* Loss is matching 😀

* update

* Initial tests skeleton

* changes

* tests update

* more modular

* initial tests

* updates

* better docstrings

* changes

* proc tests passing :)

* Image processor update

* tiny change

* QOL changes

* Update test w.r.t latest attn refactor

* repo-consistency fixes

* up

* Image proc fix and integration tests :)

* docs update

* integration tests

* fix

* docs update 🥰

* minor fix

* Happy CI

* fix

* obvious refactoring

* refactoring w.r.t review

* Add fast image proc skeleton

* Fast Image proc and cleanups

* Use more modular

* tests update

* Add more tests

* Nit

* QOL updates

* change init_weights to torch default

* add eager func because of make style

* up

* changes

* typo fix

* Updates

* More deterministic tests

* More modular

* go more modular 🚀

* up

* dump

* add support for giant ckpts

* overhaul

* modular

* refactor

* instace seg is ready

* cleanup

* forgot this

* docs cleanup

* minor changes

* EoMT -> Eomt

* Happy CI

* remove redundant comment

* Change model references

* final change

* check annealing per block

* My other PR changes 😂

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-06-27 14:18:18 +02:00
0106a50a6b fix a bunch of XPU UT failures on stock PyTorch 2.7 and 2.8 (#39069)
* fix a bunch of XPU UT failures on stock PyTorch 2.7 and 2.8

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* qwen3

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* quanto

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* models

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* idefics2

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-27 14:01:53 +02:00
cb17103bd5 Uninstallling Flash attention from quantization docker (#39078)
* update

* revert
2025-06-27 13:51:46 +02:00
371c471113 Fix initialization of OneFormer (#38901)
* fix initialization of OneFormer

* remove redundant initializations

* remove redundant initializations

* remove redundant initializations

* keep BC
2025-06-27 12:39:37 +02:00
540a10848c fix Gemma3nProcessorTest (#39068)
* fix

* fix

* oops, forgot style

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-27 12:28:10 +02:00
0d66ef7792 Cleanup Attention class for Siglip and dependent models (#39040)
* cleanup attention class

* More models

* more models

* Changes

* make style

* Should fix CI

* This should work 🙏
2025-06-27 12:14:09 +02:00
1ccc73dee9 [Whisper] fix shape mismatch in tests (#39074)
fix shape mismatch
2025-06-27 09:27:42 +00:00
a52478253b [docs] Tensor parallelism (#38241)
* updates

* feedback

* badges

* fix?

* fix?

* fix?

* fix?
2025-06-26 14:40:45 -07:00
84e8696cae [docs] @auto_docstring (#39011)
* refactor

* feedback
2025-06-26 14:21:54 -07:00
018855de63 Update PEGASUS-X model card (#38971)
* Update PEGASUS-X model card

* Add cache_implementation argument in quantization code example

* Update CLI example

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Remove TensorFlow and Flax badges

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-26 13:54:48 -07:00
757c26fb40 [docs] Model contribution (#38995)
improve
2025-06-26 12:25:14 -07:00
b372bb5ed1 fix layoutlmv3 tests (#39050)
* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-26 20:07:17 +02:00
f171e7e884 Update SuperPoint model card (#38896)
* docs: first draft to more standard SuperPoint documentation

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs: reverted changes on Auto classes

* docs: addressed the rest of the comments

* docs: remove outdated reference to keypoint detection task guide in SuperPoint documentation

* Update superpoint.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-26 10:13:06 -07:00
2f50230c59 fix t5gemma tests (#39052)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-26 18:48:14 +02:00
23b7e73f05 fix test_compare_unprocessed_logit_scores (#39053)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-26 18:36:56 +02:00
58c7689226 [Flex Attn] Fix torch 2.5.1 incompatibilities (#37406)
* remove compile on mask creation, ensure kv blocks do not explode on indices

* trigger ci

* switch dynamic compilation to false

* patch new masking functions as well

* add len check

* i was wrong

* last comment
2025-06-26 18:23:55 +02:00
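A general sketch of the workaround named above ("switch dynamic compilation to false"): compile with `dynamic=False` so shape-specialized graphs are built instead of dynamic-shape tracing, which torch 2.5.1 handles poorly on these masking functions.

```python
import torch

def double(x: torch.Tensor) -> torch.Tensor:
    return x * 2

# dynamic=False forces recompilation per input shape rather than tracing
# with symbolic shapes.
compiled_double = torch.compile(double, dynamic=False)
print(compiled_double(torch.ones(4)))
```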
5154497607 Dev version 2025-06-26 18:04:36 +02:00
0a8081b03d [Modeling] Fix encoder CPU offloading for whisper (#38994)
* fix cpu offloading for whisper

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* unskip offloading tests

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* revert small change

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove tests

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-06-26 15:56:33 +00:00
c63cfd6a83 Gemma 3n (#39059)
* Gemma 3n

* initial commit of Gemma 3n scaffold

* Fixing param pass-through on Gemma3p5RMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3)

* Adding gemma3p5 text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemm3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* regenerating modeling file after syncing to HEAD

* Use torch.std(..., unbiased=False) for activation sparsity (#8)

* Refactoring to a single QVK Norm (#13)

* AltUp: support scale_corrected_output (#14)

* Converts einsums to nn.Linear (#7)

* Converts einsums to nn.Linear

* Removing unused variables

* Aligning SharedKVCache with HybridCache (#11)

* Aligning SharedKVStore with HybridCache

* Remove KVStore. Refactor apply_rotary_pos_emb for sharing

* Addressing review comments

* Supporting split modality embeddings in Gemma3n (#10)

* Adding the Embedder class

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation

* Apply suggestions from code review

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Addressing review comments, prop drilling audio and vision configs to the text config

* Removing TODO's that have been addressed

* Simplify Embedder init and add audio embeddings

* Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder

* Refactoring vision and audio embeddings into ConditionalGeneration model

---------

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating attention mask for Gemma 3.5 (#15)

* xxx_token_index to xxx_token_id

* remvoing deprecated last_cache_position

* Removing references to SigLIP

* Always init per-layer inputs

* Using torch.finfo().min for epsilon_tensor

* Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas

* fix modular GEMMA3N_INPUTS_DOCSTRING

* Gemma3nAttention inherits from Gemma3Attention

* Modular inheritance fixes

* CausalLM conversion script for 4B model (#16)

* Add Gemma3n Audio Encoder (#6)

* initial commit of Gemma 3.5 scaffold

* Fixing param pass-through on Gemma3nRMSNorm

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma3n overall and text config with vision and audio config placeholders (#3)

* Adding gemma3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3.5 (#3)

* Initial Gemm3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implementing sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* CausalLM conversion script for 4B model

* inv_timescales to non-persistent buffer

* Addressing audio encoder Attention feedback

* Addressing Gemma3nAudioSSCPConvBlock feedback

* Addressing Gemma3nAudioConformerAttention feedback

* Addressing padding feedback

* Weights conversion loads audio state dict

* Always use vision_config so saving works

* Token id updates for configs

* Stubs for interleaving audio embs

* Addressing reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>

* Fixing cache access error

* Removing duplicate code from a bad merge

* Gemma 3n Text + Vision Part 1 (#17)

* testing utilities for numerics comparisons

* Corrected einsum to nn.Linear weights conversion

* Inherit scaled word embs from Gemma3 not Bart

* Fixing transposes for collapsed linears

* More transpose fixes

* numpy api fix

* RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True

* Force AltUp  to float32

* Updating debugging script for AudioEncoder debugging

* Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs

* Correcting attention einsum conversions

* RMSNorm in type of x

* Fixing duplicate laurel norm/gating

* KV sharing using the right previous indices

* Refactor kv shared index computation. Correct frac_shared_layers

* Use num_shared_layers instead of inferring from a fraction

* fixing a bug for logging

* Fix shared data_ptrs in altup inits

* rope: adjust proj -> norm -> rope to preserve computation (#20)

* rope: adjust proj -> norm -> rope to preserve computation

* Removing some breaking language model fluff in ConditionalGeneration

* Consolidate query_states transforms

---------

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Vectorize the loops in AltUp (#19)

* Vectorize the loops in AltUp

* fix typo

* Expanding to support batched inputs

* remove extra debug script

* Fix AltUp.forward

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel

* Convert norm to 1/sqrt (#21)

* Convert norm to 1/sqrt

* Scale shift change per Phil's rec

* Adding default activation sparsity

* Fixing 2B config in weights conversion script

* Fixing RMSNorm parameters - adding scale_shift and with_scale

* Correcting query pre-attention scaling

* Adding query_rescale_scalar to text config

* Adding layer_idx to MLP

* Permafix for input_layernorm

* Use 1/sqrt instead of rsqrt in DecoderLayer

* Fix o_proj conversion

* Conversion script update for vision encoder

* Removing logging for debugging timm model

* Fixing bugs in Gemma3nForConditionalGeneration for text generation

* Generating the modeling_gemma3n.py file

* Removing the addition of an erroneous line in the modeling file

* Adding gemma3n text model to modeling_auto

* Bugfix: Updating the interleaving of inputs_embeds and vision_embeds

* Updating the modeling file with the latest bugfix changes

* Updating models/auto for Gemma 3n

* using AutoTokenizer in forward test

* Adding processing_gemma3n.py

* Gemma 3n configured for AutoModel. Conversion script updated.

* Removing errant merge artifacts

---------

Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>

* Removing errant debugging statements from Gemma 3

* Gemma3n audio model (#18)

* testing utilities for numerics comparisons

* Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock

* Add audio version of forward script based on RyanMullins' implementation

* Updating to match encoder tests. WIP: config question needs resolving

* Updates to audio classes to enable end-to-end running

* Removing vestigial classes, cleaning up print statements

* Adding SiLU / Swish to audio conformer feed forward block

* Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio

* Adding outputs to audio test

* Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model

* Update forward test to load from local weights

* Update conversion to process / output audio layers

* Update __all__ to export audio encoder

* AutoModel registration for Gemma 3n Audio

* Use AutoModel for ConditionalGeneration.audio_tower

* Fixing input_proj_linear transpose

* Fixing Gemma3NanoAudioConformerAttention.post conversion

* Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion

* Correcting indentation issue on Gemma3p5RMSNorm

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Text + Vision Part 2 (#23)

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3p5.py

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Modular conversion after github suggested change

* Text + image gives good results

* Fixing image size preset

* Updating configs for the 2B variant in the conversion script

* Using final generation config in conversion script

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Audio Integration (#12)

* initial commit of Gemma 3n scaffold

* Fixing param pass-through on Gemma3nRMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma 3n overall and text config with vision and audio config placeholders (#3)

* Adding Gemma 3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3n

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implementing sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* Converting sl.Frontend to FeatureExtractor

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3n.py

* Update modular

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Modular conversion after github suggested change

* Text + image gives good results

* Fixing image size preset

* Draft of audio data in chat template

* Removing image processing. Using SigLIP instead.

* Audio input going end-to-end

* Fixing dtype issues in audio encoder

* x-lib formatting consistency

* Adding example data

* Save preprocessor_config.json from conversion script

* Instrumentation for debugging

* Additional instrumentation for preprocessing debugging

* Updates to preprocessor, padding; produces correct end-to-end results on sample

* Tackling configuration TODOs

* Start of feature extractor refactor

* Adds Numpy version of USM extractor, removes Torch version and dependencies

* Fixing AltUp.correct coef permute

* Supporting batches of single audio segment inputs

* Docstrings updates for config

* In-lining audio feature extraction

* Adjustments to conversion script and smoke test script

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>

* Gemma 3n renaming

* Removing test data and utilities

* Renaming test files

* Gemma 3n refactor

* Fix tokenizer config in conversion script

* Address reviewer feedback

* FeatureExtractor returns float32 by default

* Adding basic tests for audio, and input name for audio encoder

* Audio integration test, updates to model_id for other integration tests

* Use scales for q and k norms (#26)

* Update audio integration test to use HF dataset

* Reviewer feedback

* Expand embedding table to full vocab size in weights conversion

* Mix-n-match MatFormers for Gemma 3n (#25)

* Remove in-place operations (#30)

* chore: removing inplace ops

* remove [tensor] * n pattern

* chore: reviewer feedback in AudioEncoder and AltUp

* More grad clipping

* Dynamo compatibility

* fix: cache slicing error

* chore: simplify shared kv cache slicing

* chore: vision encoder rename in timm

* fix: image processor do_normalize=False

* fixup: style

* chore: model_doc

* fix: docs for code quality

* chore: repo consistency

* fix: RMSNorm in float as in prior Gemmas

* fix: per_layer_inputs = None

* chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint

* chore: repo consistency

* Add initial unit tests for Gemma3nAudioFeatureExtractor (#27)

* Add initial unit tests for Gemma3nAudioFeatureExtractor

* Add basic unit tests for Gemma3nProcessor (#28)

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>

* parameterize tests

---------

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>

* chore: code style

* fix: test cases

* style and consistency

* fix config in the test to be coherent with layer cache sharing

* fix hidden states in tests and code

* inits and mappings

* fix modality prefixes

* test order and prefixes

* fix test exception

* fix class order and reduce model size for faster tests

* restore _checkpoint_conversion_mapping to load Causal from Conditional

* fix config mapping!

* fix: reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix import test

* add model args

* auto_docstring

* replace test path

* consistency

* skip tests for now

* fix docstring for doc builder

* skip unused attr

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-06-26 17:55:47 +02:00
3e5cc12855 [tests] remove tests from libraries with deprecated support (flax, tensorflow_text, ...) (#39051)
* rm tf/flax tests

* more flax deletions

* revert fixture change

* reverted test that should not be deleted; rm tf/flax test

* revert

* fix a few add-model-like tests

* fix add-model-like checkpoint source

* a few more

* test_get_model_files_only_pt fix

* fix test_retrieve_info_for_model_with_xxx

* fix test_retrieve_model_classes

* relative paths are the devil

* add todo
2025-06-26 16:25:00 +01:00
cfff7ca9a2 [Whisper] Pipeline: handle long form generation (#35750)
* handle long form generation

* add warning

* correct incorrect in place token change

* update test to catch edge case

* make style

* update warning

* add doc
2025-06-26 14:33:31 +00:00
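A hedged sketch of long-form transcription through the pipeline (the file name is a placeholder): inputs longer than Whisper's 30-second window are handled by the long-form generation path this PR wires into the pipeline.

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
text = asr("one_hour_meeting.wav")["text"]  # long input, handled end to end
```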
02ecdcfc0f add _keep_in_fp32_modules_strict (#39058)
* add _keep_in_fp32_modules_strict

* complete test
2025-06-26 13:55:28 +00:00
d973e62fdd fix condition where torch_dtype auto collides with model_kwargs. (#39054)
* fix condition where torch_dtype auto collides with model_kwargs.

* update tests

* update comment

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-26 14:52:57 +02:00
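The colliding option in isolation, as a sketch (model id is a placeholder): `torch_dtype="auto"` defers to the dtype recorded in the checkpoint's config instead of a user-specified `torch.dtype`, which is what made the `model_kwargs` collision possible.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2", torch_dtype="auto")
print(model.dtype)  # dtype taken from the checkpoint config (or float32 fallback)
```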
44b231671d [qwen2-vl] fix vision attention scaling (#39043)
scale lost its `-` when refactoring
2025-06-26 14:06:52 +02:00
ae15715df1 polishing docs: error fixes for clarity (#39042)
* fix duplicate deprecate_models.py

* fix duplicate modular_model_converter.py
2025-06-26 11:56:31 +00:00
3abeaba7e5 Create test for #38916 (custom generate from local dir with imports) (#39015)
* create test for #38916 (custom generate from local dir with imports)
2025-06-26 13:54:36 +02:00
25c44d4b68 Internvl fix (#38946)
* Image processor compile fix (#38540)

* Added a compile-friendly version of resize to BaseImgProcessorFast

* Changed qwen2 processor to use its parent class .resize

* Style

* underlying issue only happens on AMD w/ comment and bool check

* Fixed some utils functions

* Fixed the same issue for bridgetower

* Fixed the same issue for llava_next

* Repo consistency for llava onevision

* Update src/transformers/image_processing_utils_fast.py

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>

---------

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>

* Added an Expectation to an internvl test

* Made qwen2_vl use the resize method of its parent class

* Changed to torch.where

---------

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
2025-06-26 13:44:59 +02:00
f85b47d1b8 [Generate] Fix no grad on some models (#39008)
fixes on torch no grad for generate
2025-06-26 13:06:09 +02:00
583db52bc6 Add Dia model (#38405)
* add dia model

* add tokenizer files

* cleanup some stuff

* brut copy paste code

* rough cleanup of the modeling code

* nuke some stuff

* more nuking

* more cleanups

* updates

* add multiLayerEmbedding vectorization

* nits

* more modeling simplifications

* updates

* update rope

* update rope

* just fixup

* update configuration files

* more cleanup!

* default config values

* update

* forgotten comma

* another comma!

* update, more cleanups

* just more nits

* more config cleanups

* time for the encoder

* fix

* small nit

* nits

* n

* refactor a bit

* cleanup

* update cv script

* fix last issues

* fix last nits

* styling

* small fixes

* just run 1 generation

* fixes

* nits

* fix conversion

* fix

* more fixes

* full generate

* ouf!

* fixes!

* updates

* fix

* fix cvrt

* fixup

* nits

* delete wrong test

* update

* update

* test tokenization

* let's start changing things bit by bit - fix encoder step

* removing custom generation, moving to GenerationMixin

* add encoder decoder attention masks for generation

* mask changes, correctness checked against ad29837 in dia repo

* refactor a bit already --> next cache

* too important not to push :)

* minimal cleanup + more todos

* make main overwrite modeling utils

* add cfg filter & eos filter

* add eos countdown & delay pattern

* update eos countdown

* add max step eos countdown

* fix tests

* fix some things

* fix generation with testing

* move cfg & eos stuff to logits processor

* make RepetitionPenaltyLogitsProcessor flexible

- can accept 3D scores like (batch_size, channel, vocab)

* fix input_ids concatenation dimension in GenerationMixin for flexibility

* Add DiaHangoverLogitsProcessor and DiaExponentialDecayLengthPenalty classes; refactor logits processing in DiaForConditionalGeneration to utilize new configurations and improve flexibility.

* Add stopping criteria

* refactor

* move delay pattern from processor to modeling like musicgen.

- add docs
- change eos countdown to eos delay pattern

* fix processor & fix tests

* refactor types

* refactor imports

* format code

* fix docstring to pass ci

* add docstring to DiaConfig & add DiaModel to test

* fix docstring

* add docstring

* fix some bugs

* check

* porting / merging results from other branch - IMPORTANT: it very likely breaks generation, the goal is to have a proper forward path first

* experimental testing of left padding for first channel

* whoops

* Fix merge to make generation work

* fix cfg filter

* add position ids

* add todos, break things

* revert changes to generation --> we will force 2d but go 3d on custom stuff

* refactor a lot, change prepare decoder ids to work with left padding (needs testing), add todos

* some first fixes to get to 10. in generation

* some more generation fixes / adjustment

* style + rope fixes

* move cfg out, simplify a few things, more todos

* nit

* start working on custom logit processors

* nit

* quick fixes

* cfg top k

* more refactor of logits processing, needs a decision if gen config gets the new attributes or if we move it to config or similar

* lets keep changes to core code minimal, only eos scaling is questionable atm

* simpler eos delay logits processor

* that was for debugging :D

* proof of concept rope

* small fix on device mismatch

* cfg fixes + delay logits max len

* transformers rope

* modular dia

* more cleanup

* keep modeling consistently 3D, generate handles 2D internally

* decoder starts with bos if nothing

* post processing prototype

* style

* lol

* force sample / greedy + fixes on padding

* style

* fixup tokenization

* nits

* revert

* start working on dia tests

* fix a lot of tests

* more test fixes

* nit

* more test fixes + some features to simplify code more

* more cleanup

* forgot that one

* autodocs

* small consistency fixes

* fix regression

* small fixes

* dia feature extraction

* docs

* wip processor

* fix processor order

* processing goes brrr

* transpose before

* small fix

* fix major bug but now needs a closer look into the custom processors, esp cfg

* small thing on logits

* nits

* simplify indices and shifts

* add simpler version of padding tests back (temporarily)

* add logit processor tests

* starting tests on processor

* fix mask application during generation

* some fixes on the weights conversion

* style + fixup logits order

* simplify conversion

* nit

* remove padding tests

* nits on modeling

* hmm

* fix tests

* trigger

* probably gonna be reverted, just a quick design around audio tokenizer

* fixup typing

* post merge + more typing

* initial design for audio tokenizer

* more design changes

* nit

* more processor tests and style related things

* add to init

* protect import

* not sure why tbh

* add another protect

* more fixes

* wow

* it aint stopping :D

* another missed type issue

* ...

* change design around audio tokenizer to prioritize init and go for auto - in regards to the review

* change to new causal mask function + docstrings

* change ternary

* docs

* remove todo, i dont think its essential tbh

* remove pipeline as current pipelines do not fit in the current scheme, same as csm

* closer to wrapping up the processor

* text to audio, just for demo purposes (will likely be reverted)

* check if it's this

* save audio function

* ensure no grad

* fixes on prefixed audio, hop length is used via preprocess dac, device fixes

* integration tests (tested locally on a100) + some processor utils / fixes

* style

* nits

* another round of smaller things

* docs + some fixes (generate one might be big)

* mystery solved

* small fix on conversion

* add abstract audio tokenizer, change init check to abstract class

* nits

* update docs + fix some processing :D

* change inheritance scheme for audio tokenizer

* delete dead / unnecessary code in copied generate loop

* last nits on new pipeline behavior (+ todo on tests) + style

* trigger

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
2025-06-26 11:04:23 +00:00
5995cfa0a0 Fix Bad Outputs in Fast Path for GraniteMoeHybrid (#39033)
Fix bug in previous state setting
2025-06-26 09:45:57 +02:00
22b0a89878 Granite speech speedup + model saving bugfix (#39028)
* ensure the query is updated during training

avoid unused parameters that DDP does not like

* avoid a crash when `kwargs` contain `padding=True`

trainers often pass this argument automatically

* minor

* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)

* minor - most feature extractors have a `sampling_rate` property

* speedup relative position embeddings

* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model - a finetuned version should point to the model save dir.
- fixing model weights names, that are changed by adding an adapter.

* minor

* minor

* minor

* fixing a crash without peft active

* add todo to replace einsum
2025-06-26 09:44:17 +02:00
1d45d90e5d [tests] remove TF tests (uses of require_tf) (#38944)
* remove uses of require_tf

* remove redundant import guards

* this class has no tests

* nits

* del tf rng comment
2025-06-25 17:29:10 +00:00
d37f751797 Two ReDOS fixes (#39013)
* two_redos_fixes

* Fix two redos issues

* Just don't use RE at all
2025-06-25 17:31:26 +01:00
551e48f182 [Kyutai-STT] correct model type + model id (#39035)
* correct model type + model id

* update doc

* init fix

* style !!!
2025-06-25 16:09:00 +00:00
dad0e87c79 Add SmolLM3 (#38755)
* init smollm3

* integration tests

* config quirks

* docs stub

* tests round 2

* tests round 3

* tests round 4

* bring SWA back

* config checker pls

* final checkpoint

* style and copies

* Update src/transformers/models/smollm3/modular_smollm3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/smollm3/modular_smollm3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-06-25 15:12:15 +00:00
3233e9b7c3 refactor: remove custom BarkLayerNorm (#39003)
`nn.LayerNorm` supports `bias=False` since PyTorch 2.1
2025-06-25 16:07:52 +01:00
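The rationale in one line, assuming PyTorch >= 2.1: the built-in module now covers the bias-free case that previously needed a custom class:

```python
import torch.nn as nn

# nn.LayerNorm accepts bias=False since PyTorch 2.1, so a custom
# bias-free LayerNorm (like Bark's) can be replaced with:
norm = nn.LayerNorm(768, bias=False)
```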
3c1d4dfbac Fix grammatical error in models documentation (#39019) 2025-06-25 14:55:22 +00:00
858f9b71a8 Remove script datasets in tests (#38940)
* remove trust_remote_code

* again

* Revert "Skip some tests for now (#38931)"

This reverts commit 31d30b72245aacfdf70249165964b53790d9c4d8.

* again

* style

* again

* again

* style

* fix integration test

* fix tests

* style

* fix

* fix

* fix the last ones

* style

* last one

* fix last

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-25 14:31:20 +00:00
3c322c9cdf fix gemma3 grad acc (#37208)
* fix gemma3 grad acc

* fix

* fix

* fix

* fix

* rmv print

* rm

* Update setup.py

* Apply style fixes

* propagate the changes

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-06-25 16:28:44 +02:00
860b898d03 fix: astronomical loss with ModernBERT when using gradient checkpointing (#38982) (#38983)
* fix: astronomical loss with ModernBERT when using gradient checkpointing

* update the modeling fix

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-06-25 16:11:18 +02:00
a2eb75c891 Support for Flash Attention 3 (#38972)
* Support `flash_attn_3`
Implements fwd and tests for Flash Attention 3 https://github.com/Dao-AILab/flash-attention/commits/main/hopper

- Includes checks for dropout>0 and ALiBi in `modeling_utils.PreTrainedModel._check_and_enable_flash_attn_3` (dropout will likely be supported soon, so this will need to be updated, as will `modeling_flash_attention_utils._flash_attention_forward` at the `if _IS_FLASH_ATTN_3_AVAILABLE: ...` branch)

An example Llama implementation is included in `modeling_llama.py` but other models would still need to be updated

Based on https://github.com/huggingface/transformers/pull/36190 which has model implementations and examples which could be merged

* Add tests for Flash Attention 2 and 3 parity

* ci fix

* FA2 compatibility
- `_prepare_flash_attention_from_position_ids` -> `prepare_fa2_from_position_ids`
- Remove bettertransformer check in Flash Attention 3
- Merge tests
- Add licensing

* ci fix

* Test naming consistency

* ci fix

* Deprecation warning for `prepare_fa2_from_position_ids`

* ci fix
2025-06-25 14:39:27 +02:00
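A hedged sketch of opting into the new backend; the checkpoint id is illustrative, and the `attn_implementation` string mirrors the existing `flash_attention_2` convention this PR builds on:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumes flash-attn 3 is installed and the model supports it (e.g. Llama,
# per the example implementation in this PR).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # illustrative checkpoint
    attn_implementation="flash_attention_3",
    torch_dtype=torch.bfloat16,
)
```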
de98fb25a3 Fix seamless_m4t not working on Gaudi (#38363)
* Fix seamless_m4t not working on Gaudi

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Refine the patch

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Fix seamless_m4t_v2 crash

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Use the patched_gather

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Remove debug logs

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Remove useless modifications

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add hpu check

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add comments

Signed-off-by: yuanwu <yuan.wu@intel.com>

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-06-25 12:40:01 +02:00
7503cb9113 [Model] add dots1 (#38143)
* add dots1

* address comments

* fix

* add link to dots1 doc

* format

---------

Co-authored-by: taishan <rgtjf1@163.com>
2025-06-25 11:38:25 +02:00
3ef8896906 Encoder-Decoder Gemma (#38332)
* Initial submit

* Fix bugs:
1. add __init__ file
2. tied word embedding
3. support flash/flex attention
4. model saving and loading

* Code refactor:
* Rename encdecgemma to t5gemma.
* Split attention into self- and cross-attention
* Split stack into encoder and decoder
* Add test cases
* Add auto configuration

* Update configurations.

* Fix bugs related to copy and attribute checks

* Fix type union

* Fix merge errors

* run ruff format

* Run make style and update tests.

* Add t5gemma model doc.

* ruff and style formatting.

* Add missed module config.

* Add dummy checkpoint link to pass tests (needs updating when real checkpoints are uploaded).

* Update model doc.

* Minor updates following Arthur's comments:
* replace docstrings with auto_docstrings
* remove checkpoint layers
* remove deprecate_kwargs

* fix rebase errors

* Fix docstring issues.

* fix t5gemma doc issue.

* run ruff format

* Updates:
* split encoder-only model out
* make t5gemmamodel encoder-decoder only
* update token and sequence classification
* update tests
2025-06-25 09:05:10 +00:00
af9870265e GLM-4.1V Model support (#38431)
* 20250508 Model Architecture

* Update modeling_glm4v.py

* Update modeling_glm4v.py

* Update modeling_glm4v.py

* update 1447

* 0526

* update

* format

* problem

* update

* update with only image embed diff

* Final

* upload

* update

* 1

* upload with ruff

* update

* update

* work

* 1

* 1

* update with new note

* 2

* Update convert_glm4v_mgt_weights_to_hf.py

* Update tokenization_auto.py

* update with new format

* remove rmsnorm

* draft with videos

* draft

* update

* update

* fix for review problem

* try to remove min_pixel

* update

* for test

* remove timestamps

* remove item

* update with remove

* change

* update 2200

* update

* Delete app.py

* format

* update

* Update test_video_processing_glm4v.py

* 1

* 2

* use new name

* Update test_video_processing_glm4v.py

* remove docs

* change

* update for image processors update

* 2108

* 2128

* Update modular_glm4v.py

* 1

* update some

* update

* rename

* 1

* remove tests output

* 2

* add configuration

* update

* Update test_video_processing_glm4v.py

* fix simple forward tests

* update with modular

* 1

* fix more tests

* fix generation test

* fix beam search and init

* modular changed

* fix beam search in case of single-image/video. Fails if multiple visuals per text

* update processor

* update test

* pass

* fix beam search

* update

* param correct

* Update convert_glm4v_mgt_weights_to_hf.py

* 1

* Update test_modeling_glm4v.py

* 4

* 2

* 2123 video process

* 2

* revert

* 1

* 2

* revert processing

* update preprocessor

* changed

* 1

* update

* update

* 6

* update

* update

* update

* Delete tmp.txt

* config

* Update video_processing_glm4v.py

* apply modular correctly

* move functions

* fix order

* update the longest_edge

* style

* simplify a lot

* fix random order of classes

* skip integration tests

* correctly fix the tests

* fix TP plan

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-25 10:43:05 +02:00
7b3807387b Drop unnecessary tokens in GPT2Model generation (#39016)
Drop unnecessary tokens in GPT2Model generation.

Co-authored-by: Yi Pan <conlesspan@outlook.com>
2025-06-25 08:29:00 +00:00
e212ff9e6a [video processor] support torchcodec and decrease cuda memory usage (#38880)
* don't move the whole video to GPU

* add torchcodec

* add tests

* make style

* instructblip as well

* consistency

* Update src/transformers/utils/import_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/utils/import_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/video_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-06-25 08:23:37 +00:00
11d0feacce [AutoModelForMaskGeneration] Remove duplicate code (#38622)
Remove duplicate code
2025-06-25 10:00:13 +02:00
3ee72af6b6 Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 (#37332)
* Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1

* fix code format

* add test; replace position_ids with query_states because position_ids.shape[0] is always 1

* add assert loss is not nan
2025-06-25 07:58:34 +00:00
ae32f1ad11 Add zero dim tensor check when using flash_attention (#38280)
* Add zero dim tensor check when using flash_attention

Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>

* Add zero dim tensor check when using flash_attention

Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>

---------

Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>
2025-06-25 09:48:50 +02:00
ca402e2116 [LightGlue] Fixed attribute usage from descriptor_dim to keypoint_detector_descriptor_dim (#39021)
fix: fix descriptor dimension handling in LightGlue model
2025-06-24 23:32:07 +01:00
48b6ef0238 Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code,… (#38954)
* Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code, etc.)

* Update quicktour.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-24 11:48:15 -07:00
ea9a30923e [HPU][Critical Issue Fix] ThreadPool instead of Pool for parallel pre-processing (#39002)
* ThreadPool instead of Pool for parallel pre-processing

* ThreadPool only if hpu available
2025-06-24 20:24:50 +02:00
995666edb5 Skip sdpa dispatch on flash test due to unsupported head dims (#39010) 2025-06-24 20:16:56 +02:00
f367c6337d Update self-comment-ci.yml user list (#39014)
add ivarflakstad to self-comment-ci.yml
2025-06-24 20:13:36 +02:00
67d36dc1d7 Fix bugs in DynamicCache (#37880)
* Fix bugs in DynamicCache

* Update

* Update

* Lint

* lint

* Rename test

* update

* update
2025-06-24 19:43:40 +02:00
6bdd4ec952 Add kyutai stt (#38909)
* first draft

* cleaner version

* update tests + modeling

* add tests

* init

* update test_modeling_common

* fix tests

* csm Processor draft

* conversion update

* mimi cache padding convolutions draft

* mimi streaming updates

* update mimi padding cache test

* update cache padding mimi test

* make style mimi

* updates generate moshi asr

* moshi asr integration tests (single + batched)

* update tests

* update conversion script

* good default sliding window value

* update generate

* update test checkpoint

* nit

* fix mimi

* fix codec prefix

* revert

* revert

* update config

* update config

* unnecessary mimi input restriction

* remove delay in tokens

* remove _prepare_4d_causal_attention_mask_with_cache_position and _update_causal_mask

* test update

* modular update

* make style

* nit

* rename

* create codec model generation config at init

* remove delay

* max_new_tokens/length warning

* correct conv1 padding cache import for modular

* nit

* fix on encoder_past_key_values

* convert modular

* move frame_size to config

* move frame_size to config

* update test name

* handle first token is bos

* better handling of max_new_tokens

* fix

* fix batch size in test input prep

* update docstring

* convert modular

* make style

* make style

* add feature extractor

* correct modular convention name for feature_extraction file

* update conversion script

* doc processor

* update doc

* update init

* update model type

* fixes

* update tests

* fix

* make

* add doc

* nit

* fix

* doc

* auto mappings

* doc

* nit

* convert modular

* doc

* nit

* extend _keep_in_fp32_modules to enforce fp32

* renaming to stt

* doc update + test update

* doc fixes

* doc fix

* doc fix

* fix musicgen tests

* fix musicgen tests

* make style

* fix musicgen tests

* correct frame_rate config param for mimi

* update mimi test

* revert update mimi test

* enforce cpu test

* move cache init in cache class

* convert modular

* docstring update

* update model id

* feature_extractor -> feature_extraction (SEW)

* convert modular

* update model id
2025-06-24 18:01:15 +02:00
08bf7f1afe Add kernelize to transformers (#38205)
* fix

* fix

* fix flow

* remove non compiling path

* change

* style

* fix

* update

* update pin

* revert
2025-06-24 17:38:54 +02:00
be10d4df60 Granite speech - minor fixes to support training with the HF trainer (#38833)
* ensure the query is updated during training

avoid unused parameters that DDP does not like

* avoid a crash when `kwargs` contain `padding=True`

trainers often pass this argument automatically

* minor

* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)

* minor - most feature extractors have a `sampling_rate` property
2025-06-24 17:06:52 +02:00
e1e11b0299 Fix undeterministic order in modular dependencies (#39005)
* sort correctly

* Update modeling_minimax.py

* Update modular_model_converter.py
2025-06-24 17:04:33 +02:00
bdf5fb70aa Skip non-selected experts for qwen3_moe (#38133)
* fix(qwen3moe): skip experts with no workload

* avoid tolist and also update other moe models

* fix: should squeeze 0-dim only
2025-06-24 16:33:48 +02:00
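A self-contained sketch of the optimization (shapes and names assumed, not the actual qwen3_moe code): dispatch only to experts that actually received tokens, iterating over a tensor rather than a Python list, in line with the "avoid tolist" note above:

```python
import torch
import torch.nn.functional as F

num_experts, top_k = 8, 2
hidden_states = torch.randn(16, 32)                      # (tokens, hidden)
router_logits = torch.randn(16, num_experts)
_, selected_experts = router_logits.topk(top_k, dim=-1)  # (tokens, top_k)

# (tokens, top_k, num_experts) one-hot -> which experts have any workload
expert_mask = F.one_hot(selected_experts, num_classes=num_experts)
hit = (expert_mask.sum(dim=(0, 1)) > 0).nonzero().squeeze(-1)

for expert_idx in hit:  # skips idle experts entirely, no .tolist()
    token_idx, _ = torch.where(selected_experts == expert_idx)
    expert_input = hidden_states[token_idx]
    # ... run only these tokens through expert `expert_idx`'s FFN ...
```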
719058c625 Update attention_visualizer.py (#37860) 2025-06-24 16:21:36 +02:00
9f42c1f192 Added scikit-learn to the example image-classification requirements.txt (#37506)
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-06-24 15:24:02 +02:00
1636a7bcb9 Fixes for Arcee model (#39001)
* fix modular

* Update modular_arcee.py

* fix
2025-06-24 15:23:52 +02:00
71de20b818 Add Arcee model support (#38621)
* Add Arcee model support to transformers

- Add ArceeConfig and model mappings for all task types (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification)
- Add auto-loading support through AutoModel, AutoConfig, and AutoTokenizer
- Use LlamaTokenizer for tokenization
- Add FX graph support for Arcee models
- Create lazy loading module structure for Arcee

* feat: update YARN scaling and RoPE validation for Arcee model

* feat: add auto_docstring checkpoint config to Arcee model classes

* docs: add pre-trained model weights reference to Arcee configuration files

* refactor: move RoPE utilities to dedicated modeling_rope_utils module

* Add comprehensive test suite for Arcee model

- Add test_modeling_arcee.py following standard transformers test patterns
- Include tests for all model variants (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification)
- Add specific test for ReLU² activation in ArceeMLP
- Add RoPE scaling tests including YARN support
- Follow CausalLMModelTest pattern used by similar models

* Add documentation for Arcee model

- Add comprehensive model documentation with usage examples
- Include all model variants in autodoc
- Add to table of contents in proper alphabetical order
- Fixes documentation coverage for Arcee model classes

* Make style/fixup

* fix copyright year

* Sync modular conversion

* revert in legacy supported models in src/transformers/utils/fx

* cleaned redundant code in modular_arcee.py

* cleaned testing

* removed pretraining tp

* fix styles

* integration testing

---------

Co-authored-by: Pranav <veldurthipranav@gmail.com>
Co-authored-by: Pranav <56645758+pranav4501@users.noreply.github.com>
2025-06-24 15:05:29 +02:00
23c89a6732 [Attention] Small fix on output attentions (#38948)
small fix
2025-06-24 14:42:10 +02:00
4f650040a6 Removing extra space in large command for speech-pretraining example (#38705)
Removing extra space in Large command
2025-06-24 12:24:56 +00:00
d3d835d4fc [qwen] refactor attentions for vision/audio (#38930)
* refactor attentions in vision/audio

* remove fa2 import

* make config the only args

* pass along kwargs from modality encoders

* style
2025-06-24 10:53:52 +02:00
2e4c045540 🔴 Update default dtype for pipelines to auto (#38882)
* check typing

* Fallback to fp32 if auto not supported.

* up.

* feedback from review.

* make style.
2025-06-24 10:39:18 +02:00
21cb353b7b [docs] Typos - Single GPU efficient training features (#38964)
* Typos

- corrected bf16 training argument
- corrected header for SDPA

* improved readability for SDPA suggested by @stevhliu

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-23 12:33:10 -07:00
f9be71b34d Fix rag (#38585)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-23 17:42:46 +02:00
9eac19eb59 [Feature] Support is_split_into_words in the TokenClassificationPipeline. (#38818)
* some fixes

* some fixes

* now the pipeline can take a list of tokens as input via the is_split_into_words argument

* now the pipeline can take a list of tokens as input via the is_split_into_words argument

* now the pipeline can take a list of tokens as input via the is_split_into_words argument, and we can handle batches of tokenized input

* now the pipeline can take a list of tokens as input via the is_split_into_words argument, and we can handle batches of tokenized input

* solving test problems

* some fixes

* some fixes

* modify tests

* aligning start and end correctly

* adding tests

* some formatting

* some formatting

* some fixes

* some fixes

* some fixes

* resolve conflicts

* removing unimportant lines

* removing unimportant lines

* generalize to other languages

* generalize to other languages

* generalize to other languages

* generalize to other languages
2025-06-23 15:31:32 +00:00
2ce02b98bf fix mistral and mistral3 tests (#38978)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-23 17:07:18 +02:00
b6b4d43d6d Add support for auto_docstring with model outputs (#38242)
* experiment auto_docstring model outputs

* Fix PatchTSMixer

* Add check model output docstring to check_auto_docstring and fix all model outputs docstring

* add reordering of docstring in check_docstrings

* add check for redundant docstring in check_docstrings, remove redundant docstrings

* refactor check_auto_docstring

* make style

* fix copies

* remove commented code

* change List-> list Tuple-> tuple in docstrings

* fix modular

* make style

* Fix modular vipllava

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-06-23 10:39:41 -04:00
0c98f24889 fix: add __bool__ operator to tokenizer to avoid bloated asserts (#38899)
* fix: add __bool__ operator to tokenizer to avoid bloated asserts

When a user does 'assert tokenizer' to ensure that the tokenizer is not None, they inadvertently set off a rather expensive process in the '__len__()' operator. This fix adds a trivial '__bool__()' that returns True, so that a None tokenizer asserts and an actual tokenizer returns True when asserted, without calling the length op.

* typo
2025-06-23 14:32:16 +00:00
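The fix as described above, reduced to a sketch (class context assumed; only the relevant method is shown):

```python
class PreTrainedTokenizerBase:  # sketch, not the full class
    def __bool__(self) -> bool:
        # `assert tokenizer` previously fell back to __len__(), which walks
        # the whole vocabulary; returning True keeps the assert cheap while
        # `assert None` still fails as expected.
        return True
```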
d29482cc91 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157)
* add working idefics2 fast and improvements for fast nested image processing

* add fast image processors idefics 3 and smolvlm

* cleanup tests

* fix doc idefics2

* PR review and fix issues after merge

* Force providing disable_grouping to group_images_by_shape

* simplify group_images_by_shape

* fix modular

* Fix nits after review
2025-06-23 14:17:25 +00:00
1a96127e46 Break tie in Expectations and gemma3 fixes (#38943)
* Added major / minor version to Expectations ordering

* Added fixes to gemma3

* Style
2025-06-23 15:13:27 +02:00
84d19be41e Apply GradientCheckpointingLayer to the whole repo (#38913)
* first batch (4)

* align

* altclip

* beit

* bert

* yolos

* dino, pvt_v2

* bark, bart, bert_generation

* big_bird, biogpt

* blenderbot, bloom

* bridgetower

* camembert, canine, chameleon

* chinese clip, clap, clip

* codegen, conditional detr, convbert

* dab_detr, data2vec

* dbrx, deberta

* deberta, decision_transformer, deformable_detr

* deit, deta, mctct

* detr, dinov2, distilbert

* donut, dpt, electra

* ernie, esm, falcon

* flava, fnet, falcon_mamba

* focalnet, git, gpt2

* gpt - bigcode, neo, neox

* gptj, groupvit

* idefics2, idefics3

* ijepa, imagegpt, internvl

* jetmoe, kosmos2, layoutlm

* layoutlm2-3, led

* lilt, longformer, longt5, luke

* m2m, mamba1-2

* marian, markuplm, mask2former

* maskformer

* mbart, megatron_bert, mimi

* mixtral, mlcd

* mobilevit1-2, modernbert

* moshi, mpt, mra

* mt5, musicgen

* mvp, nemotron

* nllb_moe

* nystromformer, omdet_turbo

* opt, owlvit, owlv2

* pegasus, pegasus_x, persimmon

* phimoe, pix2struct, pixtral

* plbart, pop2piano, prophetnet

* qwen2*

* qwen2, qwen3 moe, recurrent gemma

* rembert

* roberta

* roberta prelayernorm

* roc_bert, roformer, rwkv

* sam, sam_hq

* seggpt, smolvlm, speech_to_text

* splinter, stablelm, swin

* swin2sr, switch_transformer, t5, table_transformer

* tapas, time_series_transformer, timesformer

* trocr, tvp, umt5

* videomae, vilt, visual_bert

* vit, vit_mae, vit_msn

* vitpose_backbone, vits, vivit

* whisper, x_clip, xglm

* xlm_roberta, xmod

* yoso

* zamba

* vitdet, wav2vec2, wav2vec2_bert

* unispeech, wav2vec2_conformer

* wavlm

* speecht5

* swinv2

* sew / _d

* seamless_m4t / _v2

* deprecated models update

* bros

* gemma2, gemma3

* got, hiera, hubert, llama4, mllama, oneformer, phi, olmoe, informer

* fixup

* Add use_cache=False and past_key_value=None to GradientCheckpointingLayer

* fixup

* fix prophetnet

* fix bigbird_pegasus

* fix blenderbot

* fix mbart

* fix mvp

* fix zamba2

* fix bart

* fix blenderbot_small

* fix codegen

* Update gradient checkpointing layer to support more past_key_values arg names

* fix data2vec vision

* fix deformable_detr

* fix gptj

* fix led

* fix m2m_100

* add comment

* fix nllb_moe

* Fix pegasus_x

* fix plbart

* udop

* fix-copies: beit, wav2vec2

* fix gpt_bigcode

* fixup

* fix t5

* fix switch_transformers

* fix longt5

* fix mt5

* update tapas

* fix blip2

* update blip

* fix musicgen

* fix gpt2, trocr

* fix copies

* !!! Revert zamba, mllama

* update autoformer

* update bros

* update args / kwargs for BERT and copies

* 2nd round of updates

* update conditional detr

* Pass encoder_hidden_states as positional arg

* Update to pass encoder_decoder_position_bias as positional arg

* fixup

* biogpt modular

* modular gemma2

* modular gemma3

* modular gpt_neox

* modular informer

* modular internvl

* modular mixtral

* modular mlcd

* modular modernbert

* modular phi

* modular qwen2_5_omni

* modular qwen2_5_vl

* modular sam_hq

* modular sew

* wav2vec2_bert

* modular wav2vec2_conformer

* modular wavlm

* fixup

* Update by modular instructblipvideo

* modular data2vec_audio

* nit modular mistral

* apply modular minimax

* fix modular moonshine

* revert zamba2

* fix mask2former

* refactor idefics
2025-06-23 14:24:48 +02:00
07aab1af1e Remove dead protected imports (#38980)
* remove them

* more
2025-06-23 13:44:50 +02:00
74f5e4a1fa [modular] CLI allows positional arguments, and more defaults names for the optional arg (#38979)
* More defaults

* Update modular_model_converter.py
2025-06-23 12:40:01 +02:00
334bf913dc Fix(informer): Correct tensor shape for input_size=1 (#38856)
* Fix(time_series): Correct scaler tensor shape in base model

The create_network_inputs function in TimeSeriesTransformerModel
handled the scaler's loc and scale tensors inconsistently.
When input_size=1, the tensors were not squeezed, leading to
downstream dimension errors for models like Informer.

This commit refactors the logic to unconditionally apply .squeeze(1),
which correctly handles all input_size cases and fixes the bug at its source.

Fixes #38745

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-06-23 11:50:51 +02:00
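A toy reproduction of the shape issue described above (assumed shapes, not the library code): with `input_size=1` the scaler's `loc`/`scale` keep a singleton dim unless squeezed, and an unconditional `.squeeze(1)` handles every case uniformly:

```python
import torch

batch, input_size = 4, 1
loc = torch.zeros(batch, 1, input_size)   # scaler stats: (batch, 1, input_size)
scale = torch.ones(batch, 1, input_size)

# unconditional squeeze of dim 1 -> (batch, input_size) for any input_size
loc, scale = loc.squeeze(1), scale.squeeze(1)
print(loc.shape)  # torch.Size([4, 1])
```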
c184550daf Fix DTensor import compatibility for PyTorch < 2.5 (#38836) 2025-06-23 11:25:56 +02:00
984ff89e73 Gaudi3 CI (#38790) 2025-06-23 10:56:51 +02:00
2166b6b4ff Update blip model card (#38513)
* Update docs/source/en/model_doc/blip.md

* fix(docs/source/en/model_doc/blip.md): fix redundant typo error

* fix (docs/source/en/model_doc/blip.md): modify of review contents

* fix(docs/source/en/model_doc/blip.md): modify code block

* Update blip.md

---------

Co-authored-by: devkade <mouseku@moana-master>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-20 13:46:19 -07:00
166e823f77 Fix custom generate from local directory (#38916)
Fix custom generate from local directory:
1. Create parent dirs before copying files (custom_generate dir)
2. Correctly copy relative imports to the submodule file.
3. Update docs.
2025-06-20 17:36:57 +01:00
3d34b92116 Switch to use A10 progressively (#38936)
* try

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-20 16:10:35 +00:00
b8059e1f8f Fix more flaky test_initialization (#38932)
* try

* try

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-20 17:28:32 +02:00
5ee60f970a Correctly raise error for awq quantization (#38945)
fix warning
2025-06-20 17:18:06 +02:00
8ac2d75353 Pin PyTorch extras for AMD containers (#38941)
* Pin additional Torch packages

* Remove unused def

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-06-20 12:17:21 +00:00
9120567b02 Add kwargs for timm.create_model in TimmWrapper (#38860)
* Add init kwargs for timm wrapper

* model_init_kwargs -> model_args

* add save-load test

* fixup
2025-06-20 12:00:09 +00:00
ff95974bc6 [static cache] fix device map per layer in VLMs (#38488)
return lm as decoder
2025-06-20 13:49:29 +02:00
aa42987c1e Remove ALL_LAYERNORM_LAYERS (#38922)
* remove it everywhere

* Update trainer_pt_utils.py

* Update trainer_pt_utils.py

* style

* sort list in test

* CIs

* use recursion the same way as before (for intermediate layer names)
2025-06-20 12:06:48 +02:00
38a9b70786 add pytorch-xpu Dockerfile (#38875)
* first commit

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* use rls pytorch

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-20 11:42:44 +02:00
9bcdd5cde9 Modernbert fixes (#38912)
* Removed deprecated argument in modernbert RotaryEmbedding

* Skip test_sdpa_can_dispatch_on_flash for modernbert

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-06-20 11:22:32 +02:00
31d30b7224 Skip some tests for now (#38931)
* try

* [test all]

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-20 11:05:49 +02:00
0725cd6953 Remove deprecated classes in modeling_utils.py (#38919)
* remove deprecated classes

* style
2025-06-19 19:25:20 +02:00
797860c68c feat: add flexible Liger Kernel configuration to TrainingArguments (#38911)
* feat: add flexible Liger Kernel configuration to TrainingArguments

Add support for granular Liger Kernel configuration through a new
`liger_kernel_config` parameter in TrainingArguments. This allows users
to selectively enable/disable specific kernels (rope, swiglu, cross_entropy,
etc.) instead of the current approach, which relies on the default configuration.

Features:
- Add `liger_kernel_config` dict parameter to TrainingArguments
- Support selective kernel application for all supported models
- Maintain full backward compatibility with existing `use_liger_kernel` flag

Example usage:
```python
TrainingArguments(
    use_liger_kernel=True,
    liger_kernel_config={
        "rope": True,
        "swiglu": True,
        "cross_entropy": False,
        "fused_linear_cross_entropy": True
    }
)
```

Closes #38905

* Address comments and update Liger section in Trainer docs
2025-06-19 15:54:08 +00:00
89b35be618 Allow make-fixup on main branch, albeit slowly (#38892)
* Allow make-fixup on main branch, albeit slowly

* Make the other style checks work correctly on main too

* More update

* More makefile update
2025-06-19 15:22:59 +01:00
9a02e7602d feat: Add granite architectures to auto tokenizer name mappings (#38802)
Branch: GraniteTokenizerMapping

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-19 15:20:42 +01:00
54a02160eb Fix ReDOS in tokenizer digit substitution (#38844)
* Fix regexes vulnerable to ReDOS

* Let's just use regex

* Import regex/re correctly
2025-06-19 14:53:52 +01:00
af6120b3eb Skip sdpa tests if submodule does not support sdpa (#38907) 2025-06-19 13:11:01 +00:00
5d26a38735 Fix FalconMambaIntegrationTests (#38566)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-19 13:50:33 +02:00
a9ce8c69c9 align xpu's autocast behavior w/ cuda by using device agnostic torch APIs (#38284)
* switch to device agnostic autocast in nemotron to align xpu behavior w/ cuda

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix issue

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* use torch.autocast as other modeling code does for decision_transformer & gpt2 & imagegpt

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* refine

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* update get_autocast_gpu_dtype to device agnostic one

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-06-19 11:48:23 +00:00
0a53df1a77 Fix unnecessary super calls (#38897)
Signed-off-by: cyy <cyyever@outlook.com>
2025-06-19 11:45:51 +00:00
b949747b54 Fix fsmt tests (#38904)
* fix 1

* fix 2

* fix 3

* fix 4

* fix 5

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-19 10:56:34 +02:00
11738f8537 [phi-4] use mel filters from audio utils (#36966)
* use mel_filter_bank from audio utils

* Apply style fixes

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-19 12:35:32 +09:00
f7b21822e3 Use raise from e in hub.py utility (#37241)
Use raise from e

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-06-19 03:06:25 +00:00
3756bf192c Add support for specifying revisions when pushing to Hub via internal Trainer call (#36852)
* Update training_args.py

* Update trainer.py

* fixes

* fix

* remove extraneous comments

* explicit revision arg

* add msg

* fixup

* fix field name

* rename field revision to hub_revision

* restore gradient_checkpointing doc

* fix ws

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-06-19 02:35:33 +00:00
458e0b376c Update bamba model card (#38853)
* Update bamba model card

* Update the doc for bamba

* Update docs/source/en/model_doc/bamba.md

Bamba paragraph

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

Bamba collection url

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

Update Padding-Free Training to Notes heading

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

update examples

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

Update additional info

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

consistent casing

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

simplify sentences

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Include pipeline and cli examples + fix formatting

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

update cli id

* Update quantization example

* Fix auto code formatter changes

* Update cli command + include BambaModel

* Update docs/source/en/model_doc/bamba.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-18 16:01:25 -07:00
ea01334873 [video processor] fix slow tests (#38881)
* we need to check against mapping to be safe

* need to check only when inferring from image type, otherwise it messes up custom code

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-06-18 22:39:56 +02:00
b922b22ec2 36978 | Fast image processor for DPT model (#37481)
* chore: ran codegen script

* test: test_image_processor_properties

* test: test_image_processor_from_dict_with_kwargs

* test: wip - test_padding

* test: test_padding

* test: test_keep_aspect_ratio

* wip

* test

* test: wip

* test: wip

* test: test_call_segmentation_maps, wip

* chore: tidy up

* test: test_call_segmentation_maps

* fix: test_save_load_fast_slow

* test: reduce labels

* chore: make fixup

* chore: rm comment

* chore: tidy

* chore remove comment

* refactor: no need to infer channel dimension

* refactor: encapsulate logic for preparing segmentation maps

* refactor: improve readability of segmentation_map preparation

* improvement: batched version of pad_image

* chore: fixup

* docs

* chore: make quality

* chore: remove unnecessary comment

* fix: add SemanticSegmentationMixin

* feat: add post_process_depth_estimation to fast dpt image processor

* chore: fix formatting

* remove max_height, max_width

* fix: better way of processing segmentation maps
- copied from Beit Fast processor

* chore: formatting + remove TODO

* chore: fixup styles

* chore: remove unnecessary line break

* chore: code review suggestion to remove autodocstring

* fix: add do_reduce_labels logic + refactor
- refactor preprocess logic to make it consistent with other processors
- add missing reduce labels logic

* refactor: remove deprecated mixin

* chore: fixup

* use modular for dpt + final nit changes

* fix style

---------

Co-authored-by: Samuel Rae <samuelrae@Samuels-Air.fritz.box>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-06-18 17:33:29 +00:00
c27f628e98 Docs: Add custom fine-tuning tutorial to TrOCR model page (#38847)
* Update trocr.md

Docs: add community fine‑tuning notebook link to TrOCR page

* apply suggested changes from PR review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/trocr.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-18 09:38:58 -07:00
0a289d1630 log: Add logging when using split_batches and per_device_train_batch_size (#38633)
* log: Add logging when user uses split_batches and per_device_train_batch_size

* refactor: remove whitespace from blank line

* Update src/transformers/training_args.py

Change logging level to info

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-18 16:26:46 +00:00
c55d806355 [bugfix] fix ATTN_MASK_NPU device mismatch error on multi-device NPU … (#38876)
[bugfix] fix ATTN_MASK_NPU device mismatch error on multi-device NPU setups
2025-06-18 16:26:22 +00:00
9cd7570f34 Fix loop var naming (#38885) 2025-06-18 13:45:01 +00:00
1fc67a25c6 More PYUP fixes (#38883)
More pyup fixes

Signed-off-by: cyy <cyyever@outlook.com>
2025-06-18 14:38:08 +01:00
12d4c5b66f null deepspeed_plugin in args for wandb callback fake trainer (#38867) 2025-06-18 13:10:22 +00:00
3620b32cc8 Fixed markdown for BertTokenizer's '[CLS]' token. (#38506) 2025-06-18 13:09:58 +00:00
cb0f604192 Fix HQQ model param device transfer issue (#38466)
* Fix HQQ model param device transfer issue

* modify a comment

* clear the code and add test for hqq device/dtype

* fix test hqq code quality of imports

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-18 15:09:00 +02:00
c77bcd889f Fix qwen3_moe tests (#38865)
* try 1

* try 2

* try 3

* try 4

* try 5

* try 6

* try 7

* try 8

* try 9

* try 10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-18 14:36:03 +02:00
5a95ed5ca0 🚨🚨 Fix initialization of Mask2Former (#38864)
* Correctly fix init

Co-authored-by: BUI Van Tuan <buivantuan07@gmail.com>

* add back the block, breaking BC, but this matches the original author's code

* override the test for params needing it

---------

Co-authored-by: BUI Van Tuan <buivantuan07@gmail.com>
2025-06-18 09:46:22 +02:00
309e8c96f2 Fix phi4_multimodal tests (#38816)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-18 09:39:17 +02:00
3526e25d3d enable misc test cases on XPU (#38852)
* enable misc test cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* tweak bamba ground truth on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* remove print

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* one more

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-18 09:20:49 +02:00
d058f81e5b Post-PR fixes! (#38868)
* Post-PR fixes!

* make fix-copies
2025-06-17 19:58:47 +01:00
508a704055 No more Tuple, List, Dict (#38797)
* No more Tuple, List, Dict

* make fixup

* More style fixes

* Docstring fixes with regex replacement

* Trigger tests

* Redo fixes after rebase

* Fix copies

* [test all]

* update

* [test all]

* update

* [test all]

* make style after rebase

* Patch the hf_argparser test

* Patch the hf_argparser test

* style fixes

* style fixes

* style fixes

* Fix docstrings in Cohere test

* [test all]

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-17 19:37:18 +01:00
a396f4324b Update roc bert docs (#38835)
* Moved the sources to the right

* small Changes

* Some Changes to moonshine

* Added the install to pipeline

* updated the moonshine model card

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated documentation according to changes

* Fixed the model with the commits

* Changes to the roc_bert

* Final Update to the branch

* Adds Quantization to the model

* Finished fixing the roc_bert docs

* Fixed Moshi

* Fixed Problems

* Fixed Problems

* Fixed Problems

* Fixed Problems

* Fixed Problems

* Fixed Problems

* Added the install to pipeline

* updated the moonshine model card

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated documentation according to changes

* Fixed the model with the commits

* Fixed the problems

* Final Fix

* Final Fix

* Final Fix

* Update roc_bert.md

---------

Co-authored-by: Your Name <sohamprabhu@Mac.fios-router.home>
Co-authored-by: Your Name <sohamprabhu@Sohams-MacBook-Air.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-17 11:02:18 -07:00
3ae52cc312 Update CvT documentation with improved usage examples and additional … (#38731)
* Update CvT documentation with improved usage examples and additional notes

* initial update

* cvt

* Update docs/source/en/model_doc/cvt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update cvt.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-17 10:30:03 -07:00
e5a9ce48f7 Add LightGlue model (#31718)
* init

* chore: various changes to LightGlue

* chore: various changes to LightGlue

* chore: various changes to LightGlue

* chore: various changes to LightGlue

* Fixed dynamo bug and image padding tests

* refactor: applied refactoring changes from SuperGlue's concat, batch and stack functions to LightGlue file

* tests: removed sdpa support and changed expected values

* chore: added some docs and refactoring

* chore: fixed copy to superpoint.image_processing_superpoint.convert_to_grayscale

* feat: adding batch implementation

* feat: added validation for preprocess and post process method to LightGlueImageProcessor

* chore: changed convert_lightglue_to_hf script to comply with new standard

* chore: changed lightglue test values to match new lightglue config pushed to hub

* chore: simplified convert_lightglue_to_hf conversion map

* feat: adding batching implementation

* chore: make style

* feat: added threshold to post_process_keypoint_matching method

* fix: added missing instructions that turn keypoints back to absolute coordinates before the matching forward

* fix: added typehint and docs

* chore: make style

* [run-slow] lightglue

* fix: add matches different from -1 to compute valid matches in post_process_keypoint_matching

* tests: added CUDA proof tests similar to SuperGlue

* chore: various changes to modeling_lightglue.py

- Added "Copies from" statements for copied functions from modeling_superglue.py
- Added missing docstrings
- Removed unused functions or classes
- Removed unnecessary statements
- Added missing typehints
- Added comments to the main forward method

* chore: various changes to convert_lightglue_to_hf.py

- Added model saving
- Added model reloading

* chore: fixed imports in lightglue files

* [run-slow] lightglue

* chore: make style

* [run-slow] lightglue

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* [run-slow] lightglue

* chore: Applied some suggestions from review

- Added missing typehints
- Refactor "cuda" to device variable
- Variable renaming
- LightGlue output order changed
- Make style

* fix: added missing grayscale argument in image processor when using the SuperPoint keypoint detector

* fix: changed lightglue HF repo to lightglue_superpoint with grayscale default to True

* refactor: make keypoints `(batch_size, num_keypoints, keypoint_dim)` through forward and unsqueeze only before attention layer

* refactor: refactor do_layer_keypoint_pruning

* tests: added tests with no early stop and keypoint pruning

* refactor: various refactoring to modeling_lightglue.py

- Removed unused functions
- Renamed variables for consistency
- Added comments for clarity
- Set methods to private in LightGlueForKeypointMatching
- Replaced tensor initialization to list then concatenation
- Used more pythonic list comprehension for repetitive instructions

* refactor: added comments and renamed filter_matches to get_matches_from_scores

* tests: added copied from statement with superglue tests

* docs: added comment to prepare_keypoint_matching_output function in tests

* [run-slow] lightglue

* refactor: reordered _concat_early_stopped_outputs in LightGlue class

* [run-slow] lightglue

* docs: added lightglue.md model doc

* docs: added Optional typehint to LightGlueKeypointMatchingOutput

* chore: removed pad_images function

* chore: set do_grayscale default value to True in LightGlueImageProcessor

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* docs: added missing LightGlueConfig typehint in nn.Module __init__ methods

* docs: removed unnecessary code in docs

* docs: import SuperPointConfig only from a TYPE_CHECKING context

* chore: use PretrainedConfig arguments `num_hidden_layers` and `num_attention_heads` instead of `num_layers` and `num_heads`

* chore: added organization as arg in convert_lightglue_to_hf.py script

* refactor: set device variable

* chore: added "gelu" in LightGlueConfig as hidden_act parameter

* docs: added comments to reshape.flip.reshape instruction to perform cross attention

* refactor: used batched inference for keypoint detector forward pass

* fix: added fix for SDPA tests

* docs: fixed docstring for LightGlueImageProcessor

* [run-slow] lightglue

* refactor: removed unused line

* refactor: added missing arguments in LightGlueConfig init method

* docs: added missing LightGlueConfig typehint in init methods

* refactor: added checkpoint url as default variable to verify the model's outputs only if it is the default url

* fix: moved print message inside if statement

* fix: added log assignment r removal in convert script

* fix: got rid of confidence_thresholds as registered buffers

* refactor: applied suggestions from SuperGlue PR

* docs: changed copyright to 2025

* refactor: modular LightGlue

* fix: removed unnecessary import

* feat: added plot_keypoint_matching method to LightGlueImageProcessor with matplotlib soft dependency

* fix: added missing import error for matplotlib

* Updated convert script to push on ETH org

* fix: added missing licence

* fix: make fix-copies

* refactor: use cohere apply_rotary_pos_emb function

* fix: update model references to use ETH-CVG/lightglue_superpoint

* refactor: add and use intermediate_size attribute in config to inherit CLIPMLP for LightGlueMLP

* refactor: explicit variables instead of slicing

* refactor: use can_return_tuple decorator in LightGlue model

* fix: make fix-copies

* docs: Update model references in `lightglue.md` to use the correct pretrained model from ETH-CVG

* Refactor LightGlue configuration and processing classes

- Updated type hints for `keypoint_detector_config` in `LightGlueConfig` to use `SuperPointConfig` directly.
- Changed `size` parameter in `LightGlueImageProcessor` to be optional.
- Modified `position_embeddings` in `LightGlueAttention` and `LightGlueAttentionBlock` to be optional tuples.
- Cleaned up import statements across multiple files for better readability and consistency.

* refactor: Update LightGlue configuration to enforce eager attention implementation

- Added `attn_implementation="eager"` to `keypoint_detector_config` in `LightGlueConfig` and `LightGlueAttention` classes.
- Removed unnecessary logging related to attention implementation fallback.
- Cleaned up import statements for better readability.

* refactor: renamed message into attention_output

* fix: ensure device compatibility in LightGlueMatchAssignmentLayer descriptor normalization

- Updated the normalization of `m_descriptors` to use the correct device for the tensor, ensuring compatibility across different hardware setups.

* refactor: removed Conv layers from init_weights since LightGlue doesn't have any

* refactor: replace add_start_docstrings with auto_docstring in LightGlue models

- Updated LightGlue model classes to utilize the new auto_docstring utility for automatic documentation generation.
- Removed legacy docstring handling to streamline the code and improve maintainability.

* refactor: simplify LightGlue image processing tests by inheriting from SuperGlue

- Refactored `LightGlueImageProcessingTester` and `LightGlueImageProcessingTest` to inherit from their SuperGlue counterparts, reducing code duplication.
- Removed redundant methods and properties, streamlining the test setup and improving maintainability.

* test: forced eager attention implementation to LightGlue model tests

- Updated `LightGlueModelTester` to include `attn_implementation="eager"` in the model configuration.
- This change aligns the test setup with the recent updates in LightGlue configuration for eager attention.

* refactor: update LightGlue model references

* fix: import error

* test: enhance LightGlue image processing tests with setup method

- Added a setup method in `LightGlueImageProcessingTest` to initialize `LightGlueImageProcessingTester`.
- Included a docstring for `LightGlueImageProcessingTester` to clarify its purpose.

* refactor: added LightGlue image processing implementation to modular file

* refactor: moved attention blocks into the transformer layer

* fix: added missing import

* fix: added missing import in __all__ variable

* doc: added comment about enforcing eager attention because of SuperPoint

* refactor: added SuperPoint eager attention comment and moved functions to the closest they are used

---------

Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-06-17 18:10:23 +02:00
2507169bf6 Fix qwen3 tests (#38862)
* fix

* update

* update

* update

* update

* update

* update

* format

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-17 15:21:36 +02:00
41e0c921cb Improve auxiliary_in_channels default behavior in UperNet (#37540)
Improve auxiliary_in_channels behavior in UperNet

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-06-17 12:56:46 +00:00
c61ca64aaa Fix qwen2_5_vl tests (#38845)
* fix

* breakpoint()

* breakpoint()

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-17 10:55:24 +02:00
37367c7d9f Allow customization of sdpa in executorch.py (#38827)
An earlier PR put the executorch-specific sdpa and mask function in the export function. This prevented any customization of sdpa prior to export. By moving this to __init__, we keep the original behavior but allow users such as optimum-executorch to override sdpa by setting model.config._attn_implementation.
2025-06-17 10:38:20 +02:00
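A minimal sketch of the customization path this change enables (the checkpoint is an arbitrary placeholder and the export step is elided): set the private config attribute the commit mentions before handing the model to an ExecuTorch export pipeline.

```python
# Hedged sketch: override the attention implementation prior to export.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM")
model.config._attn_implementation = "sdpa"  # customize before export
# ... then run the usual optimum-executorch export on `model`
```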
9c878d2f64 Fix incorrect width ratio calculation in Llama4 image processor (#38842) 2025-06-17 07:33:36 +00:00
bf370e446b [video processor] fix BC when no video config is found (#38840)
fix auto video processor
2025-06-17 09:20:16 +02:00
e61160c5db Remove merge conflict artifacts in Albert model doc (#38849) 2025-06-16 14:21:18 -07:00
64e9b049d9 Updated aya_vision.md (#38749)
* Update aya_vision.md

* Suggested changes made to aya_vision.md

* Quantization Example added - aya_vision.md

* Polished - aya_vision.md

* Update aya_vision.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-16 10:46:30 -07:00
5ab0f447ab GraniteMoeHybrid: Allow for only shared expert case. (#38801)
* Allow for only shared expert case.

* Style
2025-06-16 16:15:42 +01:00
a7593a1d1f [BugFix] QA pipeline edge case: align_to_words=True in QuestionAnsweringPipeline can lead to duplicate answers (#38761)
* fixed the problem of align_to_words=True leading to duplicate answers

* adding tests

* some fixes

* some fixes

* changed handle_duplicate_answers to default to False

* some fixes

* some fixes

* make the duplicate handling the default behaviour and merge duplicates

* make the duplicate handling the default behaviour
2025-06-16 15:01:22 +00:00
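A usage sketch of the behavior described above (the checkpoint is an arbitrary SQuAD model): with align_to_words=True, sub-word answers are snapped to word boundaries, which previously could yield duplicate spans that this change now merges by default.

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
answers = qa(
    question="Where is the Eiffel Tower?",
    context="The Eiffel Tower is located in Paris, France.",
    align_to_words=True,  # word-aligned spans; duplicates are now merged
    top_k=3,
)
print(answers)
```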
18c7f32daa Fix broken tag in Longformer model card (#38828) 2025-06-16 07:44:40 -07:00
b44b04ee9a Fix broken notebooks link in Italian training docs (#38834) 2025-06-16 07:38:51 -07:00
9300728665 Fix peft integration (#38841)
Update peft.py
2025-06-16 10:39:25 +02:00
608884960e add default mapping to peft integration 2025-06-16 10:23:51 +02:00
ce6ac53ac1 bugfix: propage weight key_mapping to peft to fix 3.52 VLM renaming (#38627)
* propage key mapping to peft

* propage key mapping to peft

* make requested changes

* revert
2025-06-16 10:10:23 +02:00
925da8ac56 Fix redundant code in Janus (#38826)
* minor mistake

* modify return statements
2025-06-16 06:53:59 +00:00
d2fd3868bb [internvl] fix video inference (#38811)
fix
2025-06-16 08:37:30 +02:00
d5d007a1a0 Updated Albert model Card (#37753)
* Updated Albert model Card

* Update docs/source/en/model_doc/albert.md

added the quotes in <hfoption id="Pipeline">

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

updated checkpoints

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

changed !Tips description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

updated text

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

updated transformer-cli implementation

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

changed text

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

removed repeated description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update albert.md

removed lines

* Update albert.md

updated pipeline code

* Update albert.md

updated auto model code, removed quantization as model size is not large, removed the attention visualizer part

* Update docs/source/en/model_doc/albert.md

updated notes

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update albert.md

reduced a repeated point in the notes

* Update docs/source/en/model_doc/albert.md

updated transformer-CLI

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

removed extra notes

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-13 14:58:06 -07:00
443aafd3d6 [docs] updated roberta model card (#38777)
* updated roberta model card

* fixes suggested after reviewing

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-13 12:02:44 -07:00
fdb5da59dd [docs] Update docs moved to the course (#38800)
* update

* update

* update not_doctested.txt

* slow_documentation_tests.txt
2025-06-13 12:02:27 -07:00
8b73799500 fixed docstring in modular_qwen2_5_vl.py (#38798)
* fixed docstring in modular_qwen2_5_vl.py

* Regenerate file to match docstring update
2025-06-13 11:09:51 -07:00
9bec2654ed Add V-JEPA for video classification model (#38788)
* adding model and conversion scripts

* add imports to test vjepa conversion

* fix imports and make conversion work

* fix computation for short side

* replace attention with library attention function

* cleanup more attention classes

* remove config overrides

* add test cases, fix some of the failing ones

* fix the model outputs

* fix outputs of the model per review

* fix too big model test case

* fix styling __init__.py

* fix initialization test

* remove all asserts per review

* update sorting unsorting logic as per feedback

* remove is_video per review

* remove another is_video segment

* remove unwanted stuff

* small fixes

* add docstrings for the model

* revert adding vjepa2 config here

* update styling

* add config docstrings (wip)

* fix dpr issue

* removed test failing issues

* update styles

* merge predictor configs into main config

* remove processing code, add video processor

* remove permute which is not necessary now

* fix styles

* updated vjepa2 to be in video_processing_auto

* update comment for preprocessing

* test integration test and fix the outputs

* update test values, change test to look at repeated frames for a given image

* add a simple video processing test

* refactoring pixel_values_videos and upload ckpts to original

* fix torch_fx test cases

* remove unused config

* add all config docstrings

* add more integration tests

* add basic doc

* revert unwanted styling changes

* working make fixup

* Fix model_type in config

* Add ForVideoClassification model

* update attention implementation to fit new hf standards

* fix the preprocessing logic, ensure it matches the original model

* remove use_rope logic, cleanup

* fix docstrings

* Further cleanup, update doc

* Fix model prefix

* fix get_vision_features

* VJEPA2Embeddings style refactor

* nit, style comment

* change modules default values

* Only `str` activation in config

* GradientCheckpointingLayer

* fixup

* fix conversion script

* Remove return_dict

* remove None return typehint

* Refactor VJEPA2Layer, remove use_SiLU

* Fix fx tests

* dpr -> drop_path_rates

* move *ModelOutput on top

* format docs bit

* update docs

* update docs

* update doc example

* remove prune_heads from model

* remove unused config params

* refactor embed signature

* Add vjepa to docs

* Fix config docstring

* attention head

* update defaults

* Update docs/source/en/model_doc/vjepa2.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/vjepa2.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Fix import

* Min refactoring

* Update HUB_SOURCE and HUB_REPO in conversion script

* Add missing headers

* VJEPA -> V-JEPA in docs

* Add image to doc

* fix style

* fix init weights

* change checkpoint name in modeling tests

* Initial cls head setup

* remove rop attention from head (not needed)

* remove swigluffn - not needed

* Add siglip layer

* Replace with siglip layer

* Rename Siglip - VJEPA2

* remove unused modules

* remove siglip mlp

* nit

* remove MLP

* Refactor head cross attention

* refactor VJEPA2HeadCrossAttentionLayer

* nit renaming

* fixup

* remove commented code

* Add cls head params to config

* depth from config

* move pooler + classifier  to the model

* Update for cls model signature

* move layers, rename a bit

* fix docs

* update weights init

* remove typehint for init

* add to auto-mapping

* enable tests

* Add conversion script

* fixup

* add to docs

* fix docs

* nit

* refactor for mapping

* clean

* Add integration test

* Fixing multi gpu test

* update not-split-modules

* update video cls test tolerance

* Increase test_inference_image tolerance

* Update no-split modules for multi gpu

* Apply suggestions from code review

* fixing multi-gpu

* fix docstring

* Add cls snippet to docs

* Update checkpoint
2025-06-13 17:56:15 +01:00
2ff964bcb4 Fix trainer.py not showing signature columns (#38465)
Fix trainer.py not showing signature columns
2025-06-13 15:39:29 +00:00
4c3c177ecf Fix a minor security issue (#38815)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-13 17:37:46 +02:00
93445aed06 change fsdp_strategy to fsdp in TrainingArguments in accelerate doc (#38807) 2025-06-13 15:32:40 +00:00
b82a45b3b4 Refactor DBRX tests to use CausalLMModelTest base classes (#38475)
* Refactor DBRX tests to use CausalLMModelTest base classes

- Changed DbrxModelTester to inherit from CausalLMModelTester
- Changed DbrxModelTest to inherit from CausalLMModelTest
- Removed duplicate methods that are already in base classes
- Added required class attributes for model classes
- Updated pipeline_model_mapping to include feature-extraction
- Kept DBRX-specific configuration and test methods
- Disabled RoPE tests as DBRX's rotary embedding doesn't accept config parameter

This refactoring reduces code duplication and follows the pattern established
in other causal LM model tests like Gemma.

* Apply style fixes

* Trigger tests

* Refactor DBRX test

* Make sure the DBRX-specific settings are handled

* Use the attribute_map

* Fix attribute map

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-13 16:22:12 +01:00
64041694a8 Use wandb.run.url instead of wandb.run.get_url() (deprecated) (#38817) 2025-06-13 15:20:04 +00:00
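The one-line migration this commit performs, sketched below (project name is a placeholder, and a configured wandb environment is assumed); `run.url` is the documented property replacing the deprecated accessor.

```python
import wandb

run = wandb.init(project="transformers-demo")  # placeholder project
print(run.url)  # preferred
# print(run.get_url())  # deprecated accessor being replaced
```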
9ff246db00 Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
e39172ecab Fix llava_next tests (#38813)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-13 15:19:41 +02:00
b3b7789cbc Better pipeline type hints (#38049)
* image-classification

* depth-estimation

* zero-shot-image-classification

* image-feature-extraction

* image-segmentation

* mask-generation

* object-detection

* zero-shot-object-detection

* image-to-image

* image-text-to-text

* image-to-text

* text-classification

* text-generation

* text-to-audio

* text2text_generation

* fixup

* token-classification

* document-qa

* video-classification

* audio-classification

* automatic-speech-recognition

* feature-extraction

* fill-mask

* zero-shot-audio-classification

* Add pipeline function typing

* Add code generator and checker for pipeline types

* Add to makefile

* style

* Add to CI

* Style
2025-06-13 13:44:07 +01:00
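A simplified sketch of the typing pattern such a change introduces, assuming `typing.overload` with `Literal` task names (the real mapping covers every task listed above; class names here are stand-ins):

```python
from typing import Literal, overload

class TextClassificationPipeline: ...
class ImageClassificationPipeline: ...

@overload
def pipeline(task: Literal["text-classification"]) -> TextClassificationPipeline: ...
@overload
def pipeline(task: Literal["image-classification"]) -> ImageClassificationPipeline: ...
def pipeline(task: str):
    # runtime dispatch; the overloads above exist purely for type checkers
    mapping = {
        "text-classification": TextClassificationPipeline,
        "image-classification": ImageClassificationPipeline,
    }
    return mapping[task]()
```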
c989ddd294 Simplify and update trl examples (#38772)
* Simplify and update trl examples

* Remove optim_args from SFTConfig in Trainer documentation

* Update docs/source/en/trainer.md

* Apply suggestions from code review

* Update docs/source/en/trainer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <qgallouedec@Quentins-MacBook-Pro.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-13 12:03:49 +00:00
de24fb63ed Use HF papers (#38184)
* Use hf papers

* Hugging Face papers

* doi to hf papers

* style
2025-06-13 11:07:09 +00:00
1031ed5166 Disable custom MRA kernels for ROCm (#38738)
* Disable custom MRA kernels for ROCm

* Move platform check code to utils

* Ruff

* Ruff again

* Fix querying HIP version

* Revert some changes

* Add missing return statement

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-06-13 12:25:28 +02:00
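A hedged sketch of the platform check the commit moves into utils: `torch.version.hip` is a version string on ROCm builds and None otherwise, which is one reliable way to detect HIP.

```python
import torch

def is_rocm_platform() -> bool:
    # torch.version.hip is set on ROCm builds, None on CUDA builds
    return getattr(torch.version, "hip", None) is not None

# gate the custom kernels on a CUDA (non-HIP) build
load_custom_mra_kernels = torch.cuda.is_available() and not is_rocm_platform()
```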
7f00b325f8 Unbreak optimum-executorch (#38646)
* Unbreak optimum-executorch

* use static cache if the config has layer_types but no sliding_window

* revert view on kv_arange

---------

Co-authored-by: Guang Yang <guangyang@fb.com>
2025-06-13 11:13:32 +02:00
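One reading of the cache-selection bullet above, sketched as a helper (the function name and the fallback behavior are assumptions, not the PR's actual code):

```python
def pick_cache_implementation(config):
    # Assumed rule from the commit bullet: a config that declares layer_types
    # but no sliding window gets a static cache; otherwise keep the default.
    if getattr(config, "layer_types", None) and getattr(config, "sliding_window", None) is None:
        return "static"
    return None  # leave the default cache selection untouched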
5f59a9b439 Fix configs and doc for the Qwens (#38808)
fix doc and configs
2025-06-13 11:10:55 +02:00
8222a9325d Fix erroneous docstring for the ordering of SWA layers (#38794) 2025-06-13 10:46:44 +02:00
e26ae89281 [docs] update cache docs with new info (#38775)
* update docs with new info

* Update docs/source/en/kv_cache.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-13 07:10:56 +00:00
324cc77dc3 refactor create_token_type_ids_from_sequences (#37681)
* rm build_input.. from old file

* refactor create_token_type_ids_from_sequences

* handle when cls_token_id is None

* updated fix

* markuplm

* refactoring rest of models

* copies

* revert funnel

* rm incorrect file

* ruff

* ruff
2025-06-12 23:24:43 +02:00
85f060e9b0 Updated moonshine modelcard (#38711)
* Moved the sources to the right

* small Changes

* Some Changes to moonshine

* Added the install to the pipeline

* updated the moonshine model card

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated documentation according to changes

* Fixed the model with the commits

* Update moonshine.md

* Update moshi.md

---------

Co-authored-by: Your Name <sohamprabhu@Mac.fios-router.home>
Co-authored-by: Your Name <sohamprabhu@Sohams-MacBook-Air.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-12 10:27:17 -07:00
645cf297cc Add missing div in Pegasus model card (#38773)
Add missing div
2025-06-12 10:27:07 -07:00
346f341630 [Docs] New DiT model card (#38721)
* documenation finished

* Update dit.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-12 10:26:50 -07:00
4b8ec667e9 Remove all traces of low_cpu_mem_usage (#38792)
* remove it from all py files

* remove it from the doc

* remove it from examples

* style

* remove traces of _fast_init

* Update test_peft_integration.py

* CIs
2025-06-12 16:39:33 +02:00
3542e0b844 build: 📌 Remove upper bound on PyTorch (#38789)
build: 📌 remove upper bound on torch dependency, as the fix for the issue that originally resulted in the pin has been released in torch 2.7.1
2025-06-12 16:34:13 +02:00
eea35a15b0 Fix mllama (#38704)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-12 16:15:35 +02:00
038a59e2cd Initialize flash attn flag (#38768)
_flash_supports_window_size is used further down in this file and relied on by e.g. [ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention/blob/123f924/ring_flash_attn/adapters/hf_adapter.py#L9-L11). Even though it is an unexported name, it still makes sense to keep the state of `globals()` in this file consistent.
2025-06-12 14:06:13 +00:00
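A sketch of the initialization pattern described above: define the flag unconditionally, then refine it when flash-attn is importable, so external code probing this module's globals() always finds it.

```python
import inspect

# Always defined, so downstream importers never hit a NameError.
_flash_supports_window_size = False

try:
    from flash_attn import flash_attn_func

    _flash_supports_window_size = (
        "window_size" in inspect.signature(flash_attn_func).parameters
    )
except ImportError:
    pass  # flag stays False when flash-attn is absent
```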
910355a010 Fix Typos in Comments: "quantitation" → "quantization", "averege" → "average" (#38766)
* Update convert_llama4_weights_to_hf.py

* Update modeling_visual_bert.py
2025-06-12 14:04:39 +00:00
6a5fd0c6d2 Reword README in light of model definitions (#38762)
* Slight readme reword

* reword

* reword

* reword

* Slight readme reword
2025-06-12 14:43:31 +01:00
c87058beb8 Fix llava_onevision tests (#38791)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-12 15:06:49 +02:00
d4e7aa5526 Fix qwen_2_5 omni (#38658)
* fix

* fix

* break style

* break style

* Apply style fixes

* break style

* Apply style fixes

* fix modular

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-12 14:43:54 +02:00
e1812864ab [docs] Add int4wo + 2:4 sparsity example to TorchAO README (#38592)
* update quantization readme

* update

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-06-12 12:17:07 +00:00
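For reference, a hedged sketch of plain int4 weight-only loading via TorchAO (the checkpoint and group size are arbitrary; the 2:4 sparsity variant from the PR depends on the TorchAO version and is not shown here):

```python
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig

quant_config = TorchAoConfig("int4_weight_only", group_size=128)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # placeholder checkpoint
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
)
```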
bc68defcac Update PULL_REQUEST_TEMPLATE.md (#38770) 2025-06-12 14:03:33 +02:00
960fda25d1 Reduce verbosity for average_tokens_across_devices=True and world size = 1 (#38785)
* Warning to info for average_tokens_across_devices and world size = 1

* Update src/transformers/training_args.py
2025-06-12 14:02:53 +02:00
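The setting in question, for context (output_dir is a placeholder): with a single process the flag is simply a no-op, which is why a warning was overkill.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",  # placeholder
    average_tokens_across_devices=True,  # no-op when world_size == 1
)
```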
89c46b648d Skip some export tests on torch 2.7 (#38677)
* skip

* fix

* better check

* Update import_utils.py

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-12 12:47:15 +02:00
27459025b8 [video processors] support frame sampling within processors (#38105)
* apply updates smolVLM (still needs workaround for chat template)

* add other models

* dump qwen omni for now, come back later

* port qwen omni from their impl

* wait, all qwens sample videos in same way!

* clean up

* make smolvlm backwards compatible and fix padding

* fix some tests

* fix smolvlm tests

* more clean up and test fixing

* delete unused arg

* fix

* address comments

* style

* fix test
2025-06-12 09:34:30 +00:00
887054c714 Fix masking utils (#38783)
* fix

* Update masking_utils.py

* Update masking_utils.py
2025-06-12 11:00:46 +02:00
7c58336949 [Hotfix] Fix style bot (#38779)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-12 10:20:36 +02:00
7c6b1707c3 [masking utils] check None instead of try/except (#38561)
* fix vllm's compile backend

* fix the test

* apply the same changes in other masking strategies
2025-06-12 06:50:28 +00:00
9487765f07 Add Qwen2 MoE model card (#38649)
* Add Qwen2 MoE model card

* Revisions to qwen2 moe model card

* Add Qwen2 MoE model card
2025-06-11 15:14:01 -07:00
32dbf4bddb Update altCLIP model card (#38306)
* Update altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

* Rename altclip.md to altclip.mdx

* Rename altclip.mdx to altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-11 14:48:34 -07:00
1dcb022e8f chore(pixtral): omit block attention mask when using flash attention (#38741)
* chore(pixtral): omit block attention mask when using flash attention

Since flash_attention_2 relies solely on position_ids, omitting the block attention mask avoids unnecessary memory usage and prevents OOM on large inputs.

* remove unnecessary attention_mask assignment
2025-06-11 18:55:23 +00:00
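A hedged sketch of the resulting control flow (the function and helper names are illustrative, not the PR's code): build the block mask only for attention backends that consume it.

```python
def build_pixtral_attention_mask(config, position_ids, make_block_mask):
    # flash_attention_2 recovers image-block boundaries from position_ids,
    # so materializing a dense block mask would only waste memory.
    if config._attn_implementation == "flash_attention_2":
        return None
    return make_block_mask(position_ids)  # eager/SDPA paths still need it
```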
60d4b35b20 Make style bot trigger CI after push (#38754)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-11 20:40:04 +02:00
bb44d2a0f6 Update pegasus model card (#38675)
* Update Pegasus model card

* Fix transformers-cli command

* Update code examples to use bfloat16

* Reverted code examples to use float16

* Fix typo, update checkpoints link

* Update str formatting in code examples

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix typo

* Remove inaccurate badges

* Revert badge removal

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Include cache_implementation argument in quantization example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-11 10:56:25 -07:00
b84ebb7f3c fix(qwen3_moe): pass kwargs to self_attn (#38691)
This is needed to avoid `.item()` calls in `_flash_attention_forward`.
2025-06-11 19:26:08 +02:00
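A hypothetical sketch of the pass-through (the class is illustrative): forwarding **kwargs lets precomputed flash-attention arguments such as cumulative sequence lengths reach the attention call without data-dependent `.item()` synchronizations.

```python
import torch
from torch import nn

class DecoderLayerSketch(nn.Module):
    """Illustrative decoder layer; only the kwargs forwarding matters here."""

    def __init__(self, self_attn: nn.Module):
        super().__init__()
        self.self_attn = self_attn

    def forward(self, hidden_states: torch.Tensor, attention_mask=None, **kwargs):
        # **kwargs (e.g. cu_seq_lens) were previously dropped here, forcing
        # _flash_attention_forward to recompute them via .item() calls.
        return self.self_attn(hidden_states, attention_mask=attention_mask, **kwargs)
```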
9f563ada70 Deprecate TF + JAX (#38758)
* Scatter deprecation warnings around

* Delete the tests

* Make logging work properly!
2025-06-11 17:28:06 +01:00
337757cbd5 Update repo consistency check (#38763) 2025-06-11 17:02:03 +01:00
e2bdc13375 Remove IPEX requirement for bitsandbytes on CPU (#38594)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-11 17:46:34 +02:00
063bef0865 Prepare for TF+Jax deprecation (#38760)
* Prepare for TF+Jax deprecation

* Remove .circleci jobs
2025-06-11 16:03:31 +01:00
11ad9be153 Better typing for num_items_in_batch (#38728)
* fix

* style

* type checking ?

* maybe this ?

* fix

* can't be an int anymore

* fix
2025-06-11 16:26:41 +02:00
84710a4291 Add V-JEPA 2 (#38746)
* adding model and conversion scripts

* add imports to test vjepa conversion

* fix imports and make conversion work

* fix computation for short side

* replace attention with library attention function

* cleanup more attention classes

* remove config overrides

* add test cases, fix some of the failing ones

* fix the model outputs

* fix outputs of the model per review

* fix too big model test case

* fix styling __init__.py

* fix initialization test

* remove all asserts per review

* update sorting unsorting logic as per feedback

* remove is_video per review

* remove another is_video segment

* remove unwanted stuff

* small fixes

* add docstrings for the model

* revert adding vjepa2 config here

* update styling

* add config docstrings (wip)

* fix dpr issue

* removed test failing issues

* update styles

* merge predictor configs into main config

* remove processing code, add video processor

* remove permute which is not necessary now

* fix styles

* updated vjepa2 to be in video_processing_auto

* update comment for preprocessing

* test integration test and fix the outputs

* update test values, change test to look at repeated frames for a given image

* add a simple video processing test

* refactoring pixel_values_videos and upload ckpts to original

* fix torch_fx test cases

* remove unused config

* add all config docstrings

* add more integration tests

* add basic doc

* revert unwanted styling changes

* working make fixup

* Fix model_type in config

* update attention implementation to fit new hf standards

* fix the preprocessing logic, ensure it matches the original model

* remove use_rope logic, cleanup

* fix docstrings

* Further cleanup, update doc

* Fix model prefix

* fix get_vision_features

* VJEPA2Embeddings style refactor

* nit, style comment

* change modules default values

* Only `str` activation in config

* GradientCheckpointingLayer

* fixup

* fix conversion script

* Remove return_dict

* remove None return typehint

* Refactor VJEPA2Layer, remove use_SiLU

* Fix fx tests

* dpr -> drop_path_rates

* move *ModelOutput on top

* format docs bit

* update docs

* update docs

* update doc example

* remove prune_heads from model

* remove unused config params

* refactor embed signature

* Add vjepa to docs

* Fix config docstring

* update defaults

* Update docs/source/en/model_doc/vjepa2.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/vjepa2.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Fix import

* Min refactoring

* Update HUB_SOURCE and HUB_REPO in conversion script

* Add missing headers

* VJEPA -> V-JEPA in docs

* Add image to doc

* fix style

* fix init weights

* change checkpoint name in modeling tests

---------

Co-authored-by: Koustuv Sinha <koustuv.sinha@mail.mcgill.ca>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Koustuv Sinha <koustuvsinha@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2025-06-11 15:00:08 +01:00
a6f0e2b64a Add z-loss to Bamba for v2 (#37842)
* Remove const

* Fix arg ref

* Sharded save

* Add z_loss flag

* Add modeling zloss

* Demodularize clm forward for zloss

* Also demodularize init for z_loss flag

* PR comments (mostly modularizing right)

* Demodularize forward

* Better name zloss and explain typematch

* Fully propagate coeff name

* style fixes

* zloss default float

* Remove conflicting annotations

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-06-11 15:29:17 +02:00
6b610d89f1 Revert "Trigger doc-builder job after style bot" (#38735)
Revert "Trigger doc-builder job after style bot (#38398)"

This reverts commit 51e0fac29fc3994d49dfbfd1c8d085d29360d393.
2025-06-11 14:56:39 +02:00
0bf53e69e2 [DeepSeek-V3] implement when q_lora_rank is None (#38743)
* implement when q_lora_rank is None

* make style and quality
2025-06-11 13:35:10 +01:00
b426c2b313 fix: bf16 with TPU is allowed in configuration (#38670)
* fix: tpu bf16

* fix: style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-11 12:35:01 +00:00
c8c1e525ed from 1.11.0, torchao.prototype.low_bit_optim is promoted to torchao.optim (#38689)
* since 1.11.0, torchao.prototype.low_bit_optim is promoted to
torchao.optim

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix review comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-11 12:16:25 +00:00
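A version-tolerant import sketch implied by the commit (the optimizer name is one example, torchao is assumed installed, and the release number in the title appears to mean torchao 0.11.0):

```python
try:
    # newer torchao: optimizers promoted out of prototype
    from torchao.optim import AdamW8bit
except ImportError:
    # older torchao releases
    from torchao.prototype.low_bit_optim import AdamW8bit

optimizer_cls = AdamW8bit
```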
56a7cf5546 fix: Add method to get image features in PaliGemmaForConditionalGeneration (#38730)
* fix: Add method to retrieve image features in PaliGemmaForConditionalGeneration

* feat: Add get_image_features method to multiple models for image feature extraction

* fix: reformat the files with ruff.

* feat: Add methods for packing and retrieving image and video features across multiple models

modified:
- modeling_chameleon.py
- modeling_llava_next.py
- modular_llava_next_video.py
- modeling_qwen2_vl.py

and generate the:
- modeling_llava_next_video.py
- modeling_llava_onevision.py
- modeling_qwen2_5_vl.py

* feat: Implement get_image_features method in Aria, Mistral3, and VipLlava models with updated parameters

* fix: reformatted the code with fix-style
2025-06-11 10:26:31 +00:00
380e6ea406 [llava] fix integration tests with Siglip (#38732)
fix llava siglip test
2025-06-11 08:09:16 +00:00
f1849eab22 Fixed a multiple-devices issue in SmolVLM model (#38736)
Fixed a multiple-devices issue in SmolVLMModel (#38557)

* Fixed a multiple-devices issue in SmolVLMModel

* Changed the modular to reflect changes
2025-06-11 10:08:01 +02:00
aa798b7ac9 New canine model card (#38631)
* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Commit for new_gpt_model_card.

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* commit for new canine model card.

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* implemented suggestion by @stevhliu.

* Update canine.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-10 09:30:05 -07:00
e28fb26e7d Add AGENTS.md (#38734)
* More name sync

* repeatedly underlining "WRITE LESS, ROBOT"

* fewer, commas, please

* Clarify "copied from"

* Clarify "copied from"

* Mention test dependencies

* Added a line on preferring `modular` style
2025-06-10 16:27:37 +00:00
cb4c56ce0d Fix typo in Language Modeling example scripts and update TPU type (#38652)
* Fix typo that prevented the examples from running correctly

* return DistributedType.TPU in the accelerator.distributed_type comparison
2025-06-10 13:43:35 +00:00
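The corrected comparison, sketched from the commit message (the enum lives in accelerate; note that newer accelerate releases rename the TPU member to XLA, so the member name here is version-dependent):

```python
from accelerate import Accelerator
from accelerate.utils import DistributedType

accelerator = Accelerator()
if accelerator.distributed_type == DistributedType.TPU:
    print("running on TPU")  # TPU-specific branch the examples rely on
```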
8ff22e9d3b [add-new-model-like] Robust search & proper outer '),' in tokenizer mapping (#38703)
* [add-new-model-like] Robust search & proper outer '),' in tokenizer mapping

* code-style: arrange the importation in add_new_model_like.py

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-10 12:25:12 +00:00
8340e8746e Use OSError (#38712)
Signed-off-by: cyy <cyyever@outlook.com>
2025-06-10 12:13:49 +00:00
8257734b5f Fix llava tests (#38722)
* update

* fix 1

* fix 2

* fix 3

* fix 4

* fix 5

* fix 6

* fix 7

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-10 13:53:17 +02:00
71f7385942 Logging message for `is_bitsandbytes_available()` (#38528)
* bnb import log

* bnb import log

* log message change

* moved error issue into quantizer_bnb_4_bit.py

* ruff

* arg added for bnb check

* required changes

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-10 10:15:01 +00:00
04cdf83244 Update some tests for torch 2.7.1 (#38701)
* fix 1

* fix 2

* fix 3

* fix 4

* fp16

* break

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-10 11:46:52 +02:00
afdb821318 Fix smart resize (#38706)
* Fix smart_resize bug

* Add smart_resize test

* Remove unnecessary error checking

* Fix smart_resize tests

---------

Co-authored-by: Richard Dong <rdong@rdong.c.groq-143208.internal>
2025-06-10 08:59:22 +00:00
81799d8b55 Standardize ByT5 model card format (#38699)
* Standardize ByT5 model card format

* Apply review feedback from @stevhliu

* Fix Notes formatting and wording

* Fix `aya_vision` test (#38674)

* fix 1: load_in_4bit=True,

* fix 2: decorateor

* fixfix 2: breakpoint

* fixfix 3: update

* fixfix 4: fast

* fixfix 5: cond

* fixfix 5: cond

* fixfix 6: cuda 8

* ruff

* breakpoint

* dtype

* a10

* a10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix autodoc formatting for ByT5Tokenizer

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-09 15:02:50 -07:00
e55983e2b9 Fix aya_vision test (#38674)
* fix 1: load_in_4bit=True,

* fix 2: decorateor

* fixfix 2: breakpoint

* fixfix 3: update

* fixfix 4: fast

* fixfix 5: cond

* fixfix 5: cond

* fixfix 6: cuda 8

* ruff

* breakpoint

* dtype

* a10

* a10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-09 22:18:52 +02:00
b61c47f5a5 Created model card for xlm-roberta-xl (#38597)
* Created model card for xlm-roberta-xl

* Update XLM-RoBERTa-XL model card with improved descriptions and usage examples

* Minor option labeling fix

* Added MaskedLM version of XLM RoBERTa XL to model card

* Added quantization example for XLM RoBERTa XL model card

* minor fixes to xlm roberta xl model card

* Minor fixes to mask format in xlm roberta xl model card
2025-06-09 13:00:38 -07:00
e594e75f1b Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout (#38596)
* Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout

* Added CLI command example and quantization example for XLM RoBERTa model card.

* Minor change to transformers CLI and quantization example for XLM roberta model card
2025-06-09 12:26:31 -07:00
29ca043856 Created model card for XLM model (#38595)
* Created model card for XLM model

* Revised model card structure and content of XLM model

* Update XLM model documentation with improved examples and code snippets for predicting <mask> tokens using Pipeline and AutoModel.
2025-06-09 12:26:23 -07:00
25f711aa89 Drop as_target_processor from the _call_ and pad methods (#38642)
Drop as_target_processor from _call_ and pad methods; reformat docstrings for readability
2025-06-09 12:26:09 -07:00
837ddac1ec Docs: update bitsandbytes torch.compile compatibility (#38651) 2025-06-09 14:51:57 -04:00
b9faf2f930 Fix TypeError: 'NoneType' object is not iterable for esm (#38667) (#38668)
Add post_init() calls to EsmForMaskedLM, EsmForTokenClassification and EsmForSequenceClassification.
2025-06-09 15:23:20 +00:00
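A hedged sketch of the fix pattern (the imports reach into internal modules and the class name is illustrative): head models should end __init__ with self.post_init() so weight initialization and tying actually run; skipping it can leave attributes unset, producing the TypeError in the title.

```python
from transformers.models.esm.modeling_esm import (
    EsmLMHead,
    EsmModel,
    EsmPreTrainedModel,
)

class EsmForMaskedLMSketch(EsmPreTrainedModel):
    """Illustrative only; mirrors the pattern the fix applies."""

    def __init__(self, config):
        super().__init__(config)
        self.esm = EsmModel(config, add_pooling_layer=False)
        self.lm_head = EsmLMHead(config)
        self.post_init()  # the call the commit adds
```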
11dca07a10 Fix retrieve function signature and remove faiss requirement (#38624)
Signed-off-by: Fiona Waters <fiwaters6@gmail.com>
2025-06-09 15:17:33 +00:00
b31d462c61 Fix some models import (#38694)
Fix models import
2025-06-09 16:09:24 +01:00
282d6684dc Fix attention mask expansion when converting to executorch (#38637) 2025-06-09 15:00:55 +00:00
19224c3642 fix: "check out" as verb (#38678)
"check out" as verb
2025-06-09 14:07:31 +00:00
237ff80387 Fixed modeling_auto.py MODEL_FOR_MASK_GENERATION_MAPPING_NAMES variable (#38664)
fix: grouped the two MODEL_FOR_MASK_GENERATION_MAPPING_NAMES variables
2025-06-09 13:40:46 +00:00
d7b87b415a Fix qwen2-audio chat template audio placeholder insertion (#38640)
* fix qwen2-audio template

Signed-off-by: Isotr0py <2037008807@qq.com>

* add message['type'] back

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-09 09:56:42 +00:00
10627c1a0f Use torch 2.7.1 on daily CI (#38620)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-08 14:37:45 +02:00
ebeec13609 Fix InternVL integration test (#38612)
* fix

* fix

* fix OOM

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-07 08:30:47 +02:00
3fb7e7bc01 Skip torchscript tests for 2 models (#38643)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 20:17:37 +02:00
dc76eff12b remove ipex_optimize_model usage (#38632)
* remove ipex_optimize_model usage

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* update Dockerfile

Signed-off-by: root <root@a4bf01945cfe.jf.intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: root <root@a4bf01945cfe.jf.intel.com>
Co-authored-by: root <root@a4bf01945cfe.jf.intel.com>
2025-06-06 20:04:44 +02:00
5009252a05 Better CI (#38552)
better CI

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 17:59:14 +02:00
2e889c18e1 fix torch_dtype on awq (#38463)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-06 17:14:00 +02:00
871901cb3d fix total batch size calculation in trainer (#38286)
* fix total batch size calculation

* update

Signed-off-by: inkcherry <mingzhi.liu@intel.com>

* Update src/transformers/trainer.py

---------

Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-06 14:54:00 +00:00
02f946a038 Don't run AriaForConditionalGenerationModelTest on CircleCI (#38615)
get rid of this model

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 11:30:31 +02:00
3d15606e64 fix: support grad clipping for TP through replicating non-sharded modules (#36132)
* feat: fix tp grad norm:

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: use implicit replication

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-06 11:07:22 +02:00
fca6748246 Improve test_initialization for SwiftFormer (#38636)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 10:47:10 +02:00
92a87134ea update ColQwen2ModelIntegrationTest (#38583)
* update

* update

* update

* update

* 4 bit

* 8 bit

* final

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 10:41:17 +02:00
dbfc79c17c [generation] bring back tests on vision models (#38603)
* bring back generation tests on VLMs

* remove head mask tests overwritten
2025-06-06 08:23:15 +00:00
90c4b90a10 Use torch 2.7.1 on CircleCI jobs (#37856)
2.7.1

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 10:16:57 +02:00
3e35ea1782 Improve test_initialization (#38607)
* fix flaky init tests

* fix flaky init tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 10:08:05 +02:00
89542fb81c enable more test cases on xpu (#38572)
* enable glm4 integration cases on XPU, set xpu expectation for blip2

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* more

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* refine wording

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* refine test case names

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* run

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* add gemma2 and chameleon

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix review comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-06 09:29:51 +02:00
31023b6909 Fix MiniMax (docs and integration tests checkpoint) (#38575)
* update checkpoints for integration tests

* minor fixes in docs
2025-06-06 08:43:11 +02:00
593e29c5e2 Updated Aria model card (#38472)
* Update aria.md

* Update aria.md

* Suggested Updates - aria.md
2025-06-05 14:36:54 -07:00
77cf4936fe [Nit] Add Note on SigOpt being in Public Archive Mode (#38610)
* add note on sigopt

* update

* Update docs/source/en/hpo_train.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-05 14:07:23 -07:00
c75bf2c36e Fix typo in LLaVa documentation (#38618)
* Fix typo in LLaVa documentation

In exactly one section, LlavaImageProcessor was spelt wrongly as LLavaImageProcessor, which throws off copy-pasting the section.

* Fix LlavaImageProcessor url to make it valid (and copypaste-able)

Earlier, the URL contained the entire HF prefix. This commit removes that to ensure that the code block can be copied and run as is.
2025-06-05 13:25:07 -07:00
5399c1d670 docs: fix dark mode logo display. (#38586) 2025-06-05 13:06:59 -07:00
481b953170 Fix return_dict=False giving errors in a few VLM models (#38519)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-05 21:19:07 +02:00
88912b8e95 Remove isort from dependencies (#38616)
Removed isort as a dependency
2025-06-05 16:42:49 +00:00
fa921ad854 fix spelling errors (#38608)
* fix errors test_modeling_mllama.py

* fix error test_modeling_video_llava.py

* fix errors test_processing_common.py
2025-06-05 13:57:23 +01:00
0f833528c9 Avoid overwriting existing local implementation when loading remote custom model (#38474)
* avoid overwriting existing local implementation when loading custom remote model

Signed-off-by: Isotr0py <2037008807@qq.com>

* update comments

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-05 13:54:40 +01:00
8f630651b0 Allow mlm_probability to be set to None when mlm=False in DataCollatorForLanguageModeling (#38522) (#38537)
* mlm_probability in DataCollatorForLanguageModeling should be validated only when mlm is True (#38522)

* Change mlm_probability to Optional in DataCollatorForLanguageModeling (#38537)

---------

Co-authored-by: eak <eak@ivalua.com>
2025-06-05 13:54:12 +01:00
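A minimal usage sketch of the relaxed validation: with mlm=False the collator prepares causal-LM labels, so mlm_probability=None is now accepted instead of raising.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False, mlm_probability=None)
batch = collator([tokenizer("hello world"), tokenizer("hi")])
print(batch["labels"].shape)
```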
65f5fa71cd Bump torch from 2.6.0 to 2.7.1 in /examples/flax/vision (#38606)
Bumps [torch](https://github.com/pytorch/pytorch) from 2.6.0 to 2.7.1.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v2.6.0...v2.7.1)

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-05 13:38:02 +01:00
8c59cdb3f8 pin pandas (#38605)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-05 11:33:06 +02:00
8cfcfe58c0 Remove custom pytest and pluggy (#38589)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-05 10:23:40 +02:00
0d69fa6dcd [qwen-omni] fix sliding window (#38525)
fix
2025-06-05 10:11:58 +02:00
1fed6166c0 added fast image processor for ZoeDepth and expanded tests accordingly (#38515)
* added fast image processor for ZoeDepth and expanded tests accordingly

* added fast image processor for ZoeDepth and expanded tests accordingly, hopefully fixed repo consistency issue too now

* final edits for zoedepth fast image processor

* final minor edit for zoedepth fast image processor
2025-06-04 22:59:17 +00:00
a510be20f3 Updated deprecated typing imports with equivalents for Python 3.9+ (#38546)
* Replace deprecated typing imports with collections.abc equivalents for Python 3.9+

* Fixed code quality

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-06-04 16:57:23 +00:00
8e1266de2b New gpt neo model card (#38505)
* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Commit for new_gpt_model_card.

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-04 09:56:47 -07:00
8046aff520 tests/roformer: fix couple roformer tests on gpus (#38570)
Fix "RuntimeError: Expected all tensors to be on the same device,
but found at least two devices, cuda:0 and cpu" error running the
following roformer tests on GPUs (CUDA or XPU):

```
tests/models/roformer/test_modeling_roformer.py::RoFormerSinusoidalPositionalEmbeddingTest::test_basic
tests/models/roformer/test_modeling_roformer.py::RoFormerSelfAttentionRotaryPositionEmbeddingTest::test_apply_rotary_position_embeddings
```

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-06-04 18:45:56 +02:00
b9c17c5dc0 [Dinov2] Enable device_map="auto" support (#38487)
* Fix: resolve import order and duplicate import (ruff I001, F811)

* Format: clean up Dinov2 test file with ruff formatter

* Add _no_split_modules = ['Dinov2Layer'] to enable device_map='auto'

* Revert dinov2_with_registers _no_split_modules to original state

* Remove redundant device_map test as suggested

* Remove unused import after deleting test

* removed the torch import and the redundant test function

* Update tests/models/dinov2/test_modeling_dinov2.py

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-04 15:42:40 +00:00
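The mechanism this change leans on, sketched below (the surrounding class is illustrative; only the attribute matters): modules named in _no_split_modules are never sharded across devices by accelerate's device_map="auto" planner.

```python
from transformers import PretrainedConfig, PreTrainedModel

class BackboneSketchPreTrainedModel(PreTrainedModel):
    """Illustrative base class; only the attribute below is the point."""

    config_class = PretrainedConfig
    base_model_prefix = "backbone"
    # keep each transformer block whole on a single device
    _no_split_modules = ["Dinov2Layer"]
```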
ae3733f06e feat: add repository field to benchmarks table (#38582)
* feat: add `repository` field to benchmarks table

* fix: remove unwanted `,`
2025-06-04 15:40:52 +02:00
1285aec4cc Docs: fix code formatting in torchao docs (#38504) 2025-06-04 12:35:21 +00:00
6c5d4b1dd2 allow custom head_dim for qwen2_moe (#37188)
allow custom head_dim

Co-authored-by: ryan.agile <ryan.agile@kakaobrain.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-06-04 12:27:30 +00:00
82fa68ca14 fix(attention_visualizer): add default value for image_seq_length (#38577) 2025-06-04 12:20:31 +00:00
1dc619e59f [FlexAttn] Fix models with unique characteristics (#38433)
* fix

* style

* check

* check 2

* add deepseek workaround
2025-06-04 13:37:28 +02:00
ff3fad61e3 Fix deepseekv3 (#38562)
* fix 1

* fix 2

* fix 3

* fix 4

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-04 11:40:14 +02:00
6085cded38 update utils/notification_service.py for AMD vs Nvidia (#38563)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-04 11:38:25 +02:00
3c995c1fdc Fix chameleon tests (#38565)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-04 10:13:35 +02:00
55736eea99 Add support for MiniMax's MiniMax-Text-01 (#35831)
* end-to-end architecture

* lightning-attn: refactor, clean, optimize

* put minimax_text_01 in other files

* use latest __init__ standards and auto-generate modular

* support attention_mask for lightning-attn

* Revert "use latest __init__ standards and auto-generate modular"

This reverts commit d8d3c409d89e335c98a8cd36f47304a76eac7493.

* fix modular conversion

* pass both attention masks instead of tuple

* formatting

* Updated Dynamic Cache

* created MiniMaxText01Cache

* fix hardcoded slope_rate

* update attn_type_list in config

* fix lightning when use_cache=False

* copy tests from mixtral

* (checkpoint) all tests pass for normal attention

* fix all unittests

* fix import sorting

* fix consistency and formatting tests

* fix config

* update tests, since changes in main

* fix seq_len error

* create dummy docs

* fix checkpoint

* add checkpoint in config docstring

* run modular_conversion

* update docs

* fix checkpoint path and update tests

* fix ruff

* remove repeated expected_slice

* update docs

* rename "minimax-text-01" to "minimax"

* inherit config from mixtral

* remove from docs in other languages

* undo files that should be untouched

* move minimax to end in conversation docs

* use MiniMaxForCausalLM as it is

* ruff fixes

* run modular

* fix docstring example in causallm

* refactor attention loop and decay factors

* refactor config in modular

* run modular

* refactor cache

* rename static_cache to linear_cache

* make positional embeddings necessary

* remove unnecessary layernorms declarations

* fix import in tests

* refactor attention in next tokens

* remove outdated code

* formatting and modular

* update tests

* rename layernorm alpha/beta factors

* register decay factors as buffers

* remove unused declarations of decay factors

* update config for alpha/beta factors

* run modular

* remove head_dim in tests

* remove minimax from fx.py

* remove stuff that is not really needed

* update __init__

* update qkv torch.split

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix qkv torch.split

* quality fixes

* remove mistakenly added dummy

* purge unused ModelTester code

* fix-copies

* run fix-copies

* fix head_dim

* write cache formatting tests

* remove postnorm

* avoid contiguous in attention current states

* update expected_slice

* add generation test for integration

* fix dtype in generation test

* update authors

* update with changes in main

* update gradient checkpointing and minor fixes

* fix mutable attn_type_list

* rename: attn_type -> layer_type

* update for layer_types

* update integration tests

* update checkpoint

* clean overview in docs

---------

Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-04 09:38:40 +02:00
037acf1d10 [janus] Fix failing tests on mi3XX (#38426)
* Fix multiple devices error on Janus

* Fix AttributeError on Janus BOI token

* Initialize lm first in Janus to get correct device map

* Added expectations for Janus test_model_generate_images

* Fixed JanusVisionEncoderLayer being split across devices

* Code formatting

* Adding modeling file

* Reverted changes out of scope for this PR
2025-06-04 09:38:10 +02:00
78d771c3c2 [docs] Format fix (#38414)
fix table
2025-06-03 09:53:23 -07:00
0f41c41a46 Fix hqq issue (#38551)
* bc

* style
2025-06-03 17:58:31 +02:00
279000bb70 Name change AOPermod -> ModuleFqn (#38456)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-06-03 15:43:31 +00:00
e8b292e35f Fix utils/notification_service.py (#38556)
* fix

* fix

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-03 13:59:31 +00:00
8cb96787a6 Explicitly setting encoding in tokenization_utils_base.py (#38553)
Update tokenization_utils_base.py

Add encoding explicitly
2025-06-03 12:08:35 +00:00
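
A small illustrative sketch (hypothetical function, assuming the usual motivation for such a change): passing `encoding` explicitly to `open()` avoids depending on the platform's default locale encoding:

```python
import json

# Hypothetical loader; the point is the explicit encoding argument.
def load_vocab(vocab_file: str) -> dict:
    # Without encoding="utf-8", Windows may default to cp1252 and
    # mis-decode vocabulary files.
    with open(vocab_file, encoding="utf-8") as f:
        return json.load(f)
```
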
caf708da1b [TP] Change command in tests to python3 (#38555)
* Fix: change to `python3`

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-03 11:03:33 +00:00
fdf86fb440 [bugfix] [WIP] fix apply_rotary_emb error on Ascend NPU (#38491)
[bugfix] fix apply_rotary_emb error on Ascend NPU
2025-06-03 09:31:49 +00:00
ca0a682796 Update docker image to use av (#38548)
* Update

* Update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-03 11:04:41 +02:00
814432423c update emu3 test (#38543)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-06-03 11:02:01 +02:00
55ec319de6 Don't use default attn if pre-set in sub-config (#38526)
* don't use default attn if pre-set in sub-config

* style

* add a test maybe
2025-06-03 07:53:07 +00:00
bf68dd9e6e [tests] expand flex-attn test for vision models (#38434)
* expand the test for VLMs

* typo

* mark models `supports_flex` + expand test for additional kwargs

* flex attn for refactored vision models

* fix copies

* fix

* unskip

* style

* address comments
2025-06-03 07:40:44 +00:00
de4cf5a38e Fix blip2 tests (#38510)
* fix 1: not sure

* fix 2: _supports_flex_attn = False

* fix 3: embedding_output = self.layernorm(query_embeds.to(self.layernorm.weight.dtype))

* fix 4: query_embeds = query_embeds.to(self.layernorm.weight.dtype)

* fix 5: text_embeds = text_embeds.to(dtype=torch.float16)

* fix 5: question_embeds.to(dtype=torch.float16)

* fix 6: text_embeds = text_embeds.to(dtype=self.itm_head.weight.dtype)

* fix 7: image_embeds and question_embeds

* fix 8: fix other 2 fp16 tests

* fix 9: fix T5 OOM

* fix 10: fix T5 OOM

* fix 11: fix T5

* fix 11: fix T5 beam

* fix 12: _supports_sdpa=False

* fix 12: style and expect

* revert

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-02 22:46:35 +02:00
ccc859620a Fix Gemma2IntegrationTest (#38492)
* fix

* fix

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* update

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-02 22:45:09 +02:00
1094dd34f7 Remove type annotation in Siglip Attention Module (#38503)
* Remove type annotation

* remove print statement
2025-06-02 17:51:07 +02:00
afb35a10ed Num parameters in model.safetensors.index.json (#38531)
Num parameters in index.json
2025-06-02 17:16:31 +02:00
cceab972ba [flax/mistral] support sliding_window: null in config (#37402)
flax/mistral: Allow sliding_window to be set to none
2025-06-02 16:45:02 +02:00
1a25fd2f6d Fix amp deprecation issue (#38100)
apex amp is deprecated
2025-06-02 16:15:41 +02:00
05ad826002 remove unhandled parameter (#38145) 2025-06-02 15:57:32 +02:00
c72ba69441 Add ColQwen2 to 🤗 transformers (#35778)
* feat: add colqwen2 (wip)

* tests: fix test_attention_outputs

* tests: reduce hidden size to accelerate tests

* tests: fix `test_attention_outputs` 🥳

* fix: fix wrong parent class for `ColQwen2ForRetrievalOutput`

* fix: minor typing and style changes

* chore: run `make style`

* feat: remove redundant `max_num_visual_tokens` attribute in `ColQwen2Processor`

* tests: tweak comments

* style: apply ruff formatter

* feat: move default values for `visual_prompt_prefix` and `query_prefix`

* docs: update ColQwen2 model card

* docs: tweak model cards

* docs: add required example config checkpoint

* tests: update expected scores in integration test

* docs: tweak quickstart snippets

* fix: address PR comments

* tests: fix colqwen2 tests + tweak comment in colpali test

* tests: unskip useful tests

* fix: fix bug when `visual_prompt_prefix` or `query_prefix` is an empty string

* fix: fix ColPali outputs when `return_dict == False`

* fix: fix issue with PaliGemma output not being a dict

* docs: set default dtype to bfloat16 in quickstart snippets

* fix: fix error when `return_dict=False` in ColPali and ColQwen2

* tests: fix special tokens not being replaced in input_ids

* style: fix lint

* fix: `ColQwen2Processor`'s `padding_side` is now set from `processor_config.json`

* fix: remove unused `padding_side` in ColQwen2 model

* docs: update ColQwen2's model doc

* fix: fix hardcoded vlm backbone class in ColQwen2Config

* fix: remove `padding_side` from ColQwen2Processor as it should be fed from kwargs

* docs: fix typo in model docstring

* docs: add illuin mention in model docs

* fix: let `padding_side` be handled by `tokenizer_config.json`

* docs: add colpali reference url in colqwen2's model doc

* docs: add Hf mention in model docs

* docs: add late interaction mention in model docs

* docs: tweak colqwen2 model doc

* docs: update reference checkpoint for ColPali to v1.3

* docs: simplify quickstart snippets

* docs: remove redundant `.eval()`

* refactor: use `can_return_tuple` decorator for ColPali and ColQwen2

* docs: fix copyright date

* docs: add missing copyright in tests

* fix: raise error when `initializer_range` is not in config

* docs: remove redundant `.eval()` in colpali doc

* fix: fix `get_text_config` now that Qwen2VL has a proper `text_config` attribute

See https://github.com/huggingface/transformers/pull/37268 for details about changes in Qwen2VL's config.

* fix: add missing `initializer_range` attribute in `ColQwen2Config`

* fix: use `get_text_config` in `resize_token_embeddings`

* update colqwen2 with auto_docstring

* docs: fix wrong copyright year

* chore: remove `raise` as `initializer_range` has a default value in `ColQwen2Config`

* refactor: merge `inner_forward` into `forward`

* Refactor colqwen2 after refactoring of qwen2VL, use modular for modeling code

* protect torch import in modular to protect in processing

* protect torch import in modular to protect in processing

* tests: fix hf model path in ColQwen2 integration test

* docs: clarify `attn_implementation` and add comments

* docs: add fallback snippet for using offline PIL dummy images

* docs: temporarily revert attn_implementation to `None` while sdpa is not fixed

* docs: tweaks in colpali/colqwen2 quick start snippets

* fix: add missing flags to enable SDPA/Flex Attention in ColQwen2 model

* fix: add missing changes in modular file

* fix modeling tests

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-06-02 12:58:01 +00:00
beaed8ce01 [generate] move SinkCache to a custom_generate repo (#38399)
remove sink cache
2025-06-02 12:13:30 +02:00
fe5bfaa4b5 [generate] add soft deprecations on custom generation methods (#38406)
soft deprecations
2025-06-02 12:11:46 +02:00
a75b9ffb5c Update Loss Functions to Accept Tensor num_items_in_batch (#38029)
* Update Loss Functions to Accept Tensor num_items_in_batch

* Fix device mismatch by moving num_items_in_batch to loss device in fixed_cross_entropy

* fix the ruff check

* delete the unused if statement

* fix the type problem
2025-06-02 11:31:44 +02:00
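
A minimal sketch of the device fix described in the bullets above; the exact signature of `fixed_cross_entropy` is assumed here:

```python
import torch
import torch.nn.functional as F

def fixed_cross_entropy(logits, labels, num_items_in_batch=None, ignore_index=-100):
    # Sum per-token losses, then normalize by the global token count.
    loss = F.cross_entropy(logits, labels, ignore_index=ignore_index, reduction="sum")
    if num_items_in_batch is not None:
        # The fix: the count may now be a tensor living on another device,
        # so move it to the loss device before dividing.
        if torch.is_tensor(num_items_in_batch):
            num_items_in_batch = num_items_in_batch.to(loss.device)
        loss = loss / num_items_in_batch
    return loss
```
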
493cf1554b [seamless_m4t] Skip some tests when speech is not available (#38430)
* Added the require_speech decorator

* Added require_speech to some seamless_m4t tests

* Changed skip message
2025-06-02 09:17:28 +00:00
64d14ef28d Fix setting FLASH_ATTENTION_DETERMINISTIC after importing (#37185)
transformers.enable_full_determinism enables deterministic
flash attention using `FLASH_ATTENTION_DETERMINISTIC`
800510c67b/src/transformers/trainer_utils.py (L79)

However, the current check uses a global variable `deterministic_g`,
which reads the environment variable as soon as the module is imported.
This causes issues because users may call
`transformers.enable_full_determinism` after
`transformers.modeling_flash_attention_utils` has already been imported.
This behavior was introduced in
https://github.com/huggingface/transformers/pull/33932/files#r1806668579
to fix a graph break.

As a result, this PR fixes the issue by delaying the environment variable
check until the first time `_flash_attention_forward` is executed, so the
bug is fixed without reintroducing a graph break.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-06-02 11:08:20 +02:00
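
A minimal sketch of the deferred check, assuming a hypothetical helper name; `deterministic_g` and `FLASH_ATTENTION_DETERMINISTIC` come from the commit text:

```python
import os

_deterministic_g = None  # no longer resolved at import time

def flash_attn_is_deterministic() -> bool:
    # Read the env var on first use instead of at import, so calling
    # enable_full_determinism() after the import still takes effect,
    # and the cached value avoids a graph break on later calls.
    global _deterministic_g
    if _deterministic_g is None:
        _deterministic_g = os.environ.get("FLASH_ATTENTION_DETERMINISTIC", "0") == "1"
    return _deterministic_g
```
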
fde1120b6c Remove deprecated use_flash_attention_2 parameter (#37131)
Signed-off-by: cyy <cyyever@outlook.com>
2025-06-02 11:06:25 +02:00
51d732709e [docs] add xpu environment variable for gpu selection (#38194)
* squash commits

* rename gpu

* rename accelerator

* change _toctree.yml

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: sdp <sdp@a4bf01943ff7.jf.intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-05-30 16:05:07 +00:00
c7f2b79dd8 protect dtensor import (#38496)
protect
2025-05-30 17:36:00 +02:00
051a8acc9a Align TP check (#38328)
align tp check
2025-05-30 17:15:39 +02:00
e0545ef0b8 [Tests] Reduced model size for albert-test model (#38480)
* Reduced model size for albert-test model

* Run checks

* Removed test_save_load

* Removed test skipping functions
2025-05-30 14:22:32 +00:00
f962c862ff Bump torch from 2.2.0 to 2.6.0 in /examples/flax/vision (#37618)
Bumps [torch](https://github.com/pytorch/pytorch) from 2.2.0 to 2.6.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v2.2.0...v2.6.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.6.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-30 14:04:52 +01:00
98568d1e25 Fix incorrect bbox_embed initialization when decoder_bbox_embed_share=False in GroundingDINO (#38238)
* A shallow copy in groundingdino
Fixes #37333

* Remove an empty line in the GroundingDinoForObjectDetection class

* Translate comments in the GroundingDinoForObjectDetection class from French to English
2025-05-30 15:02:18 +02:00
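
An illustrative sketch of the bug class this fixes (names are placeholders, not the actual GroundingDINO code): building a per-layer head list from a single module object shares its parameters unless each entry is copied:

```python
import copy
import torch.nn as nn

bbox_embed = nn.Linear(256, 4)  # stand-in for the bbox regression head
num_decoder_layers = 6

# Buggy when heads must be independent: every entry is the same module.
shared = nn.ModuleList([bbox_embed for _ in range(num_decoder_layers)])

# What decoder_bbox_embed_share=False requires: one copy per layer.
independent = nn.ModuleList(
    [copy.deepcopy(bbox_embed) for _ in range(num_decoder_layers)]
)
```
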
d0fccbf7ef Fix convert_internvl_weights_to_hf.py to support local paths (#38264)
fix(internvl): add local path support to convert_internvl_weights_to_hf.py
2025-05-30 14:56:32 +02:00
858ce6879a make it go brrrr (#38409)
* make it go brrrr

* date time

* update

* fix

* up

* uppp

* up

* no number i

* update

* fix

* [paligemma] fix processor with suffix (#38365)

fix pg processor

* [video utils] group and reorder by number of frames (#38374)

fix

* Fix convert to original state dict for VLMs (#38385)

* fix convert to original state dict

* fix

* lint

* Update modeling_utils.py

* update

* warn

* no verbose

* final

* ouft

* style

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
2025-05-30 11:19:42 +02:00
ab5067e7fd fix: handle no scheduler passed by user (#38407) 2025-05-30 11:00:44 +02:00
42ef218b58 [Qwen2.5-Omni] Fix dtype of cos,sin when used with flash attention (#38453)
* Fix dtype of cos,sin when used with flash attention

* Fix dtype of cos,sin when used with flash attention
2025-05-29 18:24:40 +00:00
81cff7ad34 Fix Gemma3IntegrationTest (#38471)
* check

* check

* check

* check

* check

* check

* check

* test style bot

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-29 16:51:12 +02:00
e508965df7 Cleanup BatchFeature and BatchEncoding (#38459)
* Use dict comprehension to create dict

* Fix type annotation

Union[Any] doesn't really make any sense

* Remove methods that are already implemented in the `UserDict` parent
class
2025-05-29 14:13:43 +00:00
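
A tiny illustration of the cleanup rationale (hypothetical stand-in class): a `UserDict` subclass already inherits the dict protocol, so redefining those methods is redundant:

```python
from collections import UserDict

class BatchFeatureLike(UserDict):  # hypothetical stand-in for BatchFeature
    pass

bf = BatchFeatureLike({"input_ids": [101, 102]})
assert list(bf.keys()) == ["input_ids"]  # inherited from UserDict
assert "input_ids" in bf                 # __contains__ inherited too
```
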
8e5cefcb1e Fix TypeError in save_pretrained error handling (fixes #38422) (#38449) 2025-05-29 13:58:16 +00:00
ad9dd3d17b 🔴 [VLM] modeling updates (#38317)
* updates

* fixup

* fix tests

* fix test

* fix

* let it be here for now, till monday

* two more fixes

* persimmon

* fixup

* fix

* fixup

* make sure fuyu runs now that LM has new attn API

* fixup + tests

* qwen vl uses new mask interface as well

* qwen image features format

* update

* remove image_sizes

* address comments

* i am dumb...
2025-05-29 11:08:23 +00:00
a6f7acb603 [Tests] Clean up test cases for few models (#38315)
* Update tests

* revert aria change

* too slow hence revert
2025-05-29 08:21:28 +00:00
8010f3cf61 feat: add cache retention for requests (#38446)
* feat: add cache retention for requests

* fix: propagate `manual_eviction` param & refactor `finish_request`

`finish_request` now only takes `request_id: str` as an input rather
than the full `RequestState`, which was not needed and simplifies
calling from `ContinuousBatchingManager::evict_request_from_cache`

* refactor: pop req from `active_requests`

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-28 18:15:10 +00:00
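
A hedged sketch of the refactor described above; class and attribute names follow the commit text, but the bodies are illustrative:

```python
class ContinuousBatchingManager:
    def __init__(self):
        self.active_requests = {}  # request_id -> request state

    def evict_request_from_cache(self, request_id: str):
        # Callers now pass only the id, not the full RequestState.
        self.finish_request(request_id)

    def finish_request(self, request_id: str):
        # Pop the request from active_requests; in the real code this is
        # also where its cache blocks would be released.
        self.active_requests.pop(request_id, None)
```
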
66da700145 Fix GLM4 checkpoints (#38412)
* fix

* fix

* fix

* fix

* fix

* fix

* test style bot

* Apply style fixes

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-28 16:40:08 +00:00
2872e8bac5 Merge type hints from microsoft/python-type-stubs (post dropping support for Python 3.8) (#38335)
* Merge type hints from microsoft/python-type-stubs (post Python 3.8)

* Remove mention of pylance

* Resolved conflict

* Merge type hints from microsoft/python-type-stubs (post Python 3.8)

* Remove mention of pylance

* Resolved conflict

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Avasam <samuel.06@hotmail.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-05-28 16:21:40 +00:00
942c60956f Model card for mobilenet v1 and v2 (#37948)
* doc: #36979

* doc: update hfoptions

* add model checkpoints links

* add model checkpoints links

* update example output

* update style #36979

* add pipeline tags

* improve comments

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggested changes

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-28 09:20:19 -07:00
9a8510572b Updated the model card for ViTMAE (#38302)
* Update vit_mae.md

* badge float:right

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update model_doc/vit_mae.md

* fix

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-28 09:19:43 -07:00
c9fcbd5bf9 Updated the Model docs - for the ALIGN model (#38072)
* Updated the Model docs - for the ALIGN model

* Update docs/source/en/model_doc/align.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/align.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated align.md

* Update docs/source/en/model_doc/align.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/align.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update align.md

* fix

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-28 09:19:09 -07:00
cba94e9272 Fix handling of slow/fast image processors in image_processing_auto.py (#38161)
Fix wrong error when torchvision is not installed
2025-05-28 16:00:23 +00:00
21b10d9aa4 Fix from_args_and_dict ProcessorMixin (#38296)
* fix-from-args-and-dict-processormixin

* change used_kwargs to valid_kwargs

* remove manual valid_kwargs

* fix copies

* fix modular aria
2025-05-28 11:46:33 -04:00
f844733568 Fix MoE gradient test (#38438) 2025-05-28 16:44:20 +01:00
0ed6f7e6b4 Remove redundant test_sdpa_equivalence test (#38436)
* Remove redundant test

* make fixup
2025-05-28 17:22:25 +02:00
51e0fac29f Trigger doc-builder job after style bot (#38398)
* update

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-28 17:15:34 +02:00
c24d18bbae Fix convert weights for InternVL (#38233)
Fix internvl convert weights
2025-05-28 11:14:56 -04:00
8850427242 Fix typo in tokenization_utils_base.py docstring (#38418)
Fix typo in tokenization_utils_base.py
2025-05-28 14:52:10 +00:00
bab40c6838 [core] support tensor-valued _extra_state values in from_pretrained (#38155)
Support tensor-valued _extra_state values

TransformerEngine uses the pytorch get/set_extra_state API to store FP8
layer config information as bytes Tensor in the _extra_state entry in
the state dict. With recent changes to from_pretrained, this
functionality has broken and loading a model that uses this API doesn't
appear to work. This PR fixes the save/load pretrained functions for
extra state entries that use a pytorch tensor, and adds a (currently
x-failing) test for a dictionary extra state.

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
2025-05-28 15:38:42 +02:00
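
A minimal sketch of the PyTorch extra-state hooks the commit message refers to (the layer below is a hypothetical TransformerEngine-style example, not its actual code):

```python
import torch
import torch.nn as nn

class FP8LinearLike(nn.Linear):
    def get_extra_state(self):
        # Serialize layer config as a bytes tensor; it is saved in the
        # state dict under an `_extra_state` key.
        payload = bytearray(b'{"fp8_recipe": "hybrid"}')  # illustrative
        return torch.frombuffer(payload, dtype=torch.uint8)

    def set_extra_state(self, state):
        # Restore the config from the tensor on load.
        self.fp8_config = bytes(state.tolist()).decode("utf-8")
```
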
badc71b9f6 🔴[Attention] Attention refactor for Whisper-based models (#38235)
* start refactoring whisper

* revert for now

* first step

* carry over attn fixes

* check if this works

* whisper has an off by one somewhere - cutting mask in any interface

* make it based on interface

* remove some tests that were skipped but now work

* some fixes for whisper tests

* interface changes

* change the order of fix

* some attention adjustments for eager + TP

* fix scaling

* mask changes

* why does whisper contain those extra seq lens?

* fix from config for fa2 as input_ids is invalid

* fix another test

* another fix

* disable flex attn due to compile issues

* copies and refactor for qwen audio since it somewhat relies on whisper

* fix scaling and smaller things

* retrigger

* new new interface version + more fixups

* adjust qwen

* add comment

* forgot this one

* change copies as whisper cuts on the mask

* add guard

* add flex attention

* switch to new mask function + add skips for torchscript

* remove old api with cache position

* last changes?

* trigger ci
2025-05-28 13:32:38 +02:00
565a0052ed make Llama4TextMoe forward more readable (#37529)
* update forward of Llama4TextMoe

* remove redundant transpose

* fix formatting

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-28 11:54:45 +02:00
defeb04299 Fix CircleCI not triggered when PR is opened from a branch of huggingface/transformers (#38413)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-28 11:25:43 +02:00
593276fe1e Update error when using additional and/or masks (#38429)
update error
2025-05-28 11:08:49 +02:00
3aab6e95cb Disable mi210 scheduled CI (#38411) 2025-05-28 10:35:41 +02:00
fb82a98717 enable large_gpu and torchao cases on XPU (#38355)
* cohere2 done

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* enable torchao cases on XPU

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* rename

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix comments

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-28 10:30:16 +02:00
cea254c909 Update CsmForConditionalGenerationIntegrationTest (#38424)
* require_read_token

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-28 10:20:43 +02:00
baddbdd24b [qwen-vl] Look for vocab size in text config (#38372)
fix qwen
2025-05-28 09:32:26 +02:00
a974e3b4e1 Fix an error in verify_tp_plan for keys without '.' (#38420) 2025-05-28 09:30:43 +02:00
b1eae943a2 Change slack channel for mi250 CI (#38410) 2025-05-28 09:20:34 +02:00
5f49e180a6 Add mi300 to amd daily ci workflows definition (#38415) 2025-05-28 09:17:41 +02:00
3b3ebcec40 Updated model card for OLMo2 (#38394)
* Updated OLMo2 model card

* added command line

* Add suggestions

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Added suggestions

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Indented code block as per suggestions

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 16:24:36 -07:00
f5307272f5 Falcon-H1 - Fix auto_docstring and add can_return_tuple decorator (#38260)
Fix auto_docstring and add can_return_tuple
2025-05-27 16:18:05 -04:00
a092f6babf Update granite.md (#37791)
* Update granite.md

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update granite.md

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* minor fixes

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 12:55:15 -07:00
be7aa3210b New bart model card (#37858)
* Modified BART documentation wrt issue #36979.

* Modified BART documentation wrt issue #36979.

* fixed a typo.

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* blank commit.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 11:51:41 -07:00
587c1b0ed1 Updated BERTweet model card. (#37981)
* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 11:51:22 -07:00
b73faef52f Updated BigBird Model card as per #36979. (#37959)
* Updated BigBird Model card as per #36979.

* Update docs/source/en/model_doc/big_bird.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/big_bird.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/big_bird.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/big_bird.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 11:24:28 -07:00
538e847c06 Updated Zoedepth model card (#37898)
* Edited zoedepth model card according to specifications.

* Edited Zoedepth model file

* made suggested changes.
2025-05-27 10:06:53 -07:00
4f7b0ff8d1 Update Model Card for Mamba-2 (#37951)
* update model page.

* update model page.

* Update docs/source/en/model_doc/mamba2.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* update the model page.

* update.

* Apply suggestions from code review

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Apply the suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add an quantization example and update the toctree.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* remove the additional comma

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 10:06:39 -07:00
9c50576860 [mllama] Allow pixel_values with inputs_embeds (#38334)
* Allow pixel_values and inputs_embeds at the same time

* remove unnecessary overwritten tests
2025-05-27 16:33:56 +00:00
0f5a8243c4 [tests] remove overload for deleted test (test_offloaded_cache_implementation) (#37896)
* remove overload for deleted tests

* make fixup
2025-05-27 16:45:15 +01:00
f85fd90407 [cleanup] delete deprecated kwargs in qwen2_audio 🧹 (#38404)
delete deprecated
2025-05-27 16:08:53 +01:00
b9f8f863d9 [CSM] update model id (#38211)
* update model id

* codec_model eval

* add processor img

* use ungated repo for processor tests
2025-05-27 17:03:55 +02:00
07dd6b2495 Add report_repo_id to mi300 workflow (#38401) 2025-05-27 16:35:07 +02:00
3142bd8592 [CSM] infer codec model with no_grad + audio eos label (#38215)
* infer codec model with no_grad

* codec_model eval

* training labels: add audio eos token
2025-05-27 14:10:17 +00:00
10ae443ec0 Fix Qwen2.5-VL Video Processor (#38366)
* Update processing_qwen2_5_vl.py

* Update processing_qwen2_5_vl.py

* Update modular_qwen2_5_vl.py

* Fix CI

* Update modular_qwen2_5_vl.py

* Update processing_qwen2_5_vl.py

* Update video_processing_utils.py
2025-05-27 13:46:37 +02:00
80902ae9b1 [chat] use the checkpoint's generation_config.json as base parameterization (#38330)
* use model gen config

* unwanted diff
2025-05-27 10:35:33 +00:00
008e0d87c5 Fix convert to original state dict for VLMs (#38385)
* fix convert to original state dict

* fix

* lint

* Update modeling_utils.py
2025-05-27 10:27:59 +00:00
c769483188 [chat] improvements for thinking models and reduce default verbosity (#38322)
misc improvements
2025-05-27 10:20:58 +00:00
55f2333366 guard size mismatch check so it only applies to quantized models (#38397)
fix
2025-05-27 11:45:03 +02:00
1a5be2f5c0 [aya vision] fix processor for vLLM (#38371)
accidentally merged two PRs in one (;-_-)
2025-05-27 09:43:53 +00:00
19fdb75cf0 [video utils] group and reorder by number of frames (#38374)
fix
2025-05-27 11:32:33 +02:00
b0735dc0c1 [paligemma] fix processor with suffix (#38365)
fix pg processor
2025-05-27 11:31:56 +02:00
9e1017b479 [transformers x vLLM] standardize processors (#37915)
* standardize

* fix tests

* batch update some processors, not final yet

* oke, now I tested that everything indeed runs. Still needs prettification

* emu3

* fixup

* gemma3 but it doesn't generate anything

* fuyu

* update

* why?

* Update src/transformers/models/aya_vision/processing_aya_vision.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address comments

* bc

* why do we need to guard this import every time?

* i hate guarded imports

* i am blind

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-27 11:30:30 +02:00
b5ececb900 Fix image token mask in Gemma3 (#38295)
fix mask
2025-05-27 11:15:52 +02:00
c4e71e8fff Add AMD MI300 CI caller leveraging self-hosted runner scale set workflow in hf-workflows (#38132) 2025-05-26 23:13:02 +02:00
706b00928f Stop autoconverting custom code checkpoints (#37751)
* Stop autoconverting custom code checkpoints

* make fixup

* Better auto class detection

* Match the kwarg ordering
2025-05-26 19:15:28 +01:00
07848a8405 update gemma tests (#38384)
* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-26 19:54:04 +02:00
cd0f3ce73b [cli] cli usable without torch (#38386)
cli without torch
2025-05-26 16:54:18 +00:00
ba6d72226d 🚨 🚨 Fix custom code saving (#37716)
* Firstly: Better detection of when we're a custom class

* Trigger tests

* Let's break everything

* make fixup

* fix mistaken line doubling

* Let's try to get rid of it from config classes at least

* Let's try to get rid of it from config classes at least

* Fixup image processor

* no more circular import

* Let's go back to setting `_auto_class` again

* Let's go back to setting `_auto_class` again

* stash commit

* Revert the irrelevant changes until we figure out AutoConfig

* Change tests since we're breaking expectations

* make fixup

* do the same for all custom classes

* Cleanup for feature extractor tests

* Cleanup tokenization tests too

* typo

* Fix tokenizer tests

* make fixup

* fix image processor test

* make fixup

* Remove warning from register_for_auto_class

* Stop adding model info to auto map entirely

* Remove todo

* Remove the other todo

* Let's start slapping _auto_class on models why not

* Let's start slapping _auto_class on models why not

* Make sure the tests know what's up

* Make sure the tests know what's up

* Completely remove add_model_info_to_*

* Start adding _auto_class to models

* Start adding _auto_class to models

* Add a flaky decorator

* Add a flaky decorator and import

* stash commit

* More message cleanup

* make fixup

* fix indent

* Fix trust_remote_code prompts

* make fixup

* correct indentation

* Reincorporate changes into dynamic_module_utils

* Update call to trust_remote_code

* make fixup

* Fix video processors too

* Fix video processors too

* Remove is_flaky additions

* make fixup
2025-05-26 17:37:30 +01:00
701caef704 Stop TF weight rename reDOS (#38325)
* let's try a non-regex solution

* make fixup

* Slight adjustment

* Let's just use the original code with a check

* slight tweak to conditional

* slight tweak to conditional
2025-05-26 16:58:51 +01:00
0a4e8e2855 fix typo: tokenizer -> tokenize (#38357) 2025-05-26 15:29:16 +00:00
63964b7c67 fix typos (#38336)
* Update video_processor.md

* Update deepseek_v3.md
2025-05-26 14:42:37 +00:00
8b03c8eaf2 Better check in initialize_weights (#38382)
* Update modeling_utils.py

* CIs

* CIs
2025-05-26 16:20:23 +02:00
eb74cf977b Use one utils/notification_service.py (#38379)
* step 1

* step 2

* step 3

* step 4

* step 5

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-26 16:15:29 +02:00
98328fd9a1 for now disable compile (#38383) 2025-05-26 15:57:11 +02:00
78079abeff Improved cache docs (#38060)
* improved cache docs

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-26 13:53:41 +00:00
7a9b071bfd [Falcon H1] Fix slow path forward pass (#38320)
* Create push-important-models.yml

* feat: add falcon-h1

* fixup

* address comment

* fix

* fix copies

* fix copies

* fix

* fix

* fix

* fix

* fix copies

* fix

* fix copies

* fix test import to at least trigger the CIs

* yups

* update

* fix make fix copies

* fix inits?

* fix style

* skip annoying test

* add integration test for Falcon H1

* fix copies

* fix

* fix typo

* make style

* fix slow path generations

* clean debug traces

* debug

* remove debug traces final confirmation

* clean debug traces final

* fix format and lineup

* make style

* debug

* Update src/transformers/models/falcon_h1/modular_falcon_h1.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* address comments

* fix fix-copies

* fix integration test

* Merge pull request #7 from ydshieh/fix-slow-path

update

* another update (#8)

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: younesbelkada <younes.belkada@tii.ae>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-26 15:30:35 +02:00
b5b76b5561 Protect get_default_device for torch<2.3 (#38376)
* Update modeling_utils.py

* CIs
2025-05-26 15:00:09 +02:00
bff32678cc Fix incorrect batching audio index calculation for Phi-4-Multimodal (#38103)
* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

* add tests

Signed-off-by: Isotr0py <2037008807@qq.com>

* code format

Signed-off-by: Isotr0py <2037008807@qq.com>

* Update src/transformers/models/phi4_multimodal/feature_extraction_phi4_multimodal.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-26 12:41:31 +00:00
9f0402bc4d Fix all import errors based on older torch versions (#38370)
* Update masking_utils.py

* fix

* fix

* fix

* Update masking_utils.py

* Update executorch.py

* fix
2025-05-26 12:11:54 +02:00
d03a3ca692 [OPT] Fix attention scaling (#38290)
* fix opt attention scaling

* add comment to why we do this
2025-05-26 11:02:16 +02:00
a5a0c7b888 switch to device agnostic device calling for test cases (#38247)
* use device agnostic APIs in test cases

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* add one more

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* xpu now supports integer device id, aligning to CUDA behaviors

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* update to use device_properties

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* update comment

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix comments

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-26 10:18:53 +02:00
cba279f46c [VLMs] add helpers for get/set embedding (#38144)
* add helpers in VLMs

* fix tied weight key test
2025-05-26 09:50:32 +02:00
6e3063422c Uninstall kernels for AMD docker images (#38354)
Uninstall kernels for AMD docker images

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-25 19:42:25 +02:00
4a03044ddb Hot fix for AMD CI workflow (#38349)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-25 11:15:31 +02:00
d0c9c66d1c new failure CI reports for all jobs (#38298)
* new failures

* report_repo_id

* report_repo_id

* report_repo_id

* More fixes

* More fixes

* More fixes

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-24 19:15:02 +02:00
31f8a0fe8a [docs]: update roformer.md model card (#37946)
* Update roformer model card

* fix example purpose description

* fix model description according to the comments

* revert changes for autodoc

* remove unneeded tags

* fix review issues

* fix hfoption

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-23 16:27:56 -07:00
36f97ae15b docs(swinv2): Update SwinV2 model card to new standard format (#37942)
* docs(swinv2): Update SwinV2 model card to new standard format

* docs(swinv2): Apply review suggestions

Incorporates feedback from @stevhliu to:
- Enhance the introductory paragraph with more details about scaling and SimMIM.
- Generalize the tip from "image classification tasks" to "vision tasks".

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-23 13:04:13 -07:00
33d23c39ed Update BioGPT model card (#38214)
* Update BioGPT model card

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* correction for CPU fallback

* added quantization code and method

* fixed transformers-cli call

---------

Co-authored-by: Aguedo <aguedo@fakeemail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-23 13:03:47 -07:00
dffb118013 Remove duplicate docstring: resample (#38305)
Duplicate of the line above.
2025-05-23 13:02:58 -07:00
e0aad278fe Never fallback to eager implicitly (#38327)
* remove arg everywhere

* Update warnings

* add more models

* Update sdpa_attention.py

* fix style

* fix

* readd warnings but not for flex

* Update test_modeling_common.py

* skip

* fix

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-23 19:48:01 +02:00
e64ed0304c Use Gradient Checkpointing Layer in Jamba & Blip Related Models (#38310)
* Use gradient checkpointing class in blip classes

* Use gradient checkpointing class in jamba/bamba
2025-05-23 19:35:25 +02:00
53fb245eb6 🚨 🚨 Inherited CausalLM Tests (#37590)
* stash commit

* Experiment 1: Try just Gemma

* Experiment 1: Just try Gemma

* make fixup

* Trigger tests

* stash commit

* Try adding Gemma3 as well

* make fixup

* Correct attrib names

* Correct pipeline model mapping

* Add in all_model_classes for Gemma1 again

* Move the pipeline model mapping around again

* make fixup

* Revert Gemma3 changes since it's a VLM

* Let's try Falcon

* Correct attributes

* Correct attributes

* Let's try just overriding get_config() for now

* Do Nemotron too

* And Llama!

* Do llama/persimmon

* Correctly skip tests

* Fix Persimmon

* Include Phimoe

* Fix Gemma2

* Set model_tester_class correctly

* Add GLM

* More models!

* models models models

* make fixup

* Add Qwen3 + Qwen3MoE

* Correct import

* make fixup

* Add the QuestionAnswering classes

* Add the QuestionAnswering classes

* Move pipeline mapping to the right place

* Jetmoe too

* Stop RoPE testing models with no RoPE

* Fix up JetMOE a bit

* Fix up JetMOE a bit

* Can we just force pad_token_id all the time?

* make fixup

* fix starcoder2

* Move pipeline mapping

* Fix RoPE skipping

* Fix RecurrentGemma tests

* Fix Falcon tests

* Add MoE attributes

* Fix values for RoPE testing

* Make sure we set bos_token_id and eos_token_id in an appropriate range

* make fixup

* Fix GLM4

* Add mamba attributes

* Revert bits of JetMOE

* Re-add the JetMOE skips

* Update tests/causal_lm_tester.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add licence

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-23 18:29:31 +01:00
d5f992f5e6 Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag (#36835)
* Get parallel loader working. Include tests.

* Update the tests for parallel loading

* Rename env variables.

* Add docs for parallel model weight loading.

* Touch up parallel model loading docs.

* Touch up parallel model loading docs again.

* Edit comment in test_modeling_utils_parallel_loading.py

* Make sure HF_PARALLEL_LOADING_WORKERS is spelled correctly in modeling_utils.py

* Correct times for parallelized loading, previous times were for a "hot" filesystem

* Update parallel model loading so the spawn method is encapsulated. DRY up the code by leveraging get_submodule.

* Update docs on model loading parallelism so that details on setting the multiprocessing start method are removed, now that the package handles this step internally.

* Fix style on model loading parallelism changes.

* Merge latest version of master's modeling_utils.

* Removed unused variable.

* Fix argument packing for the parallel loader.

* Fix state dict being undefined in the parallel model loader.

* Rename variables used in parallel model loading for clarity. Use get_module_from_name().

* Switch to the use of threads for parallel model loading.

* Update docs for parallel loading.

* Remove the use of json.loads when evaluating HF_ENABLE_PARALLEL_LOADING. Prefer simple casting.

* Move parallelized shard loading into its own function.

* Remove use of is_true(). Favor checking env var true values for HF_ENABLE_PARALLEL_LOADING.

* Update copyright to 2025 in readme for paralell model loading.

* Remove garbage collection line in load_shard_file, implicit garbage collection already occurs.

* Run formatter on modeling_utils.py

* Apply style fixes

* Delete tests/utils/test_modeling_utils_parallel_loading.py

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-05-23 16:39:47 +00:00
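
A hedged sketch of the opt-in behavior described above, using the env var names from the commit text; the default worker count and the `load_shard_file` callback are assumptions:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def load_all_shards(shard_files, load_shard_file):
    # Threads, not processes, per the final design in this PR.
    if os.environ.get("HF_ENABLE_PARALLEL_LOADING", "false").lower() in ("1", "true", "yes"):
        workers = int(os.environ.get("HF_PARALLEL_LOADING_WORKERS", "8"))  # assumed default
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(load_shard_file, shard_files))
    return [load_shard_file(f) for f in shard_files]
```
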
1ed19360b1 [FlexAttention] Reenable flex for encoder-decoder and make the test more robust (#38321)
* reenable most flex attention test cases

* style

* trigger

* trigger
2025-05-23 18:16:43 +02:00
bb567d85a4 refactor can_save_slow_tokenizer (#37722)
* refactor to remove the property can_save_slow_tokenizer; the check can be done within the `if` of save_vocab

* move property to fast

* revert if

* check if vocab_file is attr

* fix check for sp

* fix if condition

* fix if condition

* fix if condition
2025-05-23 17:29:38 +02:00
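
A small sketch of the check as the bullets describe it (hypothetical helper; the real change inlines this inside the save path):

```python
import os

def can_save_slow_tokenizer(tokenizer) -> bool:
    # Savable only if the fast tokenizer carries a vocab_file attribute
    # that points at an existing file.
    vocab_file = getattr(tokenizer, "vocab_file", None)
    return vocab_file is not None and os.path.isfile(vocab_file)
```
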
3c289e2104 [performance_optim] reduce frequency of declaring attention_mask in Ascend NPU flash attention (#38278)
[performance_optim] reduce frequency of declaring attention_mask in ASCEND NPU flash attention
2025-05-23 17:24:51 +02:00
f5d45d89c4 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288)
* Protect ParallelInterface

* early error out on output attention setting for no warning in modeling

* modular update

* fixup

* update model tests

* update

* oups

* set model's config

* more cases

* ??

* properly fix

* fixup

* update

* last ones

* update

* fix?

* fix wrong merge commit

* fix hub test

* nits

* wow I am tired

* updates

* fix pipeline!

---------

Co-authored-by: Lysandre <hi@lysand.re>
2025-05-23 17:17:38 +02:00
896833c183 Fix some tests (especially compile with fullgraph=True on Python<3.11) (#38319)
* fix tests

* better fix for python<3.11

* fixes

* style
2025-05-23 17:11:40 +02:00
a63bc17416 add vasqu to self-comment-ci.yml (#38324)
add vasqu

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-23 17:09:44 +02:00
54cd86708d [custom_generate] don't forward custom_generate and trust_remote_code (#38304)
* prevent infinite loops

* docs

* more links to custom generation methods
2025-05-23 14:49:39 +00:00
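
A toy sketch of the recursion guard (all names hypothetical): the kwargs that selected the custom method are consumed rather than forwarded, so a custom method that delegates back to `generate()` cannot loop forever:

```python
def default_generate(model, prompt, **kwargs):
    return prompt + " <generated>"  # placeholder

def load_custom_generate(repo_id, trust_remote_code):
    # Hypothetical loader for a hub-hosted generation method.
    def custom(model, prompt, **kwargs):
        # Delegating back is safe: the selector kwargs were stripped.
        return default_generate(model, prompt, **kwargs).upper()
    return custom

def generate(model, prompt, custom_generate=None, trust_remote_code=None, **kwargs):
    if custom_generate is not None:
        method = load_custom_generate(custom_generate, trust_remote_code)
        # custom_generate / trust_remote_code are not forwarded here.
        return method(model, prompt, **kwargs)
    return default_generate(model, prompt, **kwargs)
```
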
135163e9c5 Expose AutoModelForTimeSeriesPrediction for import (#38307)
* expose AutoModelForTimeSeriesPrediction for import

* add in docs
2025-05-23 13:09:29 +00:00
a6b51e7341 [Whisper + beam search] fix usage of beam_indices (#38259)
* tmp

* fix test_tiny_token_timestamp_batch_generation

* better comments

* test

* comments

* Apply suggestions from code review

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-05-23 10:05:44 +00:00
3e960e032d [tf/flax] handle forced_decoder_ids deletion (#38316)
fix tf/flax, attr checks
2025-05-23 09:44:58 +00:00
9eb0a37c9e Adds use_repr to model_addition_debugger_context (#37984)
* Adds use_repr to model_addition_debugger_context

* Updating docs for use_repr option
2025-05-23 09:35:13 +00:00
38f9c5b15b Fix typo: change 'env' to 'environment' in .circleci/config.yml (#38273)
* Fix typo: change 'env' to 'environment' in .circleci/config.yml

* Remove CIRCLE_TOKEN environment variable from artifact retrieval step

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-05-23 10:45:27 +02:00
11b670a282 Fix run_slow (#38314)
Signed-off-by: cyy <cyyever@outlook.com>
2025-05-23 10:18:30 +02:00
b01984a51d [emu3] fix conversion script (#38297)
* fix conversion script and update weights

* fixup

* remove commented line
2025-05-23 09:49:56 +02:00
2b585419b4 [Tests] Cleanup Janus Testcase (#38311)
* Cleanup janus testcase

* shift code to setup
2025-05-23 09:29:16 +02:00
b59386dc0a Oups typo for HybridChunkedCache (#38303)
typo
2025-05-22 17:52:37 +02:00
211f2b0875 Add CB (#38085)
* stash for now

* initial commit

* small updated

* up

* up

* works!

* nits and fixes

* don't loop too much

* finish working example

* update

* fix the small freeblocks issue

* feat: stream inputs to continuous batch

* fix: update attn from `eager` to `sdpa`

* refactor: fmt

* refactor: cleanup unnecessary code

* feat: add `update` fn to `PagedAttentionCache`

* feat: broken optimal block size computation

* fix: debugging invalid cache logic

* fix: attention mask

* refactor: use custom prompts for example

* feat: add streaming output

* fix: prefill split

refactor: add doc strings and remove unsound/redundant logic
fix: compute optimal blocks logic

* fix: send decoded tokens when `prefilling_split` -> `decoding`

* refactor: move logic to appropriate parent class

* fix: remove truncation as we split prefilling anyways

refactor: early return when we have enough selected requests

* feat: add paged attention forward

* push graph

* add paged sdpa

* update

* better mps defaults

* feat: add progress bar for `generate_batch`

* feat: add opentelemetry metrics (ttft + batch fill %age)

* feat: add tracing

* Add cuda graphs (#38059)

* draft cudagraphs addition

* nits

* styling

* update

* fix

* kinda draft of what it should look like

* fixes

* lol

* not sure why inf everywhere

* can generate but output is shit

* some fixes

* we should have a single device synch

* broken outputs but it does run

* refactor

* updates

* updates with some fixes

* fix mask causality

* another commit that casts after

* add error

* simplify example

* update

* updates

* revert llama changes

* fix merge conflicts

* fix: tracing and metrics

* my updates

* update script default values

* fix block allocation issue

* fix prefill split attention mask

* no bugs

* add paged eager

* fix

* update

* style

* feat: add pytorch traces

* fix

* fix

* refactor: remove pytorch profiler data

* style

* nits

* cleanup

* draft test file

* fix

* fix

* fix paged and graphs

* small renamings

* cleanups and push

* refactor: move tracing and metrics logic to utils

* refactor: trace more blocks of code

* nits

* nits

* update

* to profile or not to profile

* refactor: create new output object

* causal by default

* cleanup but generations are still off for IDK what reason

* simplifications but not running still

* this does work.

* small quality of life updates

* nits

* update

* fix the scheduler

* fix warning

* ol

* fully fixed

* nits

* different generation parameters

* nice

* just style

* feat: add cache memory usage

* feat: add kv cache free memory

* feat: add active/waiting count & req latency

* do the sampling

* fix: synchronize CUDA only if available and improve error handling in ContinuousBatchingManager

* fix on mps

* feat: add dashboard & histogram buckets

* perf: improve waiting reqs data structures

* attempt to compile, but we should only do it on mps AFAIK

* feat: decouple scheduling logic

* just a draft

* cleanup and fixup

* optional

* style

* update

* update

* remove the draft documentation

* fix import as well

* update

* fix the test

* style doomed

---------

Co-authored-by: Luc Georges <luc.sydney.georges@gmail.com>
2025-05-22 17:43:48 +02:00
73286d8e29 Fix HybridChunkedCache & Llama4 (#38299)
* Update cache_utils.py

* Update cache_utils.py
2025-05-22 17:25:51 +02:00
d95c864a25 🔴🔴🔴 [Attention] Refactor Attention Interface for Bart-based Models (#38108)
* starting attn refactor for encoder decoder models via bart (eager + sdpa)

* flash attention works, remove unnecessary code

* flex attention support for bart!, gotta check if the renaming is not too aggressive

* some comments

* skip flex grad test for standalone as done with the other test

* revert flex attn rename (for now), sdpa simplify, and todos

* more todos

* refactor mask creation for reuse

* modular attempt at biogpt

* first batch of other models

* fix attn dropout

* fix autoformer copies

* hubert

* another batch of models

* copies/style + last round of bart models --> whisper next?

* remove unnecessary _reshape function and remove copy to whisper

* add skip for decoder-only models out of enc-dec (same as in bart)

* bring back licences

* remove comment, added to pr read instead

* mostly docs

* disable sew flex attn as its attn mask is unclear for now

* oops

* test fixes for enc-dec

* torch fx fixes + try at flex attn

* skip on mbart

* some more fixes

* musicgen skip / delete old attn class logic + sdpa compose compile skip

* disable flex attn for musicgen, not worth the effort

* more fixes and style

* flex attention test for dropout and encoder decoder that dont have main input names

* informer fixes

* the weirdest thing I've encountered yet...

* style

* remove empty tensor attempt, found core root in previous commits

* disable time series due to tests being very text centric on inputs

* add speech to text to be ignoring the other attns, also due to tests

* update docs

* remaining issues resolved ?

* update docs for current state --> nllb moe and pegasus x sdpa is questionable :D

* some models have not set the is_causal flag...

* change dtype in softmax to old behaviour + some modular fixes

* I hate it but it is what it is

* fixes from main for bart

* forgot this one

* some model fixes

* style

* current status

* marian works now

* fixing some copies

* some copy fixes + time series x informer

* last models possibly and fixes on style/copies

* some post merge fixes

* more fixes

* make attention interface callable and move warnings there

* style lol

* add comment to "unsupported"

* remove callable interface and change interface warnings + some copies

* fix

* ternary is ugly af, make it simpler

* how did that happen

* fix flex attn test

* failing the test

* no more fallback! fixing copies next

* style + attn fixed

* fixing copies and mask creation

* wrong copy

* fixup tests and disable flex attn for now

* fixup last tests?
2025-05-22 17:12:58 +02:00
9895819514 Update CI Docker base image for AMD tests (#38261)
use newer Pytorch base image for AMD CI tests
2025-05-22 16:38:40 +02:00
dfbee79ca3 refine transformers env output (#38274)
* refine `transformers env` output

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-22 15:22:18 +02:00
1234683309 More typing in src/transformers/training_args.py (#38106)
* Annotate `framework` in src/transformers/training_args.py

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typing

Signed-off-by: cyy <cyyever@outlook.com>

* Revert framework change

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-05-22 13:14:33 +02:00
03a4c024dc Fix tp error when torch distributed is already initialized (#38294)
fix tp error
2025-05-22 12:34:05 +02:00
dcaf47dde5 add liger-kernel to docker file (#38292)
add

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-22 11:58:17 +02:00
163138a911 🚨🚨[core] Completely rewrite the masking logic for all attentions (#37866)
* start

* start having a clean 4d mask primitive

* Update mask_utils.py

* Update mask_utils.py

* switch name

* Update masking_utils.py

* add a new AttentionMask tensor class

* fix import

* nits

* fixes

* use full and quadrants

* general sdpa mask for all caches

* style

* start some tests

* tests with sliding, chunked

* add styling

* test hybrid

* Update masking_utils.py

* small temp fixes

* Update modeling_gemma2.py

* compile compatible

* Update masking_utils.py

* improve

* start making it more general

* Update masking_utils.py

* generate

* make it work with flex style primitives!

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* improve

* Update cache_utils.py

* Update masking_utils.py

* simplify - starting to look good!

* Update masking_utils.py

* name

* Update masking_utils.py

* style

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* small fix for flex

* flex compile

* FA2

* Update masking_utils.py

* Escape for TGI/vLLM!

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* General case without cache

* rename

* full test on llama4

* small fix for FA2 guard with chunk

* Update modeling_gemma2.py

* post rebase cleanup

* FA2 supports static cache!

* Update modeling_flash_attention_utils.py

* Update flex_attention.py

* Update masking_utils.py

* Update masking_utils.py

* Update utils.py

* override for export

* Update executorch.py

* Update executorch.py

* Update executorch.py

* Update executorch.py

* Update masking_utils.py

* Update masking_utils.py

* output attentions

* style

* Update masking_utils.py

* Update executorch.py

* Add docstring

* Add license and put mask visualizer at the end

* Update test_modeling_common.py

* fix broken test

* Update test_modeling_gemma.py

* Update test_modeling_gemma2.py

* Use fullgraph=False with FA2

* Update utils.py

* change name

* Update masking_utils.py

* improve doc

* change name

* Update modeling_attn_mask_utils.py

* more explicit logic based on model's property

* pattern in config

* extend

* fixes

* make it better

* generalize to other test models

* fix

* Update masking_utils.py

* fix

* do not check mask equivalence if layer types are different

* executorch

* Update modeling_gemma2.py

* Update masking_utils.py

* use layer_idx instead

* adjust

* Update masking_utils.py

* test

* fix imports

* Update modeling_gemma2.py

* other test models

* Update modeling_llama4.py

* Update masking_utils.py

* improve

* simplify

* Update masking_utils.py

* typos

* typo

* fix

* Update masking_utils.py

* default DynamicCache

* remove default cache

* simplify

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* simplify

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* export

* Update executorch.py

* Update executorch.py

* Update flex_attention.py

* Update executorch.py

* upstream to modular gemma 1 & 2

* Update modular_mistral.py

* switch names

* use dict

* put it in the Layer directly

* update copy model source for mask functions

* apply so many modular (hopefully 1 shot)

* use explicit dicts to make style happy

* protect import

* check docstring

* better default in hybrid caches

* qwens

* Update modular_qwen2.py

* simplify core logic!

* Update executorch.py

* qwen3 moe

* Update masking_utils.py

* Update masking_utils.py

* simplify a lot sdpa causal skip

* Update masking_utils.py

* post-rebase

* gemma3 finally

* style

* check it before

* gemma3

* More general with newer torch

* align gemma3

* Update utils.py

* Update utils.py

* Update masking_utils.py

* Update test_modeling_common.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* test

* executorch

* Update test_modeling_common.py

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* Update executorch.py

* Update test_modeling_common.py

* fix copies

* device

* sdpa can be used without mask -> pass the torchscript tests in this case

* Use enum for check

* revert enum and add check instead

* remove broken test

* cohere2

* some doc & reorganize the Interface

* Update tensor_parallel.py

* Update tensor_parallel.py

* doc and dummy

* Update test_modeling_paligemma2.py

* Update modeling_falcon_h1.py

* Update masking_utils.py

* executorch patch

* style

* CIs

* use register in executorch

* final comments!

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-05-22 11:38:26 +02:00
f8630c778c [Whisper] handle deprecation of forced_decoder_ids (#38232)
* fix

* working saved forced_decoder_ids

* docstring

* add deprecation message

* exception message ordering

* circular import comment
2025-05-22 09:16:38 +00:00
aa02a5d902 [whisper] move processor test into processor test file 🧹 (#38266)
move processor tests
2025-05-22 10:07:11 +01:00
b26157d64c add XPU info print in print_env (#38282)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-22 11:03:56 +02:00
b369a65480 docs(swin): Update Swin model card to standard format (#37628)
* docs(swin): Update Swin model card to standard format

* docs(swin): Refine link to Microsoft organization for Swin models

Apply suggestion from @stevhliu in PR #37628.

This change updates the link pointing to the official Microsoft Swin Transformer checkpoints on the Hugging Face Hub.

The link now directs users specifically to the Microsoft organization page, filtered for Swin models, providing a clearer and more canonical reference compared to the previous general search link.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(swin): Clarify padding description and link to backbone docs

Apply suggestion from @stevhliu in PR #37628.

This change introduces two improvements to the Swin model card:

1.  Refines the wording describing how Swin handles input padding for better clarity.
2.  Adds an internal documentation link to the general "backbones" page when discussing Swin's capability as a backbone model.

These updates enhance readability and improve navigation within the Transformers documentation.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(swin): Change Swin paper link to huggingface.co/papers as suggested

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-21 16:16:43 -07:00
28d3148b07 Update Model Card for Mamba (#37863)
* update model card.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update quantization example.

* update example.

* update

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-21 10:58:23 -07:00
7b7bb8df97 Protect ParallelInterface (#38262)
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-05-21 17:45:38 +02:00
5c13cc0f94 Remove Japanese sequence_classification doc and update references (#38246) 2025-05-21 08:33:41 -07:00
71009e4b68 assign the correct torchao data layout for xpu (#37781)
* assign the correct data layout for xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check torch version before using torchao xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix the log

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix zero point type

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix check torch version

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-05-21 17:21:55 +02:00
d6c34cdcd0 Fix: missing else branch to handle "--load_best_model_at_end" in training_args.py (#38217)
Update training_args.py
2025-05-21 14:28:56 +00:00
ae3e4e2d97 Improve typing in TrainingArgument (#36944)
* Better error message in TrainingArgument typing checks

* Better typing

* Small fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-05-21 13:54:38 +00:00
174684a9b6 Simplify DTensor Check for modeling_utils.py (#38245)
Update modeling_utils.py
2025-05-21 13:35:44 +00:00
e4decee9c0 [whisper] small changes for faster tests (#38236) 2025-05-21 14:11:08 +01:00
ddf67d2d73 Clearer error on import failure (#38257)
Clearer error
2025-05-21 14:32:29 +02:00
9a962dd9ed Add tearDown method to Quark to solve OOM issues (#38234)
fix
2025-05-21 14:26:44 +02:00
101b3fa4ea fix multi-image case for llava-onevision (#38084)
* _get_padding_size module

* do not patchify images when processing multi image

* modify llava onevision image processor fast

* tensor to list of tensors

* backward compat

* reuse pad_to_square in llave & some clarification

* add to doc

* fix: consider no image cases (text only or video)

* add integration test

* style & repo_consistency
2025-05-21 11:50:46 +02:00
a21f11fca2 [compile] re-enable for Qwen-VL models (#38127)
* compile qwen models

* delete TODO comment

* fix embeds test

* fix assisted decoding

* add comments
2025-05-21 09:50:39 +00:00
4542086db7 [Falcon H1] Fix Typo in Integration Test (#38256)
* Create push-important-models.yml

* feat: add falcon-h1

* fixup

* address comment

* fix

* fix copies

* fix copies

* fix

* fix

* fix

* fix

* fix copies

* fix

* fix copies

* fix test import to at least trigger the CIs

* yups

* update

* fix make fix copies

* fix inits?

* fix style

* skip annoying test

* add integration test for Falcon H1

* fix copies

* fix

* fix typo

* make style

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: younesbelkada <younes.belkada@tii.ae>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-05-21 11:25:26 +02:00
6829936ee0 [MODEL] Add Falcon H1 (#38249)
* Create push-important-models.yml

* feat: add falcon-h1

* fixup

* address comment

* fix

* fix copies

* fix copies

* fix

* fix

* fix

* fix

* fix copies

* fix

* fix copies

* fix test import to at least trigger the CIs

* yups

* update

* fix make fix copies

* fix inits?

* fix style

* skip annoying test

* add integration test for Falcon H1

* fix copies

* fix

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
2025-05-21 10:43:11 +02:00
e288ee00d8 tp plan should not be NONE (#38255)
* accept custom device_mesh

* fix device_map

* assert that num_heads % tp_size == 0

* todo.

* ReplicateParallel

* handle tied weights

* handle dtensor in save_pretrained with safe_serialization

* tp test works

* doesn't work

* fix shard_and_distribute_module's rank should be local_rank

* tp=4 is correct

* dp+tp is broken

* todo allreduce with dtensors on another dim is annoying

* workaround to sync dp grads when using dtensors

* loading a checkpoint works

* wandb and compare losses with different tp/dp

* cleaning

* cleaning

* .

* .

* logs

* CP2 DP2 no mask works after commenting out attn_mask and is_causal from scaled_dot_product_attention

* DP=2 TP=2 now works even with tied embeddings

* model.parameters() and model.module.parameters() are empty..

* reformat sanity_check_tensor_sync

* set atol=1e-4 for CP to pass

* try populate _parameters from named_modules

* refactors
TP2 DP2 works
CP2 DP2 works

* is_causal=True and pack sequences, no attn mask, and preshuffle dataset

* fix packing

* CP=4 doesn't work

* fix labels and position_ids for CP

* DP CP works with transformers 🥳🥳🥳

* refactor

* add example cp

* fixup

* revert sdpa changes

* example cleared

* add CP, DP to the mesh init

* nit

* clean

* use `ALL_PARALLEL_STYLES`

* style

* FSDP works

* log on 1 rank

* .

* fix?

* FSDP1 also has .parameters() bug

* reported gradnorm when using FSDP1 is wrong, but loss is correct so it's okay

* .

* style and fixup

* move stuff around

* fix tests

* style

* let's make it a check

* add missing licences

* warning should be an info

* tp plan should not be NONE

* test all

* god damn it

* test all

---------

Co-authored-by: nouamanetazi <nouamane98@gmail.com>
2025-05-21 10:22:38 +02:00
711d78d104 Revert parallelism temporarily (#38240)
* Revert "Protect ParallelInterface"

This reverts commit cb513e35f9c096d60558bd43110837cbb66611ce.

* Revert "parallelism goes brrr (#37877)"

This reverts commit 1c2f36b480e02c9027d2523746d34e27b39e01a4.

* Empty commit
2025-05-20 22:43:04 +02:00
feec294dea CI reporting improvements (#38230)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-20 19:34:58 +02:00
cb513e35f9 Protect ParallelInterface 2025-05-20 18:27:50 +02:00
f4ef41c45e v4.53.0.dev0 2025-05-20 18:12:56 +02:00
f834d368f6 [gemma3] fix bidirectional attention mask (#38080)
* fix attn mask

* attn viz doesn't show yellow cubes between images

* bucketize made it hard with different number of crops

* fixup
2025-05-20 17:35:04 +02:00
2edb0e4b4d [mllama] fix loading and inference (#38223)
fix loading
2025-05-20 17:34:55 +02:00
390f153469 Add padding-free to bamba (#35861)
* add seq_idx and fa kwargs

* update tests

* docs and grad ckpt support

* fmt

* better names

* test_raise_missing_padding_free_kwarg_errs

* + seq_idx in doc strings

* padding free training docs

* add link to pr plots

* raise err on attn_mask with padding free

* rm raising missing padding free err test

* BambaFlashAttentionKwargs

* run modular util for modular_granitemoehybrid.py
2025-05-20 17:13:59 +02:00
2a79471318 Fixing Bitnet after use_rms_norm introduction (#38229)
* fix

* make style
2025-05-20 17:13:21 +02:00
9661896083 Enable Quantize KV Cache for Mistral Model (#35042)
fix #35041
2025-05-20 16:50:26 +02:00
1c2f36b480 parallelism goes brrr (#37877)
* accept custom device_mesh

* fix device_map

* assert that num_heads % tp_size == 0

* todo.

* ReplicateParallel

* handle tied weights

* handle dtensor in save_pretrained with safe_serialization

* tp test works

* doesn't work

* fix shard_and_distribute_module's rank should be local_rank

* tp=4 is correct

* dp+tp is broken

* todo allreduce with dtensors on another dim is annoying

* workaround to sync dp grads when using dtensors

* loading a checkpoint works

* wandb and compare losses with different tp/dp

* cleaning

* cleaning

* .

* .

* logs

* CP2 DP2 no mask works after commenting out attn_mask and is_causal from scaled_dot_product_attention

* DP=2 TP=2 now works even with tied embeddings

* model.parameters() and model.module.parameters() are empty..

* reformat sanity_check_tensor_sync

* set atol=1e-4 for CP to pass

* try populate _parameters from named_modules

* refactors
TP2 DP2 works
CP2 DP2 works

* is_causal=True and pack sequences, no attn mask, and preshuffle dataset

* fix packing

* CP=4 doesn't work

* fix labels and position_ids for CP

* DP CP works with transformers 🥳🥳🥳

* refactor

* add example cp

* fixup

* revert sdpa changes

* example cleared

* add CP, DP to the mesh init

* nit

* clean

* use `ALL_PARALLEL_STYLES`

* style

* FSDP works

* log on 1 rank

* .

* fix?

* FSDP1 also has .parameters() bug

* reported gradnorm when using FSDP1 is wrong, but loss is correct so it's okay

* .

* style and fixup

* move stuff around

* fix tests

* style

* let's make it a check

* warning should be an info

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-05-20 16:22:52 +02:00
b591d925be Fix Llama4 (#38222)
Update modeling_llama4.py
2025-05-20 16:00:46 +02:00
3f0b7d0fac Mamba2 remove unecessary test parameterization (#38227) 2025-05-20 13:54:04 +00:00
9cde2f5d42 Minor llama4 fixes (#38123)
* fix wrong scaling value/default Cache init

* style

* fix various issues on integration tests

* change expected outputs

* fixup

* fix config access

* protect default scaling
2025-05-20 13:15:54 +00:00
856f034f45 fix dead flax links modeling_flax_pytorch_utils.py (#38212) 2025-05-20 13:03:41 +00:00
bb3c6426d8 Make train_dataset attribute in _get_train_sampler optional (#38226)
make it optional
2025-05-20 12:59:53 +00:00
2ad152f84c In Llama4 fix wrongly inverted causal attention mask when using SDPA implementation (#38094)
When preparing the causal attention mask at this point, the mask comes
in as a float tensor with the dtype's min value marking masked positions.
It is not correct to convert it to bool and treat it as a bool mask,
as this inverts the mask:
`torch.nn.functional.scaled_dot_product_attention` expects a masked value to be `False`.

I suspect that the `sdpa` implementation variant may not have been
thoroughly tested and that is why this error was not caught earlier.
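
To illustrate the inversion (a minimal, self-contained sketch, not the Llama4 code): casting the float mask to bool marks exactly the wrong positions, because the large negative value at masked positions is truthy.

```python
import torch

min_val = torch.finfo(torch.float32).min
# 0.0 means "attend", min_val means "masked" in the additive float convention
float_mask = torch.tensor([[0.0, min_val],
                           [0.0, 0.0]])

inverted = float_mask.bool()  # min_val -> True: masked positions look attendable
correct = float_mask == 0.0   # True exactly where attention is allowed,
                              # which is what scaled_dot_product_attention expects
```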

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-20 14:47:59 +02:00
de70c8426e Disable torchscript tests for AriaForConditionalGenerationModelTest (#38225)
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-05-20 14:37:55 +02:00
8ea61c4530 Add support to Marimo Notebooks and Enverge.ai (#38210)
* Add support to Marimo notebooks

* Concise logic

* Simplify logic

* Ruff fixes
2025-05-20 12:26:34 +00:00
d34e21e7dd New cache tests and refactored Hybrid Cache (#37972) 2025-05-20 12:46:13 +02:00
183fb3637c Add Llama4TextModel to AutoModel mapping (#38162)
Add Llama4TextModel to AutoModel mapping

using Llama4TextConfig on AutoModel.from_config raises a ValueError when it is expected to instantiate a Llama4TextModel
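
A hedged sketch of the resolution this enables (assuming Llama4TextConfig is exported at the top level; the default config may build a large model, so this is illustrative only):

```python
from transformers import AutoModel, Llama4TextConfig

config = Llama4TextConfig()  # in practice, load or shrink a real config
model = AutoModel.from_config(config)
assert type(model).__name__ == "Llama4TextModel"  # previously: ValueError
```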
2025-05-20 10:01:00 +00:00
f022bf9322 Remove trust_remote_code=True tests from bnb quantization tests (MPT now integrated) (#38206)
bnb quant tests: remove obsolete trust_remote_code test

The MPT model is now natively integrated in Transformers and no longer requires trust_remote_code=True. This removes the failing test_get_keys_to_not_convert_trust_remote_code and related usage, which depended on remote code and caused CI issues due to missing dependencies (e.g., triton_pre_mlir).
2025-05-20 11:43:11 +02:00
0a52bd2403 [fix] sliding window attention mask (#38045)
* fix sliding attn

* make style

* Update tests/test_modeling_common.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* on a second thought, should default to `True` for BC

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-05-20 09:32:19 +00:00
555715f418 Fix broken example generation script for Llama3 (#38062)
Fix broken example generation script for llama3
2025-05-20 10:53:43 +02:00
7a611f0afd Fix: make docs work better with doc builder (#38213) 2025-05-20 08:23:03 +00:00
3bd1c20149 enable misc cases on XPU & use device agnostic APIs for cases in tests (#38192)
* use device agnostic APIs in tests

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* more

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* add reset_peak_memory_stats API

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* update

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-20 10:09:01 +02:00
dbc4b91db4 Qwen2.5-Omni: Update modeling_qwen2_5_omni.py to fix error when loading quantized weights with AutoAWQ. (#38013)
* Update modular_qwen2_5_omni.py

fix the error when loading a quantized model with AutoAWQ.

* Update modeling_qwen2_5_omni.py

sync code to modular_qwen2_5_omni.py
2025-05-20 09:53:51 +02:00
46a4b7c909 Feat: save_pretrained for tensor parallel (and other parallelisms) models (#37919)
* tmp: initial save pretrained with dtensors

* Feat: add correctness tests

* Refactor: version checks

* Temp: 1:1 checkpoint llama4

* refactor

* Tests

* Feat: works

* Style

* Feat: version checks + minor fixes

* Style

* Fix: version checks in tests

* Feat: move more stuff into tensor_parallel.py
2025-05-19 18:16:21 +00:00
9ecee14378 [doc] fix bugs in how_to_hack_models.md (#38198)
fix several bugs
2025-05-19 10:37:54 -07:00
f524439cc5 Translating model_doc/bert.md to Chinese (#37806)
* Translated model_doc/bert.md

* Revise grammatical errors

* Changed _toctree.yml

* Revise some errors
2025-05-19 10:14:57 -07:00
6e738411e1 Tensor parallel docs (#38178)
* Feat: initial docs

* Feat: update doc

* Final typos/changes

* Refactor: reorder top to bottom.
2025-05-19 17:05:01 +00:00
9c500015c5 🚨🚨🚨 [pipelines] update defaults in pipelines that can generate (#38129)
* pipeline generation defaults

* add max_new_tokens=20 in test pipelines

* pop all kwargs that are used to parameterize generation config

* add class attr that tell us whether a pipeline calls generate

* tmp commit

* pt text gen pipeline tests passing

* remove failing tf tests

* fix text gen pipeline mixin test corner case

* update text_to_audio pipeline tests

* trigger tests

* a few more tests

* skips

* some more audio tests

* not slow

* broken

* lower severity of generation mode errors

* fix all asr pipeline tests

* nit

* skip

* image to text pipeline tests

* text2test pipeline

* last pipelines

* fix flaky

* PR comments

* handle generate attrs more carefully in models that cant generate

* same as above
2025-05-19 18:02:06 +01:00
6f9da7649f [image-text-to-text pipeline] Accept a chat as a positional arg (#38204)
accept chat as a positional arg
2025-05-19 17:26:09 +01:00
7c9b0ca08c [SAM-HQ] Update names in the docs (#38058)
Update names
2025-05-19 09:21:14 -07:00
04282a9ef5 Remove Deprecated verbose arg in LayerWiseDummyScheduler (#38197)
Remove Deprecated args in LayerWiseDummyScheduler
2025-05-19 13:49:11 +00:00
aef12349b6 Make HF implementation match original OLMo 2 models for lower precisions (#38131)
* Make HF implementation match OLMo models for lower precisions

* Add test of 1B logits in bfloat16

* Run make fixup
2025-05-19 15:35:23 +02:00
9644acb7cb [docs] add Audio import (#38195)
add Audio import
2025-05-19 13:16:35 +00:00
7d93f93f83 [docs] minor fixes in models.md (#38193)
minor fix
2025-05-19 13:14:21 +00:00
47f8578d96 Pass eps to Mistral3RMSNorm (#38026)
Pass eps to Mistral3RMSNorm

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-19 15:09:25 +02:00
6c6302817d Resolve Python logger warnings (#38183)
* Resolve Python logger warnings

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* Apply style fixes

---------

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-19 12:53:07 +00:00
003deb16f1 Support for transformers explicit filename (#38152)
* Support for transformers explicit filename

* Tests

* Rerun tests
2025-05-19 14:33:47 +02:00
dbb9813dff [generation] Less verbose warnings by default (#38179)
* tmp commit (imports broken)

* working version; update tests

* remove line break

* shorter msg

* dola checks need num_beams=1; other minor PR comments

* update early trainer failing on bad gen config

* make fixup

* test msg
2025-05-19 10:03:37 +00:00
656e2eab3f Add adam_kwargs for Apollo Optimizer (#38168)
Add adam_kwargs for Apollo
2025-05-19 08:59:49 +00:00
6bb6821d93 Refactor get_XXX_dataloader from Trainer (#38090)
* Remove test_dataloader

* refactor
2025-05-19 10:43:27 +02:00
40a493c7ed [tests] remove test_sdpa_equivalence (redundant) (#37911)
* rm test_sdpa_equivalence

* make fixup

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-05-16 18:37:27 +01:00
ea29f61ed9 fix bug in distributed loss test (#38166)
* fix bug in distributed loss test and change some config to pass at both 2&8 gpus

* fix doc
2025-05-16 16:21:35 +00:00
a4389494c7 Fix import torchao.prototype.low_bit_optim since torchao v0.11 (#38174)
* Fix ModuleNotFoundError torchao.prototype.low_bit_optim since torchao v0.11.0

* Fix space on blank line

* update torchao's AdamW4bit and AdamW8bit import for v0.11.0

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-16 18:02:33 +02:00
0ba95564b7 Add args support for fast image processors (#37018)
* add args support to fast image processors

* add comment for clarity

* fix-copies

* Handle child class args passed as both args or kwargs in call and preprocess functions

* revert support args passed as kwargs in overwritten preprocess

* fix image processor errors
2025-05-16 12:01:46 -04:00
d69945e5fc [ESM] Add flash-attention-2 backend for ESM-2 (#38023)
* Add flash-attention-2 backend for ESM-2

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

* update extended_attention_mask for fa2

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

* add test_flash_attn_2_equivalence test

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

---------

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
2025-05-16 14:11:56 +01:00
7b5e327c6e Feat: add warnings for unused keys and rules in tensor parallel (#37893)
Feat: tensor parallel plan verification
2025-05-16 14:52:47 +02:00
120935234f remove some commands from fetch_tests CircleCI job (#38176)
delete

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-16 14:42:50 +02:00
91f6fa00f4 Disable convert to draft workflow (#38177)
delete

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-16 14:42:14 +02:00
5036ec8872 Disable Trigger CircleCI by ready for review (#38171)
delete

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-16 14:02:48 +02:00
7f28da2850 clean autoawq cases on xpu (#38163)
* clean autoawq cases on xpu

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-16 13:56:43 +02:00
01ad9f4b49 Bart: new cache format (#35314)
* bart compile

* add mbart

* some more models touched by fix-copies

* more

* more models

* even more models

* fix copies

* fix tests

* fix copies

* fix

* biogpt accepts position ids now (breaking?)

* fix failing non-slow tests

* fix some tests

* should not be removed

* small update

* Update src/transformers/models/bart/modeling_bart.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* update for last `main`

* fix copies

* clone `update_causal_mask` from llama

* tmp

* fixup

* why? how?

* fix bart tests

* don't skip test

* address comments

* fix tests

* fix

* fixup and delete the file

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-05-16 13:26:54 +02:00
3ab47b6ce3 [VLMs] add helpers to get multimodal encodings (#37743)
* add helpers in VLMs

* fix tests and copies

* fix blip tests

* make fix-copies

* fix copies

* fixup
2025-05-16 13:20:10 +02:00
1e921a3a9c Add optional RMSNorm support to BitNet quantization (config + layers) (#38087)
* enable optional RMS in BitLinear

* Fix naming

* Import RMS from Llama using config.*

* make fix-copies

* ran CI loop

* remove default BitNetQuantConfig values

* Fix BitNetQuantConfig to be Optional

* Fix config docstrings to match Optional

* Edit docstrings to match standards

---------

Co-authored-by: steinmetzc <codysteinmetz7@gmail.com>
Co-authored-by: codys12 <steinmetzc@dh-mgmt4.hpc.msoe.edu>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-05-16 12:38:06 +02:00
57a79f51b2 Fix Qwen2.5 Omni SinusoidsPositionEmbedding precision (#38151)
* Fix Qwen2.5 Omni `SinusoidsPositionEmbedding` precision

fixes https://github.com/QwenLM/Qwen2.5-Omni/issues/271

* Update modular_qwen2_5_omni.py
2025-05-16 12:24:50 +02:00
44fa04ae8d Include output embedding as well with include_embedding flag (#37935)
* Include output embedding as well with `include_embedding` flag

Summary:
att

Test Plan:
python tests/quantization/torchao_integration/test_torchao.py -k test_include_embedding

Reviewers:

Subscribers:

Tasks:

Tags:

* format

* rename include_embedding to include_input_output_embeddings

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-05-16 12:06:11 +02:00
34c1e29cdd enable autoround cases on XPU (#38167)
* enable autoround cases on XPU

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-16 09:08:35 +00:00
0f77ca72ca [FIX] Save speed metrics to logs (#38136)
Previously, we calculated speed metrics and did not do anything with the result.
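
For context, a hedged sketch of surfacing the result via the public speed_metrics helper (the actual Trainer wiring may differ):

```python
import time

from transformers.trainer_utils import speed_metrics

start = time.time()
num_samples = 1024  # hypothetical: samples processed during the measured phase
# ... run evaluation here ...
metrics = speed_metrics("eval", start, num_samples=num_samples)
# the fix amounts to actually reporting these, e.g. merging them into the
# metrics dict that gets logged, instead of discarding the return value
print(metrics)  # {'eval_runtime': ..., 'eval_samples_per_second': ...}
```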
2025-05-15 16:58:50 +02:00
27ef46e846 Omit creation of positional IDs within ESM if applicable (#38089)
* omit pos emb creation

* rft

---------

Co-authored-by: sgottreich <sgottreich@absci.com>
2025-05-15 14:09:21 +00:00
fe9426f12d disable deepspeed when setting up fake trainer (#38101)
* disable deepspeed when setting up fake trainer

* Apply style fixes

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-15 15:34:04 +02:00
7caa57e85e enable trainer test cases on xpu (#38138)
* enable trainer test cases on xpu

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-15 12:17:44 +00:00
b11b28cc4e Hotfix: Flash Attention 2 support in Pixtral (#38146)
setting attention_mask to None when flash_attention_2 is selected
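
A hedged sketch of the guard (not the literal Pixtral code): flash-attention-2 manages padding itself, so the prepared mask must not be forwarded to that backend.

```python
def adjust_attention_mask(attention_mask, attn_implementation: str):
    # flash-attention-2 handles (un)padding internally and cannot consume
    # the mask prepared for the other backends, so drop it here
    if attn_implementation == "flash_attention_2":
        return None
    return attention_mask
```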

Co-authored-by: aurelien.lac <aurelien.lac@lighton.ai>
2025-05-15 11:45:35 +02:00
0e0e5c1044 [generate] Run custom generation code from the Hub (#36405)
* mvp

* remove trust_remote_code

* generate_from_hub

* handle requirements; docs

* english

* doc PR suggestions

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* changed remote code path to generate/generate.py

* model repo has custom generate -> override base generate

* check for proper inheritance

* some doc updates (missing: tag-related docs)

* update docs to model repo

* nit

* nit

* nits

* Update src/transformers/dynamic_module_utils.py

* Apply suggestions from code review

* Update docs/source/en/generation_strategies.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* trust remote code is required

* use new import utils for requirements version parsing

* use  org examples

* add tests

* Apply suggestions from code review

Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>

* ascii file structure; tag instructions on readme.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
2025-05-15 10:35:54 +01:00
955e61b0da Remove head mask in generative models (#35786)
* just squash into one commit

* delete print
2025-05-15 10:44:19 +02:00
0173a99e73 enable csm integration cases on xpu, all passed (#38140)
* enable csm test cases on XPU, all passed

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-15 09:46:29 +02:00
e5a48785d9 [Qwen3] Qwen3 MoE add tp plan for expert mlps (#38135)
fix tp plan
2025-05-15 09:12:39 +02:00
4005e30c80 Fix incorrect attention mask truncate in WhisperFlashAttention2 (#36477)
* Fix incorrect attention mask truncate in whisper flash attention

* also fix incorrect attention mask truncate in qwen2 audio

* Nit attention mask truncate modeling_qwen2_audio.py

* Nit attention mask truncate modeling_whisper.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-05-14 20:08:31 +00:00
aa27fa75cd enable d_fine finetuning properly (#37962)
add pre_output in the front

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-05-14 16:53:04 +01:00
e021bf6bf8 Add manueldeprada to run_slow whitelist (#38126)
Add manueldeprada to run_slow allowed users
2025-05-14 15:16:58 +02:00
ef27b2bc22 [docs] add uv installation instructions for source builds (#37968) 2025-05-14 13:09:41 +00:00
4a2decd192 Update trainer.md (#38113)
Fix typo in torch.compile method parameters
2025-05-14 12:40:00 +00:00
935bbbc711 Add config validation and style tweaks (#37589)
* Add config validation and style tweaks

* Fix style issues

* Fix style issues

* style

* Small fixes for copy/paste errors

---------

Co-authored-by: Cyrile <cyrile.delestre@arkea.com>
2025-05-14 12:22:10 +00:00
1b00966395 Fix auto batch size finder test (#38125)
Ensure --auto_find_batch_size is the last test arg so indexing is correct
2025-05-14 12:12:04 +00:00
fe918d13b9 Fix temporal padding in Qwen2VLImageProcessor when the number of frames is not divisible by temporal_patch_size (#38076)
Qwen2VL: Fix temporal padding in Qwen2VLImageProcessor when frames are not divisible by temporal_patch_size
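
A minimal sketch of the padding rule (illustrative; the exact processor code may differ): repeat the last frame until the frame count is divisible by temporal_patch_size.

```python
import numpy as np

def pad_temporal(frames: np.ndarray, temporal_patch_size: int) -> np.ndarray:
    """frames: (num_frames, height, width, channels)."""
    remainder = frames.shape[0] % temporal_patch_size
    if remainder:
        # repeat the last frame to reach the next multiple
        pad = np.repeat(frames[-1:], temporal_patch_size - remainder, axis=0)
        frames = np.concatenate([frames, pad], axis=0)
    return frames
```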
2025-05-14 12:28:21 +02:00
aaf224d570 [video processor] fix tests (#38104)
* fix tests

* delete

* fix one more test

* fix qwen + some tests are failing irrespective of `VideoProcessor`

* delete file
2025-05-14 10:24:07 +00:00
9b5ce556aa enable finegrained_fp8 and granite_speech cases on XPU (#38036)
* enable finegrained_fp8 cases on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* change back to auto

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* rename per comments

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-05-14 08:58:40 +00:00
b311a3f506 Fix description and formatting errors in code docs (#38074)
* Update stopping_criteria.py

Fix description and formatting errors.

* Update stopping_criteria.py

Align formatting with existing files for consistency.
2025-05-13 17:17:15 +00:00
b499a14b17 Add style bot (#38102)
add style bot
2025-05-13 19:07:17 +02:00
e0f225cb10 [CSM] update test for t4 runners (#38110)
update test for t4 runners
2025-05-13 11:59:26 -04:00
342961f669 Add Fast Image Processor for vilt (#37304)
* init vilt image processor fast

* Refactor image processor tests to use loop for all processors

* Add ViltImageProcessorFast with PyTorch-based optimized image processing

* Change made automatically by make fixup command

* Change made automatically by make fix-copies command

* Fix type hints in ViltImageProcessorFast for Python compatibility

* Define constants for image resizing based on COCO dataset aspect ratio

* Add missing property initializations to ViltImageProcessorFast

* Extract resize logic into dedicated method in ViltImageProcessorFast

* Extract padding logic into dedicated method

* Implement shape-based image grouping for optimized processing in Vilt

* Update test suite to verify ViltImageProcessorFast attributes

* Move variable declarations to _preprocess method parameters

* Remove unused parameters

* Rename _resize method to resize to override existing function

* Remove whitespace

* Remove unnecessary type check and conversion for stacked_images

* Remove redundant loop and apply padding directly to stacked images

* Refactor pad function to return images and mask as tuple instead of dict

* Add tests comparing padding masks in slow and fast implementations

* Update ViltImageProcessor tests to ensure compatibility between slow and fast implementations

* Replace add_start_docstrings with auto_docstring in ViltImageProcessorFast

* Move docstrings of custom args to ViltFastImageProcessorKwargs

* Use reorder_images function for both masks and images

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-05-13 15:40:53 +00:00
8771766a70 Fix InternVL interpolate_pos_encoding and add to video_processing_auto (#38092)
* fix InternVL interpolate_pos_encoding

* fix modular and auto_video_processor for internvl
2025-05-13 11:18:40 -04:00
582d5e0e11 fix check_bad commit.py gives wrong results (#38107)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-13 16:58:22 +02:00
a5cc7a67d7 [bug] fix llava processor to calculate unpadding size correctly (#37988)
* fix llava processor to calculate unpad size correctly

* repo consistency

* Revert "repo consistency" & "setUp in llava family"

This reverts commit 26a50af8db5b15bb6b700db3d53342fe69579d8e.

* add edge case test for padding & unpadding

* compute unpadding size from original size

* make test config explicit

* Revert "compute unpadding size from original size"

This reverts commit 752cd27ad9710ab056c17a9986760c4651975540.

* Revert "add edge case test for padding & unpadding"

This reverts commit ccbd094d69c3f8f6a259159164284f60ba835bce.

* revert unpad logic

* remove irrelevant tests

* model test

* remove processor from model test

---------

Co-authored-by: jaycha <jaycha@ncsoft.com>
2025-05-13 13:49:09 +00:00
67b3d45eb6 Fix past_key_values type hint in model output types (#37953)
* F: Fix type hint.

* F: Use Cache type.

* F: Sort import.

* U: Format.

* U: Address reviews.
2025-05-13 13:36:49 +00:00
07feaad8fb Fix bug in prefill_chunk_size that ignores disable_compile flag (#38067)
Fix bug in prefill_chunk_size implementation that ignores disable_compile flag
2025-05-13 13:23:23 +00:00
e40f301f1f [smolvlm] skip the test (#38099)
skip the test
2025-05-13 12:50:43 +00:00
e27d230ddd Disable report callbacks for certain training tests (#38088)
* Disable report callbacks for certain training tests

* Disable report callbacks for test_auto_batch_size_finder
2025-05-13 14:49:55 +02:00
ab65ba47ad fix: Propagate lr_scheduler_kwargs options to create LR Scheduler when LayerWiseDummyOptimizer is used (#34559)
fix: fix get_scheduler
2025-05-13 13:56:45 +02:00
8fb60bf6be add timeout for downloading the librispeech_asr dataset (#38073)
* add timeout

* change 10 to 60
2025-05-13 11:50:12 +01:00
3ad35d0bca update require_read_token (#38093)
* update require_read_token

* new repo

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-13 12:07:07 +02:00
e3b70b0d1c Refactor image processor phi4 (#36976)
* refactor image processor phi4

* nits fast image proc

* add image tests phi4

* Fix image processing tests

* update integration tests

* remove revision and add comment in integration tests
2025-05-12 15:13:40 -04:00
4143f94d51 uninstall kernels from docker images (#38083)
uninstall kernels

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-12 18:03:47 +02:00
a63cb7578e update seed_worker to set seed based on worker_id and rank (#37980)
* update seed_worker to set seed based on worker_id and rank (see the sketch after this list)

* test case

* set output_dir as remove tmp dir
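
A minimal sketch of the idea (names hypothetical): fold the worker id and the process rank into the seed so no two dataloader workers across ranks share one.

```python
import random

import numpy as np
import torch

def seed_worker(worker_id: int, num_workers: int, rank: int) -> None:
    # distinct, reproducible seed for every (rank, worker) pair
    base_seed = torch.initial_seed() % 2**32
    worker_seed = (base_seed + rank * num_workers + worker_id) % 2**32
    random.seed(worker_seed)
    np.random.seed(worker_seed)
```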
2025-05-12 15:59:16 +00:00
e387821a96 Fix tot update in trainer (#37923)
* fix total updates in epoch

* add test; fix max_steps

* replace with multi-gpu decorator
2025-05-12 17:45:24 +02:00
f0e975c6cf fix the inconsistent docstring in apply_chat_template (#38069)
The commit (5cf11e5ab9) fixed the type hints for the parameter `tools` in apply_chat_template, but the docstring was not changed.
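
For reference, `tools` accepts JSON tool schemas or plain Python functions (converted from their signatures and docstrings); a hedged usage sketch with an arbitrary chat model:

```python
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather in a city.

    Args:
        city: The name of the city.
    """
    ...

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
messages = [{"role": "user", "content": "What's the weather in Paris?"}]
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], tokenize=False, add_generation_prompt=True
)
```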
2025-05-12 16:32:01 +01:00
31791b16a1 chore(qwen2): display warning log only when sliding window attention … (#36316)
* chore(qwen2): display warning log only when sliding window attention is enabled

* Align modeling_qwen2.py and modular_qwen2.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-05-12 16:31:44 +01:00
8ea72d12a2 Fix mt5 test on AMD devices (#38081) 2025-05-12 16:59:00 +02:00
5c85018072 docs: fix md style (#38057) 2025-05-12 15:56:31 +01:00
7eaa90b87b Add AMD expectation to test_gpt2_sample (#38079) 2025-05-12 16:51:21 +02:00
4220039b29 Fix OneFormer integration test (#38016)
* Fix integration tests

* format
2025-05-12 16:02:41 +02:00
8efe3a9d77 [chat] generate parameterization powered by GenerationConfig and UX-related changes (#38047)
* accept arbitrary kwargs

* move user commands to a separate fn

* work with generation config files

* rm cmmt

* docs

* base generate flag doc section

* nits

* nits

* nits

* no <br>

* better basic args description
2025-05-12 14:04:41 +01:00
a5c6172c81 [VLM] fix loading issues (#38051)
* fix qwen2-vl loading

* fix a few nore models

* delete print

* fix copies
2025-05-12 10:14:04 +00:00
a31fa218ad 🔴 Video processors as a separate class (#35206)
* initial design

* update all video processors

* add tests

* need to add qwen2-vl (not tested yet)

* add qwen2-vl in auto map

* fix copies

* isort

* resolve conflicts, kinda

* nit:

* qwen2-vl is happy now

* qwen2-5 happy

* other models are happy

* fix copies

* fix tests

* add docs

* CI green now?

* add more tests

* even more changes + tests

* doc builder fail

* nit

* Update src/transformers/models/auto/processing_auto.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* small update

* imports correctly

* dump, otherwise this is getting unmanageable T-T

* dump

* update

* another update

* update

* tests

* move

* modular

* docs

* test

* another update

* init

* remove flakiness in tests

* fixup

* clean up and remove commented lines

* docs

* skip this one!

* last fix after rebasing

* run fixup

* delete slow files

* remove unnecessary tests + clean up a bit

* small fixes

* fix tests

* more updates

* docs

* fix tests

* update

* style

* fix qwen2-5-vl

* fixup

* fixup

* unflatten batch when preparing

* dump, come back soon

* add docs and fix some tests

* how to guard this with new dummies?

* chat templates in qwen

* address some comments

* remove `Fast` suffix

* fixup

* oops should be imported from transforms

* typo in requires dummies

* new model added with video support

* fixup once more

* last fixup I hope

* revert image processor name + comments

* oh, this is why fetch test is failing

* fix tests

* fix more tests

* fixup

* add new models: internvl, smolvlm

* update docs

* import once

* fix failing tests

* do we need to guard it here again, why?

* new model was added, update it

* remove testcase from tester

* fix tests

* make style

* unrelated CI failure, let's just fix it here

* mark flaky for now, fails 15 out of 100

* style

* maybe we can do this way?

* don't download images in setup class

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-05-12 11:55:51 +02:00
716819b830 fix(conversion): Fix size mismatch error during TF->PT model loading (#38014) 2025-05-10 11:11:07 +00:00
8f08318769 enable generation fsdp/utils cases on XPU (#38009)
* enable generation fsdp/utils test cases on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* xx

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* use backend_xx APIs

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-05-09 20:52:41 +00:00
87e971e14d Fix linalg.norm for ConvNextV2 (#38015)
Fix norm
2025-05-09 17:44:28 +01:00
aaed2f5577 Fix cache update! (#38046)
* fix slicing

* better fix
2025-05-09 17:54:48 +02:00
7f1a97bae3 Fix reduce-labels in BEIT Fast Image Processor (#38042)
* Fixed reduce-labels

* Little doc fix

* Change docstring
2025-05-09 11:51:46 -04:00
9f9020fed3 Re-Enable Trigger CircleCI via GitHub Actions when "ready for review" (#37885) (#38041)
* check actions

* trigger CI

* check actions

* finally

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-09 16:57:54 +02:00
23d79cea75 Support for version spec in requires & arbitrary mismatching depths across folders (#37854)
* Support for version spec in requires & arbitrary mismatching depths

* Quality

* Testing
2025-05-09 15:26:27 +02:00
774dc274ac Do not erase a cache_position passed explicitly to generate(), if there is one (#37986)
Do not erase a cache_position initialization passed explicitly to generate(), if there is one.

But: Let initialization replace cache_position if it's set to None. I assume that if the value is explicitly passed but None, we should initialize anyway.
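
A hedged sketch of the preserved behavior (cache_position is normally managed internally; passing it explicitly is an advanced use and the exact semantics may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any causal LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello", return_tensors="pt")
# one position index per prompt token; generate() now keeps this value
# instead of silently re-initializing it (None still triggers initialization)
out = model.generate(
    **inputs,
    cache_position=torch.arange(inputs["input_ids"].shape[1]),
    max_new_tokens=5,
)
```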
2025-05-09 10:56:21 +00:00
0010b41524 Disable Trigger CircleCI via GitHub Actions when `ready for review` (#38038)
disable

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-09 12:27:53 +02:00
d498528800 Trigger CircleCI via GitHub Actions when ready for review (#37885)
* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-09 11:45:03 +02:00
66e696ee15 [Temporary] Log some information in some pytest/pluggy internal places (#37996)
log pytest info

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-09 11:06:37 +02:00
a72cb31434 enable utils test cases on XPU (#38005)
* enable utils test cases on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* Update tests/utils/test_skip_decorators.py

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

* fix comment

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-05-09 08:45:01 +02:00
1dfad4beb2 make mistral3 pass on xpu (#37882)
* enabled mistral3 test cases on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* calibrate A100 expectation

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* update

* update

* update

* update

* update

* update

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-09 06:41:11 +00:00
121f7037c7 fix document masking for chunked attention (#37429)
* fix document masking for chunked attention

* remove accidental debugging sum
2025-05-09 08:22:00 +02:00
5f5ccfdc54 [AutoDocstring] Based on inspect parsing of the signature (#33771)
* delete common docstring

* nit

* updates

* push

* fixup

* move stuff around fixup

* no need for dataclass

* damn nice modular

* add auto class docstring

* style

* modular update

* import autodocstring

* fixup

* maybe add original doc!

* more cleanup

* remove class doc as well

* update

* nits

* more cleanup

* fix

* wups

* small check

* updates

* some fixes

* fix doc

* update

* nits

* try?

* nit

* some updates

* a little bit better

* wherever we did not have help, we are not really adding it!

* revert llama config

* small fixes and small tests

* test

* fixup

* more fix-copies

* updates

* updates

* fix doc building

* style

* small fixes

* nits

* fix-copies

* fix merge issues faster

* fix merge conf

* nits jamba

* ?

* working autodoc for model class and forward except returns and example

* support return section and unpack kwargs description

* nits and cleanup

* fix-copies

* fix-copies

* nits

* Add support for llava-like models

* fixup

* add class args subset support

* add examples inferred from automodel/pipelines

* update ruff

* autodocstring for Aria, Albert + fixups

* Fix empty return blocks

* fix copies

* fix copies

* add autodoc for all fast image processors + align, altclip

* fix copies

* add auto_doc for audio_spectrogram, auto_former, bark, bamba

* Drastically improve speed + add bart beit bert

* add autodoc to all bert-like models

* Fix broken doc

* fix copies

* fix auto_docstring after merge

* add autodoc to models

* add models

* add models

* add models and improve support for optional, and custom shape in args docstring

* update fast image processors

* refactor auto_method_docstring in args_doc

* add models and fix docstring parsing

* add models

* add models

* remove debugging

* add models

* add fix_auto_docstrings and improve args_docs

* add support for additional_info in args docstring

* refactor (almost) all models

* fix check docstring

* fix -copies

* fill in all missing docstrings

* fix copies

* fix qwen3 moe docstring

* add documentation

* add back labels

* update docs and fix can_return_tuple in modular files

* fix LongformerForMaskedLM docstring

* add auto_docstring to _toctree

* remove auto_docstring tests temporarily

* fix copyrights new files

* fix can_return_tuple granite hybrid

* fix fast beit

* Fix empty config doc

* add support for COMMON_CUSTOM_ARGS in check_docstrings and add missing models

* fix code block not closed flava

* fix can_return_tuple sam hq

* Fix Flaubert dataclass

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-05-08 17:46:07 -04:00
d231f5a7d4 update bnb tests (#38011)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-05-08 20:35:24 +00:00
b3db4ddb22 enable mamba2 integration cases on xpu (#38006)
* enable mamba2 integration cases on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-05-08 19:48:09 +00:00
c7c2f08994 make test_speculative_decoding_non_distil device-agnostic (#38010)
* make device-agnostic

* use condition

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-05-08 19:19:47 +00:00
d23aae2b8c [VLMs] support attention backends (#37576)
* update models

* why rename

* return attn weights when sdpa

* fixes

* fix attn implementation composite

* fix moshi

* add message

* add typings

* use explicitly all flags for each attn type

* fix some tests

* import what is needed

* kosmos on main has new attention already, yay

* new models in main, run fixup

* won't fix kosmos yet

* fix-copies

* clean up after rebasing

* fix tests

* style

* don't cast attns to fp32

* did we update ruff? OK, let's just do what it asks

* fix pixtral after rebase
2025-05-08 18:18:54 +02:00
e296c63cd4 Fix wording in torchscript.md (#38004)
Fix wording in torchscript.md
2025-05-08 16:47:45 +01:00
1c65aef923 Fix incorrect installation instructions (for issue #37476) (#37640)
* debugging issue 36758

* debugging issue 36758

* debugging issue 36758

* updated attn_mask type specification in _flash_attention_forward

* removed pdb

* added a blank line

* removed indentation

* update constants

* remove unnecessary files

* created installation script, modified README

* modified requirements and install.sh

* undo irrelevant changes

* removed blank line

* fixing installation guide

* modified README, python requirements, and install script

* removed tests_output

* modified README

* discarded installation script and python<3.13 requirement
2025-05-08 16:32:58 +01:00
f2909e024c Skip test_push_to_hub_with_saves_each_epoch for now (#38022)
* update

* trigger CI

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-08 16:26:24 +02:00
f2b59c6173 [caches] Raise exception on offloaded static caches + multi device (#37974)
* skip tests on >1 gpu

* add todo
2025-05-08 14:37:36 +01:00
4279057d70 [CI] remove duplicated message on GH comment to run slow tests (#37970)
duplicated msg
2025-05-08 14:35:54 +01:00
3390534f36 Print commit SHA on slack message for new model notification. (#38019)
add commit info

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-08 15:26:19 +02:00
9f8fffed3c Fix Optional typing (#38018)
* Fix

* trigger
2025-05-08 14:51:45 +02:00
06c16de3d3 Enable RUF013 to enforce optional typing (#37266)
* Enable RUF013 for Optional typing

Signed-off-by: cyy <cyyever@outlook.com>

* Add Optional to types

* Format code

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-05-08 12:39:56 +02:00
f6664ee713 Add ALL_ATTENTION_FUNCTIONS compatibility for Pixtral model (#37960)
* Add ALL_ATTENTION_FUNCTIONS compatibility for Pixtral model

* Fix invalid operand type

* Allow image_sizes to be optional in forward pass to fit tests

Disallow using sdpa and output_attentions

* Disallow using sdpa with output_attentions

* Delete useless comments, use eager attention from smolvlm, use pattern from mistral

* add _supports_attention_backend

* use kwargs instead of position_ids

---------

Co-authored-by: aurelien.lac <aurelien.lac@lighton.ai>
2025-05-08 12:13:13 +02:00
015b6dfbf8 Fix pad image transform for batched inputs (#37544)
* fix

* add batch dimension to expected output
2025-05-08 10:51:15 +01:00
5c47d08b0d Add Swin2SR ImageProcessorFast (#37169)
* Add fast image processor support for Swin2SR

* Add Swin2SR tests of fast image processing

* Update docs and remove unnecessary test func

* Fix docstring formatting

* Skip fast vs slow processing test

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-05-07 12:20:16 -04:00
17742bd9c8 🔴 [VLM] Add base model without head (#37033)
* I guess I reverted all CdGen classes

* style

* llava onevision

* fix copies

* fix some tests

* some more tests

* dump

* skip these

* nevermind, i am dumb

* revert fix not needed

* fixup

* fixup

* another fixup

* more fixup to make ci finally happy

* fixup after rebasing

* fix qwen tests

* add internVL + typos here and there

* image token index -> id

* style

* fix init weights

* revert blip-2 not supported

* address comments

* fix copies

* revert blip2 test file as well

* as discussed internally, revert back CdGen models

* fix some tests

* fix more tests for compile

* CI red

* fix copies

* enumerate explicitly allowed models

* address comments

* fix tests

* fixup

* style again

* add tests for new model class

* another fixup ( x _ x )

* [fixup] unused attributes can be removed post-deprecation
2025-05-07 17:47:51 +02:00
3fa8d9c20e [CSM] tiny fix on generation (#38001)
nit
2025-05-07 11:45:23 -04:00
798f948e88 Add CSM model (#36719)
* draft structure

* depth decoder with forward pre hook

* full model forward draft

* draft update

* depth decoder update

* ConversationalSpeechModelForCausalLM updates

* add generate

* max length criteria small fix

* update

* updates

* generation update

* update in loss compute

* conversion script

* update for correct input embeddings

* handle interleaved rope

* update

* update

* update

* support compile

* update training

* add doc

* update doc

* correct inits

* ConversationalSpeechModel -> Csm

* conf update

* name update

* tests CsmForCausalLMTest

* convert use cached_file

* conf + modeling updates

* generate utils handle third dim shape

* integration test

* modeling + conf updates

* common test handle more than 2 dims

* add nested audio list utils

* processing handle nested audio list

* csm processing draft

* mimi util

* init updates

* modular update

* convert modular

* processing update

* csm tests update

* generate tests handle third dim

* generate utils handle third dim

* propagate _get_initial_cache_position update

* tied_weight_keys update + convert correctly

* fix inputs_embeds

* revert audio nested list

* batch inference update + return audio

* audio_utils update

* processor update

* some more integration tests

* remove old test

* processing output labels

* improve

* fix

* update rope values with equivalent ones

* conversion update

* update tests

* handle depth decoder generation config

* remove default eos_token_id

* make style

* revert modeling_mimi

* add default generation_config

* remove sdpa since handled by default

* make

* fix conflict

* fix conflicts

* correct naming

* correct imports

* make

* causal -> conditional naming

* causal -> conditional naming

* auto update

* make

* make

* add doc

* test update

* fix weight init

* audio tokens offsets as buffer

* 4d mask in conditional class

* make

* doc update

* fix causal mask

* fix causal mask

* doc update

* doc update

* add processor doc

* update doc

* fix 4d causal mask

* update make_list_of_audio

* do not default to mutable

* remove duplicates

* remove useless reset_parameters

* use GradientCheckpointingLayer

* use can_return_tuple

* formatting

* prepend placeholder in _sample

* torch compile fix

* some more fixies

* convert modular

* fix

* default max_length in convert

* handle depth decoder generation config correctly

* clearer formulation

* handle output_loading_info

* handle softmax warning

* add doc

* propagate _get_initial_cache_position changes

* generation in its own module

* add processor tests

* fix compile with cuda graphs

* fix compile with cuda graphs

* add csm.md

* include CSM loss

* doc nit

* doc nit

* doc nit

* Update docs/source/en/model_doc/csm.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add save_audio to processor

* Update src/transformers/models/csm/modular_csm.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* doc update

* simplify audio_codes_mask computation

* doc update

* simplify loss computation

* fix static cache test

* fix

* remove comment

* simplify encoded length computation

* use hf-internal-testing

* doc update

* cast to float before numpy

* nit

* mem efficient codebook head

* nit

* cat input values with cutoffs

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-07 10:20:13 -04:00
c8607a17cb Add a check to import_utils.py to allow for use of faiss_gpu installation (#37997)
Adding check to import_utils.py for faiss_gpu
2025-05-07 14:27:41 +01:00
fb1e3a4daa remove duplicate code (#37991)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-05-07 13:46:45 +01:00
8a9441d26d [chat template] separate jinja logic from tokenizers (#37602)
* split out jinja

* raise error
2025-05-07 14:18:03 +02:00
038f8fc159 make aya vision 5 integration tests pass on xpu (#37990)
* 5 aya vision integration pass on XPU

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-05-07 11:16:38 +02:00
a9384f849a [offload] respect max_memory argument when factoring in unused reserved memory (#37982) 2025-05-07 09:49:31 +01:00
0b037fd425 Fix Qwen models export with torch 2.7 (#37985)
Co-authored-by: Guang Yang <guangyang@fb.com>
2025-05-07 09:13:08 +02:00
3c0796aaea [Fast Processor] BEiT (#37005)
* adding fast processor for beit

* adding resample

* address review issues and add segmentation maps logic

* style

* chore: adding tests

* reduce label test

* adding batched tests

* Update src/transformers/models/beit/image_processing_beit_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* fix imports and make segmentation masks

* fix tests

* build segmentation maps

* all tests pass

* style

* style fix

* style

* chore: delete demo.py file

* review suggestions

* Update docs/source/en/model_doc/beit.md

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-05-06 17:40:28 -04:00
ebbe9b12dd Fix donut backtracking (#37788)
* Fix donut backtracking

* make fixup

* Trigger tests

* Remove old line

* Update code

* Fix reversed slice
2025-05-06 17:39:04 +01:00
06c4d05fe6 Enable granite speech 3.3 tests (#37560)
* Enable granite speech 3.3 tests

* skip sdpa test for granite speech

* Explicitly move model to device

* Use granite speech 2b in tests

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-05-06 17:56:18 +02:00
031ef8802c fix FSDP + torch.compile bug when saving pretrained model (#37725)
* args keep_torch_compile=False in _save and _wrap_method

* Fix FSDP execution on evaluation for torch_compile mode

* add test trainer FSDP + Torch Compile

* fix quality code

* make style

* Revert " make style"

This reverts commit 77e797f8829c50992cc21496be3d9a3e480e1c97.

* make style
2025-05-06 17:51:28 +02:00
5534b80b7f enable xpu in test_trainer (#37774)
* enable xpu in test_trainer

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* enhance _device_agnostic_dispatch to cover value

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* add default values for torch not available case

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-05-06 17:13:35 +02:00
7db5d5b9ea Fix typo (#37964) 2025-05-06 14:59:00 +01:00
af2866a8b1 [speech2text] fix init of sinusoidal embeddings (#37931)
* fix init (meta device -> bad numbers)

* fast test

* dont init sinusoidal twice

* make fixup
2025-05-06 14:49:00 +01:00
274e79b326 Fix typos (#37978)
fix typos
2025-05-06 14:45:20 +01:00
057ae00504 Small typo lines 47 and 199 perf_infer_gpu_one.md (#37938)
* Small typo line 199 perf_infer_gpu_one.md

* Typo l. 47 perf_infer_gpu_one.md
2025-05-06 14:32:55 +01:00
cc68070d41 fix docs serving typos. (#37936)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-05-06 14:32:44 +01:00
b1375177fc add job links to new model failure report (#37973)
* update for job link

* stye

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-06 15:10:29 +02:00
acded47fe7 [llava] one pixel is missing from padding when length is odd (#37819)
* [fix] one pixel should be added when length is odd

* [fix] add vision_aspect_ratio args & typo

* [fix] style

* [fix] do not fix fast file directly

* [fix] convert using modular

* remove duplicate codes

* match unpad logic with pad logic

* test odd-sized images for llava & aria

* test unpad odd-sized padding for llava family

* fix style

* add kwarg to onevision modular

* move vision_aspect_ratio from image_processor to processor
(llava_onevision)
2025-05-06 13:11:26 +02:00
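A hedged sketch of the off-by-one arithmetic behind this fix (the helper name is hypothetical): when the total padding is odd, one side must receive the extra pixel, and the unpad path has to mirror the same split or one pixel goes missing:

```
def pad_amounts(target: int, current: int) -> tuple[int, int]:
    # Split the total padding across the two sides; the extra pixel for an
    # odd total goes to the second side. Unpadding must mirror this split.
    total = target - current
    first = total // 2
    return first, total - first

print(pad_amounts(10, 7))  # (1, 2): the second side gets the extra pixel
```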
9981214d32 [tests] Smaller model in slow cache tests (#37922) 2025-05-06 11:15:25 +01:00
ff5ef95db7 add xpu memory check (#37969)
add xpu check
2025-05-06 11:57:49 +02:00
7cc78804ba 🚨🚨🚨 Fix forward of Dinov2ForImageClassification for models with registers (#37836)
* add num_tokens_to_discard to the forward of Dinov2ForImageClassification

* redefine forward in modular file, remove change to modeling_dinov2 file

* run make fixup

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-05-06 11:55:53 +02:00
471958b620 Add GraniteMoeHybrid support for 4.0 (#37658)
* initial config and MLA layer

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* first pass at decoder

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* completion of layers

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* modeling class

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* adding hybrid class to imports

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix imports granitemoehybrid

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix granitehybrid imports

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix granitehybrid import

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix generated modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* add some comments

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* minor fixes in layers

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* add sharedMLP layer

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* correct layer names

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fixes in mamba config

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix mamba config

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* change name of MLP layer

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix seq mixer layers

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* correct mamba config

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fixes in param names

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* enable hybrid model

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update config

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix config granite hybrid

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix attention layer

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* cleanup to re-use mamba code

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* keep layer types

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* attention bias cleanup

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update mamba layer name

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* first pass at tests

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* first pass at tests

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* use granite attention

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix: self attn weights

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* pass at making pos_emb optional

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* initialize self_attn only as needed

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* overwrite forward to create HybridMambaCache

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* Log invalid layer types

* Add attention outputs test

* Only emit attentions/logits if not None

* Fix config test hidden size divisibility

* mark granitemoehybrid as stateful

* Initialize mamba convolutional layers

* Formatting fixes

* config docstring, removed some unused attrs

* Fix missing arg in models test

* Fix create and check decoder model test

* support logits to keep in granitemoe

* regen to pass logits_to_keep

* Allow None or rope

* Fix gradient checkpointing

* Add granitemoehybrid as special cache for generate check

* Remove unused MLA refs

* Fix mamba layer mask

* Remove logits to keep from config

* Minor docstring nits

* Update licenses

* Enable cache by default

* map layer types to layer block type

* First pass at granite moe hybrid docs

* Ignore granite moe hybrid in valid checkpoint check

* Align attention interfaces

* regenerate modular granitemoeshared attention interface

* Align granite moe hybrid attn interface

* run formatting

* Handle mamba initialization

* avoid conditional attr defs

* Move hybrid layer validation to config

* Add placeholder integration tests

* Docs nits / Update model names

* Clean up forward conditions

* Use gradient checkpointing layer

* Remove some copied bamba tests + inherit

align test init

delete more tests

Use common layer init with bamba tests

finish test consolidation

* avoid redundant intermediate std var

* use @can_return_tuple

* Remove unused moe state

* make skipped test names consistent

* Fix docstring order

* Add missing toc

* Always create the shared mlp

* Fix name in docstring

* link preview model in docs

---------

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-05-06 06:47:43 +02:00
fe29b8c487 [Ready to Merge][HFQuantizer] Squelch pydantic warnings (#37726)
replace dict with model_dump

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-05-05 20:38:49 +02:00
46c0e1ff80 Fix incorrect type annotation in get_auxiliary_logits (#37955)
Correct type annotation from Dict(str, Tensor) to Dict[str, Tensor]
2025-05-05 19:00:49 +01:00
d80f53fa50 [generate] Fix vocab_size access for multimodal models (#37937)
Implements the last migrations for generation from `config.vocab_size` to `config.get_text_config().vocab_size`

In doing so, we enable multimodal models to fully leverage all existing generation features.
2025-05-05 15:56:56 +01:00
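For context, a minimal sketch of the access pattern this migration standardizes: `get_text_config()` returns the text sub-config on composite (multimodal) configs and the config itself otherwise; the checkpoint name below is only illustrative:

```
from transformers import AutoConfig

config = AutoConfig.from_pretrained("llava-hf/llava-1.5-7b-hf")  # multimodal config

# Old pattern: assumes vocab_size lives on the top-level config, which breaks
# for composite configs where it lives on the text sub-config.
# vocab_size = config.vocab_size

# New pattern: works for plain text configs (get_text_config() returns the
# config itself) and for multimodal configs alike.
vocab_size = config.get_text_config().vocab_size
print(vocab_size)
```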
7819911b0c Use T4 single GPU runner with more CPU RAM (#37961)
larger T4 single GPU

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-05 16:17:45 +02:00
3b067a15dd [core] reuse unused reserved cuda memory when loading models (#37920) 2025-05-05 15:14:05 +01:00
afbc293e2b More fault tolerant notification service (#37924)
* Let notification service succeed even when artifacts and reported jobs on github have a mismatch

* Use default trace msg if no trace msg available

* Add pop_default helper fn

* style
2025-05-05 15:19:48 +02:00
36ca58bf4f [D-FINE] Update names (#37957)
* Update names

* Fix modular

---------

Co-authored-by: qubvel <qubvel@gmail.com>
2025-05-05 13:05:46 +01:00
2932f318a2 [docs] logits docstring (#37929) 2025-05-02 16:38:35 +01:00
fa3c3f9cab Break weight tying when quantizing input embedding (#37905)
Summary:
Currently when we try to quantize input_embedding for some models, the output embedding
(lm_head) will also be quantized the same way, since they are tied, and this may not be what
we want. To break the tie, we added the option to allow people to
1. load unquantized weight
2. tie weights
3. quantize

so that the tie will be broken

Test Plan:
```
from transformers import (
  AutoModelForCausalLM,
  AutoProcessor,
  AutoTokenizer,
  TorchAoConfig,
)
from torchao.quantization.quant_api import (
    IntxWeightOnlyConfig,
    Int8DynamicActivationIntxWeightConfig,
    AOPerModuleConfig
)
from torchao.quantization.granularity import PerGroup, PerAxis
import torch

model_id = "microsoft/Phi-4-mini-instruct"

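# int8 weight-only quantization, one scale per output channel (PerAxis(0)), for the embedding table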
embedding_config = IntxWeightOnlyConfig(
    weight_dtype=torch.int8,
    granularity=PerAxis(0),
)
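# dynamic int8 activations with int4 weights in groups of 32 for the linear layers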
linear_config = Int8DynamicActivationIntxWeightConfig(
    weight_dtype=torch.int4,
    weight_granularity=PerGroup(32),
    weight_scale_dtype=torch.bfloat16,
)
quant_config = AOPerModuleConfig({"_default": linear_config, "model.embed_tokens": embedding_config})
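# untie_embedding_weights=True breaks the embed_tokens/lm_head tie before quantizing,
# so the quantized embedding config does not carry over to lm_head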
quantization_config = TorchAoConfig(quant_type=quant_config, include_embedding=True, untie_embedding_weights=True)
quantized_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32, device_map="auto", quantization_config=quantization_config)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print(quantized_model)
print("embed_tokens.weight:", quantized_model.model.embed_tokens.weight)
print("lm head weight:", quantized_model.lm_head.weight)
from transformers.modeling_utils import find_tied_parameters
print(find_tied_parameters(quantized_model))
```

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-05-02 10:53:23 +02:00
8a0a508f2b Aligning modling code for GPT2 to work with vLLM (fallback) (#36934)
* aligning for vllm

* using input shape rather than attn outputs

* remove demo

* revert Conv1D

* style

* style

* Update src/transformers/models/gpt2/modeling_gpt2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix copies

* Apply suggestions from code review

Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* adding docs about vllm

* chore: style

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-02 09:55:16 +02:00
e94a4807df Add usage example for DINOv2 (#37398)
* Add usage example for DINOv2

* More explicit shape names

* More verbose text

* Moved example to Notes section

* Indentation
2025-05-01 08:54:22 -07:00
d20aa68193 🌐 [i18n-KO] Translated gpu_selection.md to Korean (#36757)
* Add _toctree.yml

* feat: serving.md draft

* Add _toctree.yml

* feat: gpu_selection.md nmt draft

* fix: TOC edit

* Update docs/source/ko/serving.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/gpu_selection.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/serving.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update _toctree.yml

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-01 08:44:12 -07:00
ee25d57ed1 Improve performance of load_state_dict (#37902)
Improve performance of load_state_dict
2025-05-01 16:35:17 +02:00
410aa01901 [chat] clean code and add base help (#37892) 2025-05-01 15:12:18 +01:00
5b573bebb9 Fix typos in strings and comments (#37910) 2025-05-01 14:58:58 +01:00
c80f65265b 🚨 rm already deprecated pad_to_max_length arg (#37617)
* rm already deprecated padding max length

* truncate_strategy AS AN ARG is already deprecated for a few years

* fix

* rm test_padding_to_max_length

* rm pad_to_max_length=True in other tests

* rm from common

* missed fnet
2025-05-01 15:21:55 +02:00
7a3e208892 fixed gemma3 collection path pointing to llama 2 collection. (#37899) 2025-04-30 12:50:54 -07:00
86777b5e2f Support AOPerModuleConfig and include_embedding (#37802)
* Support `AOPerModuleConfig` and include_embedding

Summary:
This PR adds support per module configuration for torchao
Also added per module quantization examples:

1. Quantizing different layers with different quantization configs
2. Skip quantization for certain layers

Test Plan:
python tests/quantization/torchao_integration/test_torchao.py -k test_include_embedding
python tests/quantization/torchao_integration/test_torchao.py -k test_per_module_config_skip


* format

* format

* include embedding: remove input embedding from modules not to convert

* more docs

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_torchao.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_torchao.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-30 20:16:29 +02:00
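A minimal sketch of the per-module mapping described above, assuming (per the PR's skip example) that mapping a module name to `None` leaves it unquantized; the config classes are torchao's and the module path is illustrative:

```
from torchao.quantization.quant_api import AOPerModuleConfig, Int4WeightOnlyConfig
from transformers import AutoModelForCausalLM, TorchAoConfig

quant_config = AOPerModuleConfig({
    "_default": Int4WeightOnlyConfig(group_size=128),  # every module not listed below
    "model.layers.0.self_attn.q_proj": None,           # assumed to skip quantization here
})
quantization_config = TorchAoConfig(quant_type=quant_config)
quantized_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-mini-instruct", device_map="auto", quantization_config=quantization_config
)
```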
c3aeaa8060 Enhance documentation to explain chat-based few-shot prompting (#37828)
* Enhance documentation to explain chat-based few-shot prompting

Updates the documentation on few-shot prompting to illustrate how to structure examples using the chat-based format for instruction-tuned models.

* Update docs/source/en/tasks/prompting.md

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Update docs/source/en/tasks/prompting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/prompting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/prompting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/prompting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix typos

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-30 11:00:10 -07:00
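To illustrate the structure this documents, a hedged sketch: each few-shot example becomes a prior user/assistant turn, so the chat template formats the demonstrations the way an instruction-tuned model expects; the checkpoint below is just an example:

```
from transformers import pipeline

messages = [
    {"role": "user", "content": "Extract the city: 'I flew to Paris last week.'"},
    {"role": "assistant", "content": "Paris"},
    {"role": "user", "content": "Extract the city: 'She moved to Tokyo in 2020.'"},
    {"role": "assistant", "content": "Tokyo"},
    {"role": "user", "content": "Extract the city: 'We are driving to Lisbon tomorrow.'"},
]

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")
result = generator(messages, max_new_tokens=10)
print(result[0]["generated_text"][-1]["content"])  # expected to follow the pattern: "Lisbon"
```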
36e2e33bbe Fix Qwen3 tp plan with FP8 (#37871)
* update for qwen 3

* fix style

* rm print
2025-04-30 18:14:10 +02:00
8e8025b384 [tests] reset logs in torch.compile test (#37894) 2025-04-30 16:04:28 +01:00
1b222903c3 [tests] Test all cache implementations (#37873) 2025-04-30 15:37:00 +01:00
2c1155519f Support FlaxPreTrainedModel to load model checkpoint from local subfolder safetensors (#37732)
Support FlaxPreTrainedModel loading model checkpoints from a subfolder of a local directory in safetensors format

Signed-off-by: Yan Zhao <zhao.y4@northeastern.edu>
2025-04-30 16:13:23 +02:00
5b223bbc8c update comment in image_processing_base.py to reference image_process… (#37864)
update comment in image_processing_base.py to reference image_processing_utils_fast
2025-04-30 14:31:29 +01:00
0dffcb0967 Fix: reassign in qwen3 moe model (#37848)
* Fix: reassign in qwen3 moe model

Fix: reassign in qwen3 moe model

* Remove redundant assignment to self.mlp

* make fix-copies

* Revert unwanted style change

* Revert unwanted style change

---------

Co-authored-by: li.ding <int.li.ding@enflame-tech.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
2025-04-30 13:49:59 +01:00
6c5d374d56 uniformize kwargs for VisionTextDualEncoder (#34563)
* Make kwargs uniform for VisionTextDualEncoder

* Add bc for flipped args
2025-04-30 14:32:59 +02:00
4fc976779e Fix qwen2-vl-docs. (#37879)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-04-30 13:32:21 +01:00
4eb6acc896 make sure lr is not a tensor (#37881)
* make sure lr is not a tensor

* revert change from #37704

* clean up to reduce extra LoC

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-30 14:23:39 +02:00
7be92f9a94 fix error for _register_pytree_node in torch2.1.0 and fix bf16 assertion in xpu and npu (#37839)
* fix error for _register_pytree_node and bf16 assertion

* fix format

* update xpu available assert function
2025-04-30 14:22:53 +02:00
455c3a33b0 update Clean_up_tokenization_spaces typos. (#37865)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-04-30 13:04:49 +01:00
d538293f62 Transformers cli clean command (#37657)
* transformers-cli -> transformers

* Chat command works with positional argument

* update doc references to transformers-cli

* doc headers

* deepspeed

---------

Co-authored-by: Joao Gante <joao@huggingface.co>
2025-04-30 12:15:43 +01:00
63cd4c76f3 Llama Guard updates (#37872)
* Unhardcode use_chunked_attention, fix no_rope_layers

* Go back to exhaustive list of bools

* Conversion and modeling updates

* Fix rope

* Unhardcode rope

* Fix context length

* style

* Minor updates to conversion

* Use StaticCache

* Minor simplification

* DynamicCache 🤦

* Style

* Style
2025-04-30 10:34:43 +02:00
34f26e2c3e enable internvl UTs on XPU (#37779)
* enable internvl UTs on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style per comments

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Yao Matrix <matrix.yao@intel.com>
2025-04-30 10:29:40 +02:00
a57274466f Allow override inputs to export recipe (#37508)
Add option to specify dynamic shapes during export

Co-authored-by: Guang Yang <guangyang@fb.com>
2025-04-30 10:19:27 +02:00
481de7204c Skip is_flaky tests in the CI (#37723)
* No more red flaky tests in the CI!

* Remove the CircleCI logic as well

* Revert most changes including is_flaky behaviour

* make fixup

* Move to a more sensible place

* Mark a flaky test that failed on this PR!

* correct import

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-30 09:52:21 +02:00
5f8d17268c Update modeling_llama4.py (#37841)
* Update modeling_llama4.py

* Update modeling_llama4.py

* do not pass device

---------

Co-authored-by: raushan <raushan@huggingface.co>
2025-04-30 00:36:02 +02:00
50f8caaa48 🌐 [i18n-KO] Translated electra.md to Korean (#36763)
* docs: ko: electra.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits
2025-04-29 14:03:39 -07:00
91f3e9422f Add Intel Gaudi doc (#37855)
* Add Intel Gaudi doc

* Use "TIP" instead of "NOTE"

* Address comments from reviews
2025-04-29 13:28:06 -07:00
c34afa5957 Processor chat template: pass custom kwargs (#37852) 2025-04-29 21:22:10 +02:00
66ad8b2db0 docs: Details for ambiguous channel dimension assignment (#37600)
* docs: Details for ambiguous channel dimension inference

* Update src/transformers/image_utils.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-29 08:12:38 -07:00
096f25ae1f Fix Bitnet tokenizer in pipeline (#37861)
add tokenizer
2025-04-29 15:35:02 +02:00
da7ae467c4 Fix cache get item return type hints (#37847)
F: Fix cache return hints

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-29 14:23:52 +01:00
aa6b79db43 Fix check of unnecessary packages (issue #37626) (#37825)
* Fix check of unnecessary packages (issue #37626)

* Reformat using ruff

* And a condition to avoid the risk of matching a random object in `import_utils`

* Reformat
2025-04-29 14:21:05 +01:00
517367fe9a Revert change that breaks on Torch 2.1 (#37531)
* Revert change that breaks on Torch 2.1

* Add TODO

* Trigger tests

* Trigger tests
2025-04-29 13:27:09 +01:00
755b0fa2fe [tests] reorganize cache tests and clean memory between tests (#37684) 2025-04-29 12:21:14 +01:00
3a1acc36ed [tests] fix flaky pattern in test_generate_continue_from_past_key_values (#37724) 2025-04-29 12:20:42 +01:00
4abeb50f6e Add D-FINE Model into Transformers (#36261)
* copy the last changes from broken PR

* small format

* some fixes and refactoring after review

* format

* add config attr for loss

* some fixes and refactoring

* fix copies

* fix style

* add test for d-fine resnet

* fix decoder layer prop

* fix dummies

* format init

* remove extra print

* refactor modeling, move resnet into separate folder

* fix resnet config

* change resnet to hgnet_v2, add clamp into decoder

* fix init

* fix config doc

* fix init

* fix dummies

* fix config docs

* fix hgnet_v2 config typo

* format modular

* add image classification for hgnet, some refactoring

* format tests

* fix dummies

* fix init

* fix style

* fix init for hgnet v2

* fix index.md, add init range for hgnet

* fix conversion

* add missing attr to encoder

* add loss for d-fine, add additional output for rt-detr decoder

* tests and docs fixes

* fix rt_detr v2 conversion

* some fixes for loss and decoder output

* some fixes for loss

* small fix for converted modeling

* add n model config, some todo comments for modular

* convert script adjustments and fixes, small refact

* remove extra output for rt_detr

* make some outputs optional, fix conversion

* some post merge fixes

* small fix

* last field fix

* fix not split for hgnet_v2

* disable parallelism test for hgnet_v2 image classification

* skip multi gpu for d-fine

* adjust after merge init

* remove extra comment

* fix repo name references

* small fixes for tests

* Fix checkpoint path

* Fix consistency

* Fixing docs

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-04-29 12:17:55 +01:00
4602059aae [modular] Fix the prefix-based renaming if the old and new model share a common name suffix (#37829)
* first try

* Fix and set examples

* style

* fix

* Update modular_test_detr.py

* Update image_processing_new_imgproc_model.py

* Update modular_model_converter.py
2025-04-29 10:43:23 +02:00
a847d4aa6b Fast image processor for VitMatte added and bug in slow version fixed (#37616)
* added fast image processor for VitMatte including updated and new tests; fixed a bug in the slow image processor that processed images incorrectly for input format ChannelDimension.FIRST, in which case the trimaps were not added in the correct dimension; this bug was also reflected in the tests through incorrectly shaped trimaps being passed

* final edits for fast vitmatte image processor and tests

* final edits for fast vitmatte image processor and tests

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-28 14:51:50 -04:00
65e940208c Samhq model addition (#35147)
* added the configuration for sam_hq

* added the modelling for sam_hq

* added the sam hq mask decoder with hq features

* added the code for the samhq

* added the code for the samhq

* added the code for the samhq

* Delete src/transformers/models/sam_hq/modelling_sam_hq.py

* added the code for the samhq

* added the code for the samhq

* added the changes for the modelling

* added the code for sam hq for image processing

* added code for the sam hq model

* added the required changes

* added the changes

* added the key mappings for the sam hq

* adding the working code of samhq

* added the required files

* adding the pt object

* added the push to hub account

* added the args for the sam mask decoder

* added the args for the sam hq vision config

* added some more documentation

* removed the unnecessary spaces

* all required changes

* removed the image processor

* added the required file

* added the changes for the checkcopies

* added the code for modular file

* added the changes for the __init file

* added the code for the interm embeds

* added the code for sam hq

* added the changes for modular file

* added the test file

* added the changes required

* added the changes required

* added the code for the

* added the cl errors

* added the changes

* added the required changes

* added the some code

* added the code for the removing image processor

* added the test dimensions

* added the code for the removing extra used variables

* added the code for modular file hf_mlp for a better name

* removed abbreviation in core functionality

* removed abbreviation in core functionality

* .contiguous() method is often used to ensure that the tensor is stored in a contiguous block of memory

* added the code which is after make fixup

* added some tests for the intermediate embeddings

* added the code for the torch support in sam hq

* added the code for the updated modular file

* added the changes for documentations as mentioned

* removed the heading

* add the changes for the code

* first mentioned issue resolved

* added the changes code to processor

* added the easy loading to init file

* added the changes to code

* added the code to changes

* added the code to work

* added the code for sam hq

* added the code for sam hq

* added the code for the point pad value

* added the small test for the image embeddings and intermediate embedding

* added the code

* added the code

* added the code for the tests

* added the code

* added the code for the processor file

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code for tests and some checks

* added some code

* added the code

* added the code

* added some code

* added some code

* added the changes for required

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added the code

* added some changes

* added some changes

* removed spaces and quality checks

* added some code

* added some code

* added some code

* added code quality checks

* added the checks for quality checks

* added some code which fixes test_inference_mask_generation_no_point

* added code for the test_inference_mask_generation_one_point_one_bb

* added code for the test_inference_mask_generation_one_point_one_bb_zero

* added code for the test_inference_mask_generation_one_box

* added some code in modelling for testing

* added some code which sorts masks with high score

* added some code

* added some code

* added some code for the move KEYS_TO_MODIFY_MAPPING

* added some code for the  unsqueeze removal

* added some code for the  unsqueeze removal

* added some code

* added some code

* add some code

* added some code

* added some code

* changed some testing values

* added changes to code in sam hq for readability purposes

* added pre commit checks

* added the samvisionmodel fix for compatibility

* added the changes made on sam by cyyever

* fixed the tests for samhq

* added some code

* added some code related to init file issue during merge conflicts

* removed the merge conflicts

* added changes mentioned by Arthur and molbap

* added changes mentioned by Arthur and molbap

* solving quality checks

* added the changes for input clearly

* added the changes

* added changes in mask generation file regarding model inputs and sam hq kwargs in processor file

* added changes in processor file

* added the setUp -> setUpClass conversion

* added the code mentioned for processor

* added changes for the code

* added some code

* added some code

* added some code

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2025-04-28 19:07:09 +02:00
9c5b1319d0 [config] revert #37603 (#37821)
revert
2025-04-28 16:28:30 +02:00
9e730689c3 change XLA deprecated api (#37741)
* deprecated api

* fix
2025-04-28 16:27:41 +02:00
2933894985 Fix error of HPU TP (#37782)
* Fix error of HPU TP

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add the init distrubuted for hpu

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Fix error of make style

Signed-off-by: yuanwu <yuan.wu@intel.com>

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-04-28 15:47:16 +02:00
da4ff2a5f5 Add Optional to remaining types (#37808)
More Optional typing

Signed-off-by: cyy <cyyever@outlook.com>
2025-04-28 14:20:45 +01:00
1a9188a54e FIX: Faulty PEFT tests (#37757)
Two PEFT tests are actually failing:

tests/peft_integration/test_peft_integration.py::PeftIntegrationTester::test_delete_adapter
tests/peft_integration/test_peft_integration.py::PeftIntegrationTester::test_peft_pipeline_no_warning

This must have been going on for some time but was apparently never
noticed. The cause is that the tests themselves are faulty; the PEFT
integration is correct in these cases.

test_delete_adapter

The first faulty test was introduced by #34650. AFAICT, it should never
have passed in the first place; the PEFT integration logic was not
changed in the meantime. At this point, the logs for the PR CI are gone,
so I'm not sure if the test passed back then or not.

test_peft_pipeline_no_warning

This test was introduced in #36783 and should also never have passed, as
the self.assertNoLogs context manager only returns None, thus the assert
should never have worked (mea culpa for suggesting this code snippet).
Here too, the CI logs are deleted by now, so I can't check if the test
already failed back then.
2025-04-28 15:10:46 +02:00
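For reference, a hedged sketch of the `assertNoLogs` failure mode described above (`assertNoLogs` is unittest's context manager, Python 3.10+; the test and helper names are invented): creating the context without entering it verifies nothing:

```
import logging
import unittest

logger = logging.getLogger("transformers")

def code_under_test():
    logger.warning("a warning the test ought to catch")

class ExampleTest(unittest.TestCase):
    def test_vacuous(self):
        # Buggy pattern: the context manager is created but never entered,
        # so nothing about the logging output is verified; this test passes.
        self.assertNoLogs("transformers", logging.WARNING)
        code_under_test()

    def test_correct(self):
        # Correct pattern: this test fails, because a warning is emitted
        # inside the block.
        with self.assertNoLogs("transformers", logging.WARNING):
            code_under_test()
```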
b262680af4 Add Bitnet model (#37742)
* Adding BitNet b1.58 Model

* Add testing code for BitNet

* Fix format issues

* Fix docstring format issues

* Fix docstring

* Fix docstring

* Fix: weight back to uint8

* Fix

* Fix format issues

* Remove copy comments

* Add model link to the docstring

* Fix: set tie_word_embeddings default to false

* Update

* Generate modeling file

* Change config name for automatically generating modeling file.

* Generate modeling file

* Fix class name

* Change testing branch

* Remove unused param

* Fix config docstring

* Add docstring for BitNetQuantConfig.

* Fix docstring

* Update docs/source/en/model_doc/bitnet.md

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update docs/source/en/model_doc/bitnet.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update bitnet config

* Update explanation between online and offline mode

* Remove space

* revert changes

* more revert

* spaces

* update

* fix-copies

* doc fix

* fix minor nits

* empty

* small nit

* empty

---------

Co-authored-by: Shuming Ma <shumingma@pku.edu.cn>
Co-authored-by: shumingma <shmingm@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-28 15:08:46 +02:00
82862ce443 [RT-DETR] Improve docs (#37814)
Fix docs
2025-04-28 13:19:24 +02:00
97e57b2545 Fix: Correct tensor shape comment in Mamba modeling (#37801)
* Fix: Correct tensor shape comment in Mamba modeling

* Update src/transformers/models/mamba/modeling_mamba.py

* Update src/transformers/models/mamba/modeling_mamba.py

---------

Co-authored-by: ShadyPi <11342288+shadypi@user.noreply.gitee.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-04-28 11:56:42 +01:00
33493542aa [doc] fix the code examples in qwen doc (#37803) 2025-04-28 11:56:32 +01:00
d5fa7d2d19 Fix typos in strings and comments (#37799) 2025-04-28 11:39:11 +01:00
f466603963 Define warmup allocator for torchao quantization (#37764)
* torchao allocator

* add comment

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-28 10:45:55 +02:00
a41b6d9b5c Fix the fsdp config cannot work issue. (#37549)
* Fix the fsdp config cannot work issue.

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Check the fsdp_config type

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add the accelerate_fsdp_config test

Signed-off-by: yuanwu <yuan.wu@intel.com>

* fix error of make style

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add key check

Signed-off-by: yuanwu <yuan.wu@intel.com>

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-28 10:44:51 +02:00
816b37010c Gemma3 is Torch Exportable (#37728)
* Gemma3 is Torch Exportable

* Expand the support to other mdoels using HybridCache

---------

Co-authored-by: Guang Yang <guangyang@fb.com>
2025-04-28 09:36:46 +02:00
397a5ede33 Fix error message in hub.py (#37796)
Fix error message
2025-04-25 14:03:06 -07:00
6ce675ee81 fix performance issue in convert_ids_to_tokens (#37773) 2025-04-25 22:00:50 +02:00
57c620bf8a chore: update SigLIP2 model card (#37624)
* update siglip2 model card

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* address comments

* separate naflex and fixres variant

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-25 12:46:17 -07:00
eb4afdd1fb [i18n-KO] Translated keypoint_detection.md to Korean (#36649)
* fix: manual edits

* fix: manual edits

* fix: manual edits

* Update docs/source/ko/tasks/keypoint_detection.md

modify anchor to lowercase

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/tasks/keypoint_detection.md

connect the letters

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/tasks/keypoint_detection.md

modify to more natural wording

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/tasks/keypoint_detection.md

modify the file-extension wording

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/tasks/keypoint_detection.md

modify to more natural wording

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/tasks/keypoint_detection.md

modify to more natural wording

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/tasks/keypoint_detection.md

modify to a more natural expression

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-25 12:24:12 -07:00
555693fbfa fix mpt test of different outputs from cuda (#37691)
* fix mpt test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix mpt tests with Expectations

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix output

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-25 18:04:56 +02:00
0cfbf9c95b Force torch>=2.6 with torch.load to avoid vulnerability issue (#37785)
* fix all main files

* fix test files

* oups forgot modular

* add link

* update message
2025-04-25 16:57:09 +02:00
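As background, a hedged sketch of why the version floor matters: `torch.load` is pickle-based, and only from torch 2.6 does it default to the restricted `weights_only=True` mode; the file path below is a placeholder:

```
import torch

# On torch < 2.6 the default was weights_only=False: full pickle, which can
# execute arbitrary code embedded in a malicious checkpoint.
# state = torch.load("checkpoint.bin")

# Restricted unpickler: only tensors and a small set of safe types are allowed.
state = torch.load("checkpoint.bin", weights_only=True)
```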
eefc86aa31 Fix tensor parallel with non-floating dtypes (#37790)
fix
2025-04-25 15:48:16 +02:00
214062201e Fix typos in strings and comments (#37784)
* Fix typos in strings and comments

* Fix
2025-04-25 13:47:25 +01:00
ba3bd37253 Align gpt2 mask preparation to #37612 (#37787)
Update modeling_gpt2.py
2025-04-25 12:50:30 +02:00
50d231a806 unpin pytest<8 (#37768)
* pytest 8

* pytest 8

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-25 12:34:33 +02:00
79d4bc761d [causal mask] fix preparation with multi-gpu (#37612)
* fix multi-gpu

* forgot non-copied models

* fixup
2025-04-25 09:34:18 +02:00
7bb619d710 🌐 [i18n-KO] Translated roberta.md to Korean (#37069)
* docs: ko: roberta.md

* fix: manual edits

* Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
2025-04-24 10:00:24 -07:00
cfe666919e Update model card for Gemma (#37674)
* Update Gemma model card

* Updated after review

* Update following review
2025-04-24 09:58:46 -07:00
b2d70e9c49 Fix auto-round hfoption (#37759)
fix
2025-04-24 18:19:38 +02:00
acdbe627e3 Guard DeepSpeed imports (#37755)
* Guard DeepSpeed imports

* Fix import

* Import deepspeed consistently

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 18:16:34 +02:00
af6d2756d9 [deps] pin max torch version (#37760)
pin max pt version :(
2025-04-24 16:18:25 +01:00
0302aa1c6e Fix typos in comments (#37694)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-04-24 15:59:56 +01:00
af000ceb92 Fix load of rng state for resuming training from checkpoint (#37162)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 16:55:34 +02:00
0af0a5f969 Fix tied weight loading with TP and loading sub state_dicts (#37758)
Update modeling_utils.py
2025-04-24 16:47:40 +02:00
3af24f7e27 Refine parameter type annotations (#37666) 2025-04-24 15:37:13 +01:00
22e3da92b7 Fix wrong input shapes in doc-string of models (#37729)
* Fix wrong position_ids shape in doc

Supported by ClvpDecoder.forward, line 1212--1215:

src/transformers/models/clvp/modeling_clvp.py:
  1212	        if inputs_embeds is None:
  1213	            inputs_embeds = self.input_embeds_layer(input_ids)
  1214	        position_embeds = self.position_embeds_layer(position_ids)
  1215	        inputs_embeds = inputs_embeds + position_embeds

* Fix possibly wrong input_ids shape in doc

Since 'input_ids_length' was mentioned immediately after the shape `(batch_size, sequence_length)`, it doesn't make sense to me for `input_ids` to have such shape---IMO it ought to have shape `(batch_size, input_ids_length)` instead.

* Fix possibly wrong inputs_embeds shape in doc

Supported by CTRLModel.forward, line 448--449:

src/transformers/models/ctrl/modeling_ctrl.py:
   448	        if inputs_embeds is None:
   449	            inputs_embeds = self.w(input_ids)

This commit is introduced due to commit 6f36b56497828642b65f54ea26aa4064186de57a.

* Fix possibly wrong token_type_ids shape in doc

Supported by CTRLModel.forward, line 441--460:

src/transformers/models/ctrl/modeling_ctrl.py:
   441	        if token_type_ids is not None:
   442	            token_type_ids = token_type_ids.view(-1, input_shape[-1])
   443	            token_type_embeds = self.w(token_type_ids)
   444	            token_type_embeds *= np.sqrt(self.d_model_size)
   445	        else:
   446	            token_type_embeds = 0
   447
   448	        if inputs_embeds is None:
   449	            inputs_embeds = self.w(input_ids)
   450	        # inputs_embeds = embedded.unsqueeze(0) if len(input_ids.shape)<2 else embedded
   451	        seq_len = input_shape[-1]
   452	        mask = torch.triu(torch.ones(seq_len + past_length, seq_len + past_length), 1).to(device)
   453
   454	        inputs_embeds *= np.sqrt(self.d_model_size)
   455
   456	        # `self.pos_encoding` won't be sent to the correct device along the model, so we do it manually.
   457	        self.pos_encoding = self.pos_encoding.to(device)
   458	        pos_embeds = self.pos_encoding[position_ids, :]
   459
   460	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

This commit is introduced due to commit 6f36b56497828642b65f54ea26aa4064186de57a.

* Fix possibly wrong position_ids shape in doc

Supported by CTRLModel.forward, line 448--460:

src/transformers/models/ctrl/modeling_ctrl.py:
   448	        if inputs_embeds is None:
   449	            inputs_embeds = self.w(input_ids)
   450	        # inputs_embeds = embedded.unsqueeze(0) if len(input_ids.shape)<2 else embedded
   451	        seq_len = input_shape[-1]
   452	        mask = torch.triu(torch.ones(seq_len + past_length, seq_len + past_length), 1).to(device)
   453
   454	        inputs_embeds *= np.sqrt(self.d_model_size)
   455
   456	        # `self.pos_encoding` won't be sent to the correct device along the model, so we do it manually.
   457	        self.pos_encoding = self.pos_encoding.to(device)
   458	        pos_embeds = self.pos_encoding[position_ids, :]
   459
   460	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

This commit is introduced due to commit 6f36b56497828642b65f54ea26aa4064186de57a.

* Fix wrong token_type_ids shape in doc

Supported by TFCTRLMainLayer.call, line 376--394:

src/transformers/models/ctrl/modeling_tf_ctrl.py:
   376	        if token_type_ids is not None:
   377	            token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
   378	            token_type_embeds = self.w(token_type_ids)
   379	            token_type_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, dtype=token_type_embeds.dtype))
   380	        else:
   381	            token_type_embeds = tf.constant(0.0)
   382	        position_ids = tf.reshape(position_ids, [-1, shape_list(position_ids)[-1]])
   383
   384	        if inputs_embeds is None:
   385	            check_embeddings_within_bounds(input_ids, self.w.input_dim)
   386	            inputs_embeds = self.w(input_ids)
   387	        seq_len = input_shape[-1]
   388	        mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
   389
   390	        inputs_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype))
   391
   392	        pos_embeds = tf.gather(self.pos_encoding, position_ids)
   393	        pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype)
   394	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

* Fix wrong position_ids shape in doc

Supported by TFCTRLMainLayer.call, line 384--394:

src/transformers/models/ctrl/modeling_tf_ctrl.py:
   384	        if inputs_embeds is None:
   385	            check_embeddings_within_bounds(input_ids, self.w.input_dim)
   386	            inputs_embeds = self.w(input_ids)
   387	        seq_len = input_shape[-1]
   388	        mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
   389
   390	        inputs_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype))
   391
   392	        pos_embeds = tf.gather(self.pos_encoding, position_ids)
   393	        pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype)
   394	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

* Fix wrong inputs_embeds shape in doc

Supported by TFCTRLMainLayer.call, line 384--394:

src/transformers/models/ctrl/modeling_tf_ctrl.py:
   384	        if inputs_embeds is None:
   385	            check_embeddings_within_bounds(input_ids, self.w.input_dim)
   386	            inputs_embeds = self.w(input_ids)
   387	        seq_len = input_shape[-1]
   388	        mask = 1 - tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)
   389
   390	        inputs_embeds *= tf.math.sqrt(tf.cast(self.d_model_size, inputs_embeds.dtype))
   391
   392	        pos_embeds = tf.gather(self.pos_encoding, position_ids)
   393	        pos_embeds = tf.cast(pos_embeds, dtype=token_type_embeds.dtype)
   394	        hidden_states = inputs_embeds + pos_embeds + token_type_embeds

* Fix wrong inputs_embeds shape in doc

Supported by ClvpDecoder.forward, line 1212--1213:

src/transformers/models/clvp/modeling_clvp.py:
  1212	        if inputs_embeds is None:
  1213	            inputs_embeds = self.input_embeds_layer(input_ids)

* Fix wrong position_ids shape in doc

Supported by FlaxGemmaPreTrainedModel.__call__, line 502--508:

src/transformers/models/gemma/modeling_flax_gemma.py:
   502	        batch_size, sequence_length = input_ids.shape
   503
   504	        if position_ids is None:
   505	            if past_key_values is not None:
   506	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   507
   508	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))

* Fix wrong position_ids shape in doc

Supported by FlaxGPT2PreTrainedModel.__call__, line 482--488:

src/transformers/models/gpt2/modeling_flax_gpt2.py:
   482	        batch_size, sequence_length = input_ids.shape
   483
   484	        if position_ids is None:
   485	            if past_key_values is not None:
   486	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   487
   488	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))

* Fix wrong position_ids shape in doc

Supported by GPT2Model.forward, line 918--921:

src/transformers/models/gpt2/modeling_gpt2.py:
   918	        if inputs_embeds is None:
   919	            inputs_embeds = self.wte(input_ids)
   920	        position_embeds = self.wpe(position_ids)
   921	        hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)

* Fix wrong inputs_embeds shape in doc

Supported by GPT2Model.forward, line 918--919:

src/transformers/models/gpt2/modeling_gpt2.py:
   918	        if inputs_embeds is None:
   919	            inputs_embeds = self.wte(input_ids)

* Fix wrong labels shape in doc

Supported by GPT2LMHeadModel.forward, line 1156--1157:

src/transformers/models/gpt2/modeling_gpt2.py:
  1156	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
  1157	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

* Fix wrong labels shape in doc

Supported by GPT2DoubleHeadsModel.forward, line 1314--1315:

src/transformers/models/gpt2/modeling_gpt2.py:
  1314	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
  1315	            `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size - 1]`. All labels set to

* Fix wrong token_type_ids shape in doc

Supported by TFGPT2MainLayer.call, line 486--500:

src/transformers/models/gpt2/modeling_tf_gpt2.py:
   486	        if inputs_embeds is None:
   487	            check_embeddings_within_bounds(input_ids, self.config.vocab_size)
   488	            inputs_embeds = self.wte(input_ids)
   489
   490	        position_embeds = self.wpe(position_ids)
   491
   492	        if token_type_ids is not None:
   493	            token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
   494	            token_type_embeds = self.wte(token_type_ids)
   495	        else:
   496	            token_type_embeds = tf.constant(0.0)
   497
   498	        position_embeds = tf.cast(position_embeds, dtype=inputs_embeds.dtype)
   499	        token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype)
   500	        hidden_states = inputs_embeds + position_embeds + token_type_embeds

* Fix wrong position_ids shape in doc

Supported by TFGPT2MainLayer.call, line 486--500:

src/transformers/models/gpt2/modeling_tf_gpt2.py:
   486	        if inputs_embeds is None:
   487	            check_embeddings_within_bounds(input_ids, self.config.vocab_size)
   488	            inputs_embeds = self.wte(input_ids)
   489
   490	        position_embeds = self.wpe(position_ids)
   491
   492	        if token_type_ids is not None:
   493	            token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
   494	            token_type_embeds = self.wte(token_type_ids)
   495	        else:
   496	            token_type_embeds = tf.constant(0.0)
   497
   498	        position_embeds = tf.cast(position_embeds, dtype=inputs_embeds.dtype)
   499	        token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype)
   500	        hidden_states = inputs_embeds + position_embeds + token_type_embeds

* Fix wrong inputs_embeds shape in doc

Supported by TFGPT2MainLayer.call, line 486--488:

src/transformers/models/gpt2/modeling_tf_gpt2.py:
   486	        if inputs_embeds is None:
   487	            check_embeddings_within_bounds(input_ids, self.config.vocab_size)
   488	            inputs_embeds = self.wte(input_ids)

* Fix wrong position_ids shape in doc

Supported by GPTBigCodeModel.forward, line 962--965:

src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:
   962	        if inputs_embeds is None:
   963	            inputs_embeds = self.wte(input_ids)
   964	        position_embeds = self.wpe(position_ids)
   965	        hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)

* Fix wrong inputs_embeds shape in doc

Supported by GPTBigCodeModel.forward, line 962--963:

src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:
   962	        if inputs_embeds is None:
   963	            inputs_embeds = self.wte(input_ids)

* Fix wrong labels shape in doc

Supported by GPTBigCodeForCausalLM.forward, line 1158--1159:

src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:
  1158	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
  1159	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

* Fix wrong position_ids shape in doc

Supported by FlaxGPTNeoModule.__call__, line 549--552:

src/transformers/models/gpt_neo/modeling_flax_gpt_neo.py:
   549	        input_embeds = self.wte(input_ids.astype("i4"))
   550	        position_embeds = self.wpe(position_ids.astype("i4"))
   551
   552	        hidden_states = input_embeds + position_embeds

* Fix wrong position_ids shape in doc

Supported by GPTNeoModel.forward, lines 685--720:

src/transformers/models/gpt_neo/modeling_gpt_neo.py:
   685	        if inputs_embeds is None:
   686	            inputs_embeds = self.wte(input_ids)
   687
   688	        # kept for BC (non `Cache` `past_key_values` inputs)
   689	        return_legacy_cache = False
   690	        if use_cache and not isinstance(past_key_values, Cache):
   691	            return_legacy_cache = True
   692	            if past_key_values is None:
   693	                past_key_values = DynamicCache()
   694	            else:
   695	                past_key_values = DynamicCache.from_legacy_cache(past_key_values)
   696	                logger.warning_once(
   697	                    "We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and "
   698	                    "will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class "
   699	                    "(https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)"
   700	                )
   701
   702	        seq_length = inputs_embeds.shape[1]
   703	        if cache_position is None:
   704	            past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
   705	            cache_position = torch.arange(past_seen_tokens, past_seen_tokens + seq_length, device=inputs_embeds.device)
   706
   707	        if position_ids is None:
   708	            position_ids = cache_position.unsqueeze(0)
   709
   710	        causal_mask = self._update_causal_mask(
   711	            attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
   712	        )
   713
   714	        # Prepare head mask if needed
   715	        # 1.0 in head_mask indicate we keep the head
   716	        # attention_probs has shape bsz x num_heads x N x N
   717	        # head_mask has shape n_layer x batch x num_heads x N x N
   718	        head_mask = self.get_head_mask(head_mask, self.config.num_layers)
   719	        position_embeds = self.wpe(position_ids)
   720	        hidden_states = inputs_embeds + position_embeds

* Fix wrong inputs_embeds shape in doc

Supported by GPTNeoModel.forward, lines 685--686:

src/transformers/models/gpt_neo/modeling_gpt_neo.py:
   685	        if inputs_embeds is None:
   686	            inputs_embeds = self.wte(input_ids)

* Fix wrong labels shape in doc

Supported by GPTNeoForCausalLM.forward, lines 968--969:

src/transformers/models/gpt_neo/modeling_gpt_neo.py:
   968	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   969	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

* Fix wrong position_ids shape in doc

Supported by FlaxGPTJPreTrainedModel.__call__, lines 455--461:

src/transformers/models/gptj/modeling_flax_gptj.py:
   455	        batch_size, sequence_length = input_ids.shape
   456
   457	        if position_ids is None:
   458	            if past_key_values is not None:
   459	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   460
   461	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))

* Fix wrong token_type_ids shape in doc

Supported by TFGPTJMainLayer.call, lines 482--493:

src/transformers/models/gptj/modeling_tf_gptj.py:
   482	        if inputs_embeds is None:
   483	            check_embeddings_within_bounds(input_ids, self.wte.vocab_size)
   484	            inputs_embeds = self.wte(input_ids, mode="embedding")
   485
   486	        if token_type_ids is not None:
   487	            token_type_ids = tf.reshape(token_type_ids, [-1, shape_list(token_type_ids)[-1]])
   488	            token_type_embeds = self.wte(token_type_ids, mode="embedding")
   489	        else:
   490	            token_type_embeds = tf.constant(0.0)
   491
   492	        token_type_embeds = tf.cast(token_type_embeds, dtype=inputs_embeds.dtype)
   493	        hidden_states = inputs_embeds + token_type_embeds

* Fix wrong position_ids shape in doc

Supported by TFGPTJMainLayer.call, lines 434--449:

src/transformers/models/gptj/modeling_tf_gptj.py:
   434	        elif input_ids is not None:
   435	            input_shape = shape_list(input_ids)
   436	            input_ids = tf.reshape(input_ids, [-1, input_shape[-1]])
   437	        elif inputs_embeds is not None:
   438	            input_shape = shape_list(inputs_embeds)[:-1]
   439	        else:
   440	            raise ValueError("You have to specify either input_ids or inputs_embeds")
   441
   442	        if past_key_values is None:
   443	            past_length = 0
   444	            past_key_values = [None] * len(self.h)
   445	        else:
   446	            past_length = shape_list(past_key_values[0][0])[-2]
   447
   448	        if position_ids is None:
   449	            position_ids = tf.expand_dims(tf.range(past_length, input_shape[-1] + past_length), axis=0)

* Fix wrong inputs_embeds shape in doc

Supported by TFGPTJMainLayer.call, lines 482--484:

src/transformers/models/gptj/modeling_tf_gptj.py:
   482	        if inputs_embeds is None:
   483	            check_embeddings_within_bounds(input_ids, self.wte.vocab_size)
   484	            inputs_embeds = self.wte(input_ids, mode="embedding")

* Fix wrong labels shape in doc

Supported by TFGPTJForCausalLM.call, lines 812--813:

src/transformers/models/gptj/modeling_tf_gptj.py:
   812	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   813	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

* Fix possibly wrong input_ids shape in doc

Since 'input_ids_length' was mentioned immediately after the shape `(batch_size, sequence_length)`, it doesn't make sense for `input_ids` to have that shape; it ought to have shape `(batch_size, input_ids_length)` instead.
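
A small sketch of the distinction (variable names are generic, not from the patch): during cached generation the model is called with only the new tokens, so `input_ids_length` can be much smaller than the full `sequence_length`, which also counts the cached past tokens.

    import torch

    past_length = 7                   # tokens already held in the KV cache
    input_ids = torch.tensor([[42]])  # only the newly fed token
    batch_size, input_ids_length = input_ids.shape
    sequence_length = past_length + input_ids_length
    print(batch_size, input_ids_length, sequence_length)  # 1 1 8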

* Fix possibly wrong token_type_ids shape in doc

Supported by ImageGPTModel.forward, lines 773--780:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   773	        if inputs_embeds is None:
   774	            inputs_embeds = self.wte(input_ids)
   775	        position_embeds = self.wpe(position_ids)
   776	        hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)
   777
   778	        if token_type_ids is not None:
   779	            token_type_embeds = self.wte(token_type_ids)
   780	            hidden_states = hidden_states + token_type_embeds

This commit was prompted by commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix possibly wrong position_ids shape in doc

Supported by ImageGPTModel.forward, lines 773--776:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   773	        if inputs_embeds is None:
   774	            inputs_embeds = self.wte(input_ids)
   775	        position_embeds = self.wpe(position_ids)
   776	        hidden_states = inputs_embeds + position_embeds.to(inputs_embeds.device)

This commit was prompted by commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix possibly wrong inputs_embeds shape in doc

Supported by ImageGPTModel.forward, lines 773--774:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   773	        if inputs_embeds is None:
   774	            inputs_embeds = self.wte(input_ids)

This commit was prompted by commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix possibly wrong labels shape in doc

Supported by ImageGPTForCausalImageModeling.forward, lines 923--924:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   923	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   924	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

This commit was prompted by commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix possibly wrong labels shape in doc

Supported by ImageGPTModel.forward, lines 665--666:

src/transformers/models/imagegpt/modeling_imagegpt.py:
   665	            Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
   666	            `labels = input_ids` Indices are selected in `[-100, 0, ..., config.vocab_size]` All labels set to `-100`

This commit was prompted by commit 8e594a4143cca79f165b99e4ed4c9f3a90047bf3.

* Fix wrong position_ids shape in doc

Supported by FlaxLlamaPreTrainedModel.__call__, lines 484--490:

src/transformers/models/llama/modeling_flax_llama.py:
   484	        batch_size, sequence_length = input_ids.shape
   485
   486	        if position_ids is None:
   487	            if past_key_values is not None:
   488	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   489
   490	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))

* Fix wrong position_ids shape in doc

Supported by FlaxMistralPreTrainedModel.__call__, lines 478--484:

src/transformers/models/mistral/modeling_flax_mistral.py:
   478	        batch_size, sequence_length = input_ids.shape
   479
   480	        if position_ids is None:
   481	            if past_key_values is not None:
   482	                raise ValueError("Make sure to provide `position_ids` when passing `past_key_values`.")
   483
   484	            position_ids = jnp.broadcast_to(jnp.arange(sequence_length)[None, :], (batch_size, sequence_length))
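
As a quick standalone check (not part of the patch), the default `position_ids` built this way indeed has the documented `(batch_size, sequence_length)` shape:

    import jax.numpy as jnp

    batch_size, sequence_length = 2, 5
    position_ids = jnp.broadcast_to(
        jnp.arange(sequence_length)[None, :], (batch_size, sequence_length)
    )
    print(position_ids.shape)  # (2, 5)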
2025-04-24 15:36:03 +01:00
4d64c38593 [generate] fix default autocompile case on gpu (#37756) 2025-04-24 15:08:38 +01:00
43bb4c0456 Fix qwen2_5 get_rope_index tensor device locations (#37597)
* Fix qwen2_5 get_rope_index tensor device locations

* simpler fix

* edit right file for modular model

* add a test

* try normalizing type to fix non-video

* fix some imports

* add a video forward test with dummy input
2025-04-24 16:04:38 +02:00
dd2649fa98 updated hidden_features for FlaxDinov2SwiGLUFFN in Dinov2 (#37747)
Flax Dinov2: updated hidden_features in FlaxDinov2SwiGLUFFN

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-24 14:30:31 +01:00
8bdd4f2acd [generate] skip compilation on cpu offload (#37709)
* skip compilation on cpu offload

* add test

* better logic

* docstring

* boolean logic

* add disk offload check

* warn users if compilation options are set but compilation doesn't happen

* fix test

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 14:08:17 +01:00
7c62e69326 GPT2Model StaticCache support (#35761)
* initial GPT2 changes

* causal_mask support

* return_legacy_cache

* cleanup

* fix1

* outputs shape fixes

* gpt2 return fix

* pkv, attn fixes

* fix dual_head

* is_causal arg fix

* decision transformer updated

* style fix

* batch_size from inputs_embeds

* DecisionTransformerModel fixes

* cross-attn support + cache warning

* x-attn @decision

* EDCache proper init

* simplified logic in `if use_cache:` for GPT2Model

* @deprecate_kwarg for DecisionTr attn fwd

* @deprecate_kwarg in gpt2

* deprecation version updated to 4.51

* kwargs in gradient_checkpointing_fn

* rename next_cache to past_key_values

* attention_mask prep

* +cache_position in GPT2DoubleHeadsModel

* undo kwargs in gradient checkpointing

* moved up `if self.gradient_checkpointing`

* consistency in decision_transformer

* pastkv, cache_pos in grad_checkpt args

* rm _reorder_cache

* output_attentions streamlined

* decision_transformer consistency

* return_legacy_cache improved

* ClvpForCausalLM used for legacy cache test now

* is_causal fixed

* attn_output cleanup

* consistency @ decision_transformer

* Updated deprecation notice version to 4.52

* upd deprecation

* consistent legacy cache code in decision transformers

* next_cache -> past_kv in decision_tr

* cache support flags in decision_transf

* rm legacy cache warning

* consistency in cache init for decision transf

* no Static Cache for Decision Transformer

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-04-24 14:46:35 +02:00
9f927c8250 [cache] fix HybridCache init when device is passed (#37718)
fix device init
2025-04-24 13:36:52 +01:00
4fee320926 Expand quantized data type support for tensor parallelism (#37719)
Update tensor_parallel.py

Co-authored-by: Xiao YU <Xiao.YU@xilinx.com>
2025-04-24 14:34:32 +02:00
0f7940bb3f Update MllamaForConditionalGenerationIntegrationTest (#37750)
* fix 1

* fix 2

* fix 3

* fix 4

* fix 5

* fix 6

* trigger CI

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-24 14:29:46 +02:00
7e6f36cd38 Skip all AriaForConditionalGenerationIntegrationTest on T4 (#37746)
* skip

* ruff

* trigger CI

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-24 14:11:56 +02:00
0327d0f7f2 [performance_optim] define flash attention mask on NPU device directly (#37698)
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-24 14:06:47 +02:00
14e28bd721 Correctly raise errors when downloading tokenizer files (#37740)
* first try

* Update tokenization_utils_base.py

* Update tokenization_utils_base.py

* standardize
2025-04-24 12:53:07 +02:00
0ec0495967 Fix embeds_to_talker device in Qwen2.5-Omni (#37739)
Fix `embeds_to_talker` device

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-04-24 12:49:57 +02:00
72e4844059 fix: learning_rate logged as tensor causing save issue with deepspeed (#37704)
* fix: learning_rate logged as tensor causing save issue with deepspeed

* chore: lint

---------

Co-authored-by: NanoCode012 <chanvichet@Chanvichets-MacBook-Pro.local>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 12:20:47 +02:00
1cfcbfcab8 [VLMs] fix flash-attention tests (#37603)
* fix one test

* fa2 ln test

* remove keys from config recursively

* fix

* fixup
2025-04-24 11:48:11 +02:00
02baa61fab Make sure torch_is_available before using torch.distributed (#37693)
fix
2025-04-24 11:31:35 +02:00
864e9636ff [tests] fix test_nemotron_8b_generation_sdpa (#37665)
add max_new_tokens
2025-04-24 11:28:35 +02:00
9b3bf4a206 Fix torchao doc examples (#37697)
fix

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-24 11:10:27 +02:00
3ed56bea0f Fix inference bugs in Qwen2.5 Omni (#37701)
* Init `SinusoidsPositionEmbedding` with float to avoid precision problem

* fix hidden_state for talker

* Update modular_qwen2_5_omni.py

* Move hidden processing out from thinker

* fixup

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-04-24 10:51:44 +02:00
b7f7aa78a0 Fix Aria tests (#37444)
* update aria tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add cuda tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check outputs for cpu and cuda and xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check outputs for cpu and cuda and xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check outputs for cpu and cuda and xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check output for each device

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix style

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix style

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu output

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add comments and use assert list equal

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm pad token assign

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-24 10:51:29 +02:00
b6d65e40b2 Add Fast Image Processor for MobileNetV1 (#37111)
* fast image processor template for MobileNetV1 via transformers-cli

* Add fast image processors and unify tests for slow/fast image processor classes

* added loop over image_processor_list for all tests and removed boilerplate comments.

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-23 15:55:41 -04:00
dea1919be4 Add Fast Image Processor for PoolFormer (#37182)
* support poolformer fast image processor

* support test for crop_pct=None

* run make style

* Apply suggestions from code review

* rename test

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-23 15:55:33 -04:00
b491f128d6 Add Fast PVT Processor (#37204)
* Add Fast PVT Processor

* Update image_processing_pvt_fast.py

* Update image_processing_pvt_fast.py

* remove kwargs

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-23 15:55:20 -04:00
19e9079dc1 enable 4 test_trainer cases on XPU (#37645)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-23 21:29:42 +02:00
5cd6b64059 Process inputs directly in apply_chat_template in image-text-to-text pipeline (#35616)
* tokenize inputs directly in apply_chat_template

* refactor processing

* revert changes processing llava

* Update docs

* fix issue with str being iterable

* add test chat text only

* change function name
2025-04-23 13:31:33 -04:00
80ea2c05c2 [tests, qwen2_5_omni] fix flaky tests (#37721) 2025-04-23 17:54:12 +01:00
63c6331387 Qwen 2.5 Omni: apply video defaults (#37660)
* Apply video defaults for min_pixels and max_pixels

* fps kwarg should not be a list

* Update test to account for new resizing
2025-04-23 17:08:11 +02:00
1e9087368c [internvl] fix chat template (#37656)
* fix chat template

* update

* update conversion

* rename `fake_image_token` in tests
2025-04-23 16:56:36 +02:00
9ec8be56dd TransfoXL is deprecated, don't keep it in tested examples! (#37707)
* TransfoXL is deprecated, so we should remove it from examples that get tested

* Remove the tokenizer too

* Trigger tests
2025-04-23 14:59:38 +01:00
be9b0e8521 [CI] add back sacrebleu (and document why) (#37700)
* example test

* add back dep

* dev-ci

* dev-ci
2025-04-23 14:45:00 +01:00
1d7d7a942e Add maintainers for ROCm/Intel XPU/Ascend NPU (#37678)
* Add maintainers for ROCm/Intel XPU/Ascend NPU

* Correct capitalization for usernames

* Update .github/ISSUE_TEMPLATE/bug-report.yml

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

* Update .github/ISSUE_TEMPLATE/bug-report.yml

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

* Trigger tests

---------

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-04-23 14:28:32 +01:00
cc9a245e6d [cleanup] remove /model_cards 🧹 🧹 (#37685)
rm model_cards
2025-04-23 12:45:27 +01:00
ca790303f7 Pin torch == 2.6 on PR CI docker images for now (#37695)
pin 2.6 on CircleCi images

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-23 11:47:23 +02:00
12f65ee752 enable cpu offloading for Bark on xpu (#37599)
* enable cpu offloading of bark modeling on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* remove debug print

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix review comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* enhance test

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* update

* add deprecate message

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* update

* update

* trigger CI

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-23 11:37:15 +02:00
4f9893cbbc fix: remove classmethod from Qwen2_5OmniConfig.get_text_config (#37690)
- Since the `get_text_config` references an instance variable within
    the class (`self.thinker_config`), the `get_text_config` method
    should not be a classmethod.

  - Before this fix, users were getting the following error:

    '''
    AttributeError: type object 'Qwen2_5OmniConfig' has no attribute 'thinker_config'
    '''
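
A minimal reproduction of the failure mode (class and attribute names are illustrative): a classmethod receives the class, not an instance, so it cannot read instance attributes.

    class OmniConfig:
        def __init__(self):
            self.thinker_config = {"hidden_size": 1024}

        @classmethod
        def broken_get_text_config(cls):
            # AttributeError: the class object has no thinker_config attribute
            return cls.thinker_config

        def get_text_config(self):
            return self.thinker_config  # works: reads the instance attribute

    print(OmniConfig().get_text_config())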
2025-04-23 09:30:57 +02:00
1d9743edc2 Updated model card for mbart and mbart50 (#37619)
* new card for mbart and mbart50

* removed comment BADGES

* Update mBart overview

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix typo (MBart to mBart)

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* maybe fix typo

* update typo and combine notes

* changed notes

* changed the example sentence

* fixed grammatical error and removed some lines from notes example

* missed one word

* removed documentation resources and added some lines of example code back in notes.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-22 12:26:47 -07:00
fbfa1dd4db 🌐 [i18n-KO] Translated siglip.md to Korean (#37145)
* docs: ko: siglip.md

* feat: nmt draft

* fix: manual edits

* chore: Correct document title to kebab-case format

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Convert unnatural language to natural Korean

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2025-04-22 12:23:19 -07:00
ece79b0688 enable blip2 and emu3 cases on XPU (#37662)
* enable blip2 and emu3 modeling cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* remove extra new line

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* update

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-22 18:37:09 +02:00
ca4c114dc4 Add counters for dataset classes (#37636)
* add counters for dataset classes

* fix failed code style
2025-04-22 17:30:43 +01:00
d47cdae27e [Docs] Move models to appropriate section (#37338)
* Move models

* update

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-22 18:23:14 +02:00
dbfccd3c92 typo update in the parameter name (#37655)
See L118 and L143 for the class attribute `hidden_dim`
2025-04-22 18:14:20 +02:00
de8916dde6 [docs] only build en docs in push CI (#37677) 2025-04-22 17:05:11 +01:00
0f8c34b0a0 [cleanup] remove old scripts in /scripts 🧹 🧹 (#37676)
* rm old files

* not this one
2025-04-22 16:59:03 +01:00
6673081b21 enable 6 granite cases on xpu (#37569)
* enable 6 granite cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* make them all pass on A100

Signed-off-by: N <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* update

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: N <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-22 17:55:02 +02:00
9167461a7d enable mllama cases on xpu (#37644)
* enable mllama testing on xpu

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* more mllama cases enabling

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* make cases pass on A100

Signed-off-by: N <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: N <matrix.yao@intel.com>
2025-04-22 17:39:10 +02:00
de182ba269 Refactor bitsandbytes doc (#37668)
* doc

* torch ops

* fix

* nits

* Update docs/source/en/quantization/bitsandbytes.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-22 16:13:25 +02:00
dde9b03e3b Fix no_split_modules for Llama4 pretrained models (#37673) 2025-04-22 16:05:12 +02:00
9481e9e9f1 Fix autoround docs (#37675)
* fix

* empty
2025-04-22 15:33:13 +02:00
38c406844e Fixing quantization tests (#37650)
* fix

* style

* add capability check
2025-04-22 13:59:57 +02:00
b3492ff9f7 Add AutoRound quantization support (#37393)
* add auto-round support

* Update src/transformers/quantizers/auto.py

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

* fix style issue

Signed-off-by: wenhuach <wenhuach87@gmail.com>

* tiny change

* tiny change

* refine ut and doc

* revert unnecessary change

* tiny change

* try to fix style issue

* try to fix style issue

* try to fix style issue

* try to fix style issue

* try to fix style issue

* try to fix style issue

* try to fix style issue

* fix doc issue

* Update tests/quantization/autoround/test_auto_round.py

* fix comments

* Update tests/quantization/autoround/test_auto_round.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update tests/quantization/autoround/test_auto_round.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* update doc

* Update src/transformers/quantizers/quantizer_auto_round.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* update

* update

* fix

* try to fix style issue

* Update src/transformers/quantizers/auto.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update docs/source/en/quantization/auto_round.md

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update docs/source/en/quantization/auto_round.md

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update docs/source/en/quantization/auto_round.md

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* update

* fix style issue

* update doc

* update doc

* Refine the doc

* refine doc

* revert one change

* set sym to True by default

* Enhance the unit test's robustness.

* update

* add torch dtype

* tiny change

* add awq convert test

* fix typo

* update

* fix packing format issue

* use one gpu

---------

Signed-off-by: wenhuach <wenhuach87@gmail.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Shen, Haihao <haihao.shen@intel.com>
2025-04-22 13:56:54 +02:00
9608908639 Correct warm-up with fp8 (#37670)
* start clean warmup for quantizers

* style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-22 13:12:49 +02:00
6614209b96 Fix duplicated weights in fp8 quantization (#37667)
* fix fp8

* Update quantizer_finegrained_fp8.py

* fix circular import

* Update quantizer_finegrained_fp8.py
2025-04-22 13:12:27 +02:00
dcf6df5b0d [qwen-omni] fix training (#37517)
* fix

* add text config

* fixup

* fix docs
2025-04-22 12:36:07 +02:00
9167fadab9 Introduce GradientCheckpointingLayer (#37223)
* GradientCheckpointingLayer

* trigger

* Move GC layer to a separate file

* Update import

* Expose and document GC layer

* Fix dummy

* Apply to llama-based models

* Update modulars

* Update a few more models for consistency

* Update glm4

* Update Janus
2025-04-22 11:33:31 +01:00
413f9bbf80 Fixes #37219 : RecurrentGemma crashes for inputs longer than sliding window length (#37613)
* fix: RecurrentGemma crashes during inference for inputs longer than sliding window width

* fix recurrentgemma tests; add long test bigger than context window
2025-04-22 12:21:16 +02:00
964a1b6b7d Fix ValueError when eval_do_concat_batches=False with examples (#37621)
https://github.com/huggingface/transformers/issues/37593

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-22 12:13:25 +02:00
85665a4263 [tests] Stricter generate + compilation test -- no recompilations allowed (#37629)
* tmp commit

* stricter compilation test

* trigger tests

* rm todo
2025-04-22 11:12:18 +01:00
362fa37da2 [test] update test_past_key_values_format (#37614)
allow custom shapes
2025-04-22 11:07:34 +01:00
1cd110c6cb Add test to ensure unknown exceptions reraising in utils/hub.py::cached_files() (#37651)
* add test to ensure unknown exceptions are reraised in utils/hub.py::cached_files()
2025-04-22 11:38:10 +02:00
c69e23455d Support loading Gemma3 QAT GGUF models (#37649)
* fix gemma3 qat gguf support

Signed-off-by: isotr0py <2037008807@qq.com>

* update test

Signed-off-by: isotr0py <2037008807@qq.com>

* make ruff happy

Signed-off-by: isotr0py <2037008807@qq.com>

---------

Signed-off-by: isotr0py <2037008807@qq.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-22 11:23:17 +02:00
7eb1107cc2 Restructure torchao quantization examples (#37592)
* Restructure torchao quantization examples

Summary:
Mainly structured the examples by hardwares and then listed
the recommended quantization methods for each hardware H100 GPU, A100 GPU and CPU

Also added example for push_to_hub

Test Plan:
not required

Reviewers:

Subscribers:

Tasks:

Tags:

* update

* drop float8 cpu

* address comments and simplify

* small update

* link update

* minor update
2025-04-22 11:20:34 +02:00
006530d285 [fix gemma] Set default value for output_attentions parameter in Gemma2 and Gemma… (#37633)
* Set default value for output_attentions parameter in Gemma2 and Gemma3 models

* update

* fix

* fix

---------

Co-authored-by: chenin <wangzhichen@encosmart.com>
2025-04-22 11:18:17 +02:00
31ea547b7a [fix] make legacy bnb code work (#37331)
* [fix] make legacy bnb code work

* [fix] use get with default instead of getter

* add test for bnb 8bit optim skip embed

* [fix] style

* add require annotation of bnb

---------

Co-authored-by: jaycha <jaycha@ncsoft.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-22 11:17:29 +02:00
5f791281c3 Fix Qwen2.5-Omni get_chunked_index chunking functionality (#37631)
* fix: qwen2.5 omni modular get_rope_index

* test: add test for qwen2.5 omni rope index (video with audio input)

* style

* expected_position_ids readability

* fix: use spatial_merge_size = 1 in unit test
2025-04-22 11:15:37 +02:00
fee1190601 Refactor phi doc (#37583)
* Added documentation for phi model

* Update phi.md

* Update phi.md

* Update phi.md

* Update docs/source/en/model_doc/phi.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated model card

* Update phi.md

* Update phi.md

* Update phi.md

* Update docs/source/en/model_doc/phi.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Jihad <jihadhammoud_@hotmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-21 10:31:04 -07:00
b2db54f66b Update longformer.md (#37622)
* Update longformer.md

* Update longformer.md

* Update docs/source/en/model_doc/longformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/longformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update longformer.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-21 10:30:51 -07:00
2c60a442f3 fix link in kv_cache.md (#37652)
fix typo in kv_cache.md
2025-04-21 09:01:11 -07:00
a42ba80fa5 Allow Exclusion of Input IDs from RepetitionPenaltyLogitsProcessor (#37625)
* Allow exclusion of input IDs for repetition penalty

* Add logit proc tests for rep penalty exclusion

* Expose rep pen flag through generate

* Only slice if needed

* keep current rep pen default behavior

* Revert exposing reppen changes through generate

* Fix test arg

* Update src/transformers/generation/logits_process.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Rename to rep penalty kwarg

* Add custom repetition penalty processor example (see the sketch after this list)

* Validate prompt_ignore_length
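
A hedged sketch of the idea as a custom `LogitsProcessor` (not the exact merged implementation; the parameter name follows the bullets above): the repetition penalty is applied only to tokens that appear after the prompt.

    import torch
    from transformers import LogitsProcessor

    class PromptExcludingRepetitionPenalty(LogitsProcessor):
        def __init__(self, penalty: float, prompt_ignore_length: int):
            self.penalty = penalty
            self.prompt_ignore_length = prompt_ignore_length

        def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
            generated = input_ids[:, self.prompt_ignore_length:]  # drop prompt tokens
            if generated.shape[1] == 0:
                return scores
            score = torch.gather(scores, 1, generated)
            # standard repetition-penalty update, restricted to non-prompt tokens
            score = torch.where(score < 0, score * self.penalty, score / self.penalty)
            return scores.scatter(1, generated, score)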

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-21 15:46:05 +01:00
1077603410 Remove torchvision requirement from AutoImageProcessor (#37457) 2025-04-21 14:59:33 +02:00
1930e750e4 [kernels] use original forward at compile time (#37604) 2025-04-21 13:22:47 +01:00
6daa3eeba5 Fix InternVL attention when using qk_norm (38B and 78B) (#37620)
* fix internvlvision attention when using qk_norm

* nit

* modular
2025-04-19 21:39:08 +02:00
27a25bee4f chore: update model card for SigLIP (#37585)
* edit siglip model card

* fix syntax

* Update docs/source/en/model_doc/siglip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/siglip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* address comments

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-18 13:30:41 -07:00
e1f379bb09 Fixing the example in generation strategy doc (#37598)
Update generation_strategies.md

The prompt text shown in the example does not match what is inside the generated output. As the generated output always includes the prompt, the correct prompt should be "Hugging Face is an open-source company".
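
For reference, a minimal sketch of the behavior being documented (the checkpoint name is just an example): `generate` returns the prompt tokens followed by the continuation, so the decoded output always starts with the prompt.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tokenizer("Hugging Face is an open-source company", return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=10)
    print(tokenizer.decode(output_ids[0]))  # begins with the prompt text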
2025-04-18 12:50:17 -07:00
4f58fc9c82 Deprecate modeling_utils.py classes (#37298)
* Move utils classes into models

* Add deprecation warnings

* Remove from docs

* Update config attributes check
2025-04-18 18:47:34 +01:00
a245011252 Add InternVL (2.5 MPO) (#35968)
* initial commit

* add convert internvl

* add first end-to-end working internvl

* nit prompt and image proc

* add working chat template

* add conversion llama-based models

* add tests

* pass all tests

* fix isort

* fix modular after main merge

* add video processing for internvl

* add support for interlaced images and videos

* Remove processing and config from modular, add more tests

* add llama model tests

* Modify processor for compatibility with refactored got ocr image processor

* add comments in processor

* Add docs and nits

* change video processing to use custom sample_indices_fn

* rebase and fix tests

* add processor tests

* Add changes from Raushan's review

* Use the new attention interface for the vision model

* nits

* add support for custom video_load_backend

* remove mention of InternVLTokenizer

* refactor vision model to simplify logic

* refactor processor for better readability

* fix copies

* fix require av processor test

* refactor internVL vision

* Update processor and fix processing tests

* fix docstring

* update convert_weights for internvl3

* change image processor to fast by default

* remove do_center_crop=True in convert_weights

* force use_cache to True

* push_to_hub before reloading

* fix internVLVision for larger models

* update convert weight for qk norm

* fix convert_weights

* fix eos_token_id in convert

* update docs and integration tests

* make modifs after review

* fix wrong k_norm and reduce modular

* change image_token_index to image_token_id

* change checkpoint to OpenGVLab org

* last nits

* explicitly del self.num_key_value_groups

* add extra special tokens
2025-04-18 18:57:33 +02:00
b0c6ff5e13 fix issue that some example with no trainer use accelerator.end_train… (#37435)
* fix issue that some examples with no trainer used accelerator.end_training in a wrong way

* reformat code

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-18 17:59:42 +02:00
6f5014ac31 fix 2 encoder_decoder issues on XPU (#37572)
* fix 2 encoder_decoder issues on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fmt

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-18 17:49:24 +02:00
2ba6b92a6f [VLMs] use only xxx_token_id for multimodal tokens (#37573)
* use only `xxx_token_id` for multimodal tokens

* update modeling files as well

* fixup

* why fixup doesn't fix modular docstring first?

* janus, need to update configs in the hub still

* last fixup
2025-04-18 17:03:39 +02:00
4afd3f4820 Model debugger upgrades (#37391)
* debugging improvements

* add debugging details

* add more debugging details

* debug more

* clean up layers + output

* add summary json file

* cleanup

* copies 👀

* remove hooks + add documentation

* draft a small test, why not

* respect the format (respect it)

* fixup imports

* nit

* add tests and configurable pruning of layers
2025-04-18 16:45:54 +02:00
e5ac23081e [Gemma3] compile (#37447) 2025-04-18 14:55:43 +01:00
a1b82563f1 enable 6 modeling cases on XPU (#37571)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-18 12:28:08 +02:00
3cd6627cd7 enable 6 gemma2 cases on XPU (#37564)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-18 12:10:34 +02:00
049b75ea72 Flag SpeechT5 flaky test (#37587)
flag flaky test
2025-04-18 11:35:46 +02:00
aa17cfb4d5 [Bugfix] Fix flash-attention func param mismatch and softmax_scale default value mistake on Ascend NPU (#37575)
[Bugfix] fix flash-attention func param mismatch and softmax_scale default value mistake on Ascend NPU

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-18 11:34:17 +02:00
14b3dbcf3b remove _run_third_party_device_tests (#37445)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-18 11:19:56 +02:00
f974214353 Fix some GPU OOM after #37553 (#37591)
* fix

* trigger CI

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-18 10:09:19 +02:00
438324c9cf Gaudi: Add the bf16 support for hpu (#37568)
* Fix: hpu can support bf16

Signed-off-by: yuanwu <yuan.wu@intel.com>

* hpu is not integrated into torch.

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Gaudi1 cannot support bf16

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Update src/transformers/utils/import_utils.py

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-04-18 08:00:26 +02:00
bb2a44ad4b Fix Quark quantization config (#37578)
fix
2025-04-18 07:23:39 +02:00
4acf692ace Update Phi4 converter (#37594)
* fix converter

* Update phi4_multimodal.md
2025-04-17 23:08:24 +02:00
40cba20e87 Ensure positive warm-up size (#37581)
ensure > 0
2025-04-17 16:11:54 +02:00
346f1eebbd docs: fix typo (#37567)
Co-authored-by: Anthony <anthony.song@capitalone.com>
2025-04-17 14:54:44 +01:00
48dd89cf55 [phi4] update conversion (#37579)
* update conversion

* update
2025-04-17 15:43:04 +02:00
58e5e976e0 Small fix on context manager detection (#37562)
* small fixes

* Update modeling_utils.py

* test

* Update test_modeling_common.py

* Update test_modeling_timm_backbone.py

* more general

* simpler
2025-04-17 15:39:44 +02:00
c7d3cc67a1 Fix qwen2audio wanr -> warn (#37559)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-04-17 14:34:58 +01:00
dc06e7cecd [TimesFM] use the main revison instead of revision for integration test (#37558)
* use the main revison instead of revision

* test prediction

* check larger time steps
2025-04-17 11:26:03 +02:00
3bc44eaaee [qwen-vl] Standardize config (#37268)
* update

* fix tests

* fixup

* update

* skip this one

* fixup

* fix
2025-04-17 09:38:12 +02:00
4f96081aad [chat template] fix security vulnerability (#37523)
* fix security issues

* nit
2025-04-17 09:21:37 +02:00
a2ef3cf537 Add Janus model (#36053)
* Iterative generation using input embeds

* Add Janus model

* discard changes

* Janus imports

* Refactor config and processor

* Added Vision tower of Janus

* Import Janus Image processor

* Vision tower fixes

* Refactor code

* Added VQ Model

* Complete model integration

* temp conversion script

* processor refactor

* Adding files to facilitate pulling

* Fixes after debugging

* Skip test for these models

* Refactor to Text config

*  Added generate function

* Saving intermediate convert file. Still need to read configs from the hub and convert them to our format.

* Adding version that reads from the JSON files. Still have to tweak some parameters manually.

* relative imports

* Initial tests

* Refactor image processor

* Seemingly working version of the conversion script, will need to test further.

* Adding command message

* Fixing conflicting JanusTextConfig class

* Incorporating some of the discussed changes.

* Small fix to create dir.

* Removing system from JINJA template

* Adding draft processor tests

* style fixes

* Minor fixes and enhancement

* added generation config

* Initial tests

* Small modifications, tests are now passing.

* Small changes I noticed while reading code.

* more fixes

* Added JanusModel class

* Small merge adaptations

* Small merge adaptations

* Image processing tests passing

* More tests and fixes

* Convert script updated and refactored

* Tests and cleanup

* make style

* Postprocessing for image generation

* generate refactor

* fixes

* - Passing tests that write a part of the model to cpu (e.g. test_cpu_offload)
- Passing tests of dispatching SDPA
- Only gradient checkpointing tests are left.

* Removing temporary code

* Changes

* Writing change to modular

* Added JanusVisionModel. SDPA dispatch tests pass more robustly. Gradient checkpoint tests are next

* Gradient checkpoint tests passing

* Removing debug code

* Major generate refactor 😮‍💨

* Temp changes for testing

* Green quality CI

* 2 out of 4 integration tests passing

* breadcrumbs

* Usage Examples

* Regenerate modeling after merge

* dirty code

* JanusIntegrationTest are passing

* breadcrumbs

* happy CI

* fixes

* Changing template

* nits

* Text generation logits matching original codebase at 100% precision

* Remove ./tmp from git tracking

* Remove ./tmp from git tracking

* Checkpointing changes after reviewing

* Fixing code in docstrings

* Changing comments and small bug in convert file

* Fixing bug in image_token_id for 7B version

* Removing line that was added by both of us

* Pushing changes after discussion. Only one left is to change the key mapping for convert file.

* Updating module file

* New convert file using dict. Tested that it is equivalent to the old one by:
- comparing keys in a script
- comparing checksums of the output files between version generated with the current convert script and those generated with the old script. This is a more reliable test.
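
A sketch of that checksum comparison (the file paths are placeholders): two converter outputs are treated as equivalent when their bytes hash identically.

    import hashlib
    from pathlib import Path

    def sha256(path: str) -> str:
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    # placeholder paths for the outputs of the old and new convert scripts
    assert sha256("old_convert/model.safetensors") == sha256("new_convert/model.safetensors")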

* revert changes

* mistake

* consistency change for CI

* make style

* doc fixes

* more fixes

* experimenting with masking out pad token

* checkpoint

* Batched generation with multi-images working for 1B models. Will test 7B next.

* Device fix.

* Writing changes to modular, previous ones were written to modeling just for quick testing.

* Using passed processor attention mask (only in modeling for now)

* Matching performance done in the non-standard way

* Working version of batched generation. Will change how some args are passed to make it more similar to language case

* More compliant version of the code

* Removed duplicated `_prepare_4d_causal_attention_mask_with_cache_position`

* Updating modular file, making masked filling with paddings more efficient

* Slightly more efficient version

* Modifying JanusVisionModel to be a wrapper

* Fixing test to comply with new names

* Modular overhaul

* More refactoring

* - Changing JanusVisionModel back
- Changing forward pass
- Adding boi token to the comparison

* - Removing whole context model_ids
- Using inherited implementation of prepare_inputs_for_generation

* Moving the way boi token is passed to the model

* Fixing sdpa test

* Minor changes

* testing changes

* Minor fix

* - Adding postprocessing test
- checking values of generated image on integration test

* changes

* Removing pooled attention vision module, fixing convert script as a consequence

* More changes

* Fixes

* Draft after merge

* Bug fixes

* More bug fix

* Fixing docs

* Nits

* Refactor return dict

* Moving image post processing test to main processor post process

* Passing guidance_scale as kwarg

* make style

* 🔥 refactor

* make style

* Update and green CI

* Nits and tests update

* up

* Added MID block

* fix

* Dead code

* update testcase

* update

* model_id change

* init_weight changes

---------

Co-authored-by: hsilva664 <metallic-silver@hotmail.com>
2025-04-17 09:18:51 +02:00
688f4707bf All models can be initialized on meta device (#37563)
* Update test_modeling_common.py

* fix all

* more fixes
2025-04-16 23:26:44 +02:00
0a83588c51 Bridgetower fast image processor (#37373)
* add support for fast tokenizer

* make style

* fix according to reviews

* make style

* relax slow_fast_equivalence mean diff

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-04-16 22:39:18 +02:00
4005730044 Fix Mamba2 Grouped SSD Support in the torch_forward Path (#37533)
* Fix mamba2 grouped support in bamba torch path

* patch zamba2 and mamba2

* Add a unit test for grouped SSD

* add comment for the new unit test

* add output_size arg value to repeat_interleave calls (see the sketch after this list)

* Add comment
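
For context, a small illustrative call (shapes are arbitrary) showing the `output_size` argument mentioned above: it pre-specifies the result length along `dim`, which lets `repeat_interleave` skip a device synchronization.

    import torch

    x = torch.randn(2, 3, 4)
    n_rep = 2
    y = torch.repeat_interleave(x, repeats=n_rep, dim=1, output_size=x.shape[1] * n_rep)
    print(y.shape)  # torch.Size([2, 6, 4])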
2025-04-16 22:16:01 +02:00
a7d2bbaaa8 Add EfficientNet Image PreProcessor (#37055)
* added efficientnet image preprocessor but tests fail

* ruff checks pass

* ruff formatted

* properly pass rescale_offset through the functions

* - corrected indentation, ordering of methods
- reshape test passes when cast to float64
- equivalence test doesn't pass

* all tests now pass
- changes order of rescale, normalize according to slow
- rescale_offset defaults to False according to slow
- resample was causing difference in fast and slow. Changing test to bilinear resolves this difference

* ruff reformat

* F.InterpolationMode.NEAREST_EXACT gives TypeError: Object of type InterpolationMode is not JSON serializable

* fixes offset not being applied when do_rescale and do_normalization are both true

* - using nearest_exact sampling
- added tests for rescale + normalize

* resolving reviews

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-16 21:59:24 +02:00
32eca7197a [vlm] adjust max length for special tokens (#37342)
* update

* apply suggestion

* fix tests for main branch

* remove unused logger

* add special tokens in tests

* nit

* fix more tests

* fix test

* pg also
2025-04-16 20:49:20 +02:00
c94c59fc47 Fix pixel attention mask padding in smolvlm (#37497)
* fix bad init

* also modif smolvlm

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-04-16 20:48:46 +02:00
5a6de703a7 Run test_can_load_with_global_device_set using a subprocess (#37553)
* fix

* fix

* fix

* Update tests/test_modeling_common.py

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-04-16 19:48:30 +02:00
9a4ce64770 🔴 Update CLIP vision attention to new attention interface (#37498)
* update attention interface

* fix test

* propagate attention changes

* revert weird changes

* fix modular

* what?

* ruff is mocking me

* ruff being ruff

* simplify test suite + fix FA2

* fixup tests  + propagate FA2 fixes

* add Copied From where relevant

* fix conflict between copies and modular

* recover FA2 training for CLIP + handle quantization

* don't ditch the warning

* tiny import fix

* code review (FA2 support, copied from)

* fix style

* modularity

* wrong copies

* future-proofing for TP

* mlcd inherits from CLIP
2025-04-16 18:15:22 +02:00
dc8227827d Fix TimesFm doc issue (#37552)
* fix doc

* code block
2025-04-16 16:28:42 +02:00
2f517200c1 Make Ignored Columns ValueError More Informative (#33299)
Make Ignored Columns Value Error More Informative

Included forward method signature columns in the ValueError so end users will know what columns are expected to be passed to the model in addition to those which are ignored.
2025-04-16 16:14:55 +02:00
0577cae808 Fix device issue for tapas (with as_tensor) (#37551)
* fix 1

* fix 2

* fix 3

* fix 4

* fix 5

* fix 6

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-16 16:02:53 +02:00
b33edf1b9b docs(typo): Update ISSUES.md, fix a small typo (#37542)
Update ISSUES.md
2025-04-16 15:01:04 +01:00
503541d7ef add FlashAttentionKwargs and seq_idx to flat collator (#36456)
* add flash attn kwargs to flattening collator

* add return_seq_idx option

* doc string edits

* cleaner max len updates

* various fixes

* temp testing code

* return int32 seq_idx and FlashAttnKwargs

* DataCollatorIntegrationTest impl

* fix batch dims and dtypes

* fill out remaining collator tests

* test name change and fmt

* rm unused var

* fmt

* minor change

* fmt

* add missing pos_ids check

* consistent {np,pt,tf} tests

* split pt tests into 3, like np/tf tests

* mv comment, rename fa test

* remove batch dim comment

* simply wrapping

* compute cu_seq_len/max_length once

* fmt

* remove tf code

* rm warning

* move separator_id back to 2nd pos

* use cleaner lists in tests

* ret -> batch

* fmt

* attr ordering

* use py ints for max_length_{k,q}
2025-04-16 15:45:03 +02:00
9ddcf5fce5 Update quantization docs (#37439) 2025-04-16 15:44:53 +02:00
a91020aed0 Add TimesFM Time Series Forecasting Model (#34082)
* initial documentation

* rename mask to attention_mask

* smaller tests

* fixup

* fix copies

* move to time series section

* sort docs

* isort fix

* batch_size is not a configuration

* rename to TimesFMModelForPrediction

* initial script

* add check_outputs

* remove dropout_rate

* works with torch.Tensor inputs

* rename script

* fix docstrings

* fix freq when window_size is given

* add loss

* fix _quantile_loss

* formatting

* fix isort

* add weight init

* add support for sdpa and flash_attention_2

* fixes for flash_attention

* formatting

* remove flash_attention

* fix tests

* fix file name

* fix quantile loss

* added initial TimesFMModelIntegrationTests

* fix formatting

* fix import order

* fix _quantile_loss

* add doc for SDPA

* use timesfm 2.0

* bug fix in timesfm decode function.

* compare mean forecasts

* refactor type hints, use CamelCase

* consolidate decode func

* more readable code for weight conversion

* fix-copies

* simpler init

* rename TimesFmMLP

* use T5LayerNorm

* fix tests

* use initializer_range

* TimesFmModel instead of TimesFmDecoder

* TimesFmPositionalEmbedding takes config for its init

* 2.0-500m-pytorch default configs

* use TimesFmModel

* fix formatting

* ignore TimesFmModel for testing

* fix docstring

* override generate as it's not needed

* add doc strings

* fix logging

* add docstrings to output data classes

* initial copy from t5

* added config and attention layers

* add TimesFMPositionalEmbedding

* calcuate scale_factor once

* add more configs and TimesFMResidualBlock

* fix input_dims

* standardize code format with black

* remove unneeded modules

* TimesFM Model

* order of imports

* copy from Google official implementation

* remove covariate forecasting

* Adapting TimesFM to HF format

* restructuring in progress

* adapted to HF convention

* timesfm test

* the model runs

* fixing unit tests

* fixing unit tests in progress

* add post_init

* do not change TimesFMOutput

* fixing unit tests

* all unit tests passed

* remove timesfm_layers

* add intermediate_size and initialize with config

* add _CHECKPOINT_FOR_DOC

* fix comments

* Revert "fix comments"

This reverts commit 8deeb3e191b3671bc1d74dbfe77b736a066c3d34.

* add _prepare_4d_attention_mask

* we do not have generative model classes

* use Cache

* return past_key_values

* modules initialized with config only

* update year

* Update docs/source/en/model_doc/timesfm.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add layer_idx to cache

* modular timesfm

* fix test

* unwrap sequential class

* fix toctree

* remove TimesFmOnnxConfig

* fix modular

* remove TimesFmStackedDecoder

* split qkv layer into individual layers

* rename projection layers

* use ALL_ATTENTION_FUNCTIONS

* is_causal is True

* rename config

* does not support flash_attn_2

* formatting

* fix typo in docsstring

* rename inputs

* add time series mapping

* Update src/transformers/models/olmo2/modeling_olmo2.py

* Update src/transformers/models/moonshine/modeling_moonshine.py

* use updated arguments

* fix class name

* add MODEL_FOR_TIME_SERIES_PREDICTION_MAPPING

* isort

* consolidate _preprocess into forward

* fix a typo

* fix a typo

* fix toc

* fix modular

* remove asserts

* use self.config._attn_implementation

* move to _postprocess_output

* remove timesfm_get_large_negative_number

* use view instead of multiple unsqueeze

* make helpers static methods of the Model

* use to_tuple

* use to_tuple if not return_dict

* remove unused initialization block as it's incorporated in nn.Linear

* remove unused num_key_value_groups

* use the same convention as the masking method

* update modular

* do not use unsqueeze

* use view instead of unsqueeze

* use buffer for inv_timescales

* formatting

* modular conversion

* remove unneeded initialization

* add missing docstrings

* remove cache

* use simple_eager_attention_forward

* support tp_plan

* support for flex and flash attention masks

* Revert "support for flex and flash attention masks"

This reverts commit def36c4fcf31599b3f4937c9334b7da1a20132c3.

* fix device

* fix tests on gpu

* remove unused large model test

* removed unneeded comments

* add example usage

* fix style

* add import

* Update docs/source/en/model_doc/timesfm.md

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* inherit from LlamaRMSNorm

* use can_return_tuple decorator

* remove return_dict

* fix year

* Update docs/source/en/model_doc/timesfm.md

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* pretrained does not inherit from GenerationMixin

* use model for integration test

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Rajat Sen <rsen91@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-04-16 15:00:53 +02:00
8669c016d2 Refactor torchao docs (#37490)
* refactor docs

* add serialization

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* reorder

* add link

* change automatic to autoquant

Co-authored-by: DerekLiu35 <91234588+DerekLiu35@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/torchao.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* nits

* refactor

* add colab

* update

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: DerekLiu35 <91234588+DerekLiu35@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-16 14:56:48 +02:00
e3d3b54638 Keep Quark loading through meta device (#37538) 2025-04-16 14:19:56 +02:00
61436a9323 convert scale and zero to cuda when using HQQ backend (#37425) 2025-04-16 14:13:20 +02:00
7752e7487c Fixes hqq by following a new path for bias parameter in pre_quantized models (#37530)
* fix

* add test
2025-04-16 13:58:14 +02:00
7dafcd0077 More appropriate cuda warmup in resource-constrained hardware (#37550)
* better allocation in resource constrained env

* Update modeling_utils.py

* CIs
2025-04-16 13:40:02 +02:00
6fd87d1172 Add Fast Grounding-Dino Processor (#37108)
* Add Fast Grounding-Dino Processor

* Added modular file

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-16 12:26:08 +02:00
ed53809ac5 enable 6 rt_detr_v2 cases on xpu (#37548)
* enable 6 rt_detr_v2 cases on xpu

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-16 11:23:56 +02:00
d91858c232 enable 3 mpt test cases on XPU (#37546)
* enable 3 mpt test cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-16 11:23:06 +02:00
4541c2cdef Fix BitsAndBytesConfig JSON serialization in TrainingArguments (#37520)
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-16 11:18:17 +02:00
a335dc4d6d enable test_offloaded_cache_implementation on XPU (#37514)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-16 11:04:57 +02:00
33f6c5a5c8 enable several cases on XPU (#37516)
* enable several cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update tests/test_modeling_common.py

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-16 11:01:04 +02:00
5ab7a7c640 enable 5 cases on XPU (#37507)
* make speecht5 test_batch_generation pass on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* enable 4 GlmIntegrationTest cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update src/transformers/testing_utils.py

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-16 09:28:02 +02:00
3165eb7c28 Refactor ColPali model documentation (#37309)
* Refactor ColPali model documentation

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Include quantization example + real images

* simpler image loading

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-15 13:52:11 -07:00
33c6fdb2cf Update VITS model card (#37335)
* Update VITS model card

* Update docs/source/en/model_doc/vits.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vits.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vits.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vits.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update vits.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-15 13:16:05 -07:00
4cc6b60654 Fix broken add-fast-image-processor CLI (#37499) 2025-04-15 18:50:21 +02:00
51f544a4d4 Add Fast Conditional-DETR Processor (#37071)
* Add Fast Conditional-DETR Processor

* Update image_processing_conditional_detr_fast.py

* Add modular_conditional_detr.py

* Update image_processing_conditional_detr_fast.py

* Update tests

* make fix

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-15 18:33:34 +02:00
4f1dbe8152 Add Fast Chinese-CLIP Processor (#37012)
* Add Fast Chinese-CLIP Processor

* Update dummy_torchvision_objects.py

* Fix tests
2025-04-15 18:31:20 +02:00
c08997c52e VDR task guide (#37485)
* VDR task guide

* Add to toctree

* Update docs/source/en/tasks/visual_document_retrieval.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/visual_document_retrieval.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/visual_document_retrieval.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/visual_document_retrieval.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/visual_document_retrieval.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/visual_document_retrieval.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/visual_document_retrieval.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/visual_document_retrieval.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/visual_document_retrieval.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/visual_document_retrieval.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-15 08:55:13 -07:00
57da364d8e fix and enhance pipeline_webserver.md (#36992)
* fix and enhance pipeline_webserver.md

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update docs/source/en/pipeline_webserver.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/pipeline_webserver.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* use pipe

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-15 08:35:05 -07:00
356b3cd71d Fix missing return type for MLCD docs (#37527)
* Fix missing return type for docs

* trigger
2025-04-15 14:04:16 +01:00
0ad3710d47 fix: Restore explicit error surfacing for unexpected hub exceptions (#37525)
* fix: Restore explicit error surfacing for unexpected hub exceptions

Prior to PR #36033, unexpected exceptions (e.g., ModuleNotFoundError) during hub model loading were not swallowed silently. They either matched specific except blocks or were raised.

After #36033, a catch-all except Exception block was introduced without a fallback else, causing unknown errors to be silently ignored and leading to misleading downstream behavior.

This commit adds an `else: raise e` to ensure only explicitly handled exceptions are suppressed. All others are surfaced, restoring pre-4.50 behavior and aiding in debugging and dependency visibility.

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-04-15 14:54:11 +02:00
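A minimal sketch of the pattern this commit describes, with generic names rather than the actual transformers hub-loading code: known failure modes are handled explicitly, and anything else is re-raised instead of being silently swallowed by the catch-all.

```python
# Hypothetical loader illustrating the `else: raise e` pattern described above;
# `load_from_hub` and `fetch` are illustrative names, not transformers APIs.
def load_from_hub(fetch):
    try:
        return fetch()
    except Exception as e:
        if isinstance(e, FileNotFoundError):
            # Explicitly handled case: fall back gracefully.
            return None
        else:
            # Unexpected errors (e.g. ModuleNotFoundError) are surfaced,
            # not swallowed silently.
            raise e


def flaky_fetch():
    raise ModuleNotFoundError("some optional dependency is missing")


try:
    load_from_hub(flaky_fetch)
except ModuleNotFoundError as err:
    print(f"surfaced instead of swallowed: {err}")
```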
f6c79f767c Add Fast Yolos Processor (#37292)
* Add Fast Yolos Processor

* Update modular file

* Fix copies

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-15 14:23:08 +02:00
ecaeee66bc Llama4: remove redundant transpose of router_logits (#37468)
* Llama4: remove redundant transpose of router_logits

* Fix formatting
2025-04-15 12:29:26 +01:00
6f7ea1cf00 Add MLCD model (#36182)
* Add MLCD model

* Update codes for auto-mapping

* Add test scripts for MLCD

* Update doc for MLCD model

* Fix import error

* Fix import error

* Fix CI error for attention_outputs

* Fix code style for CI

* Fix code style for CI

* Fix code style for CI

* Fix code style for CI

* Fix code style for CI

* Fix CI error for initialization

* Fix code style for CI

* Fix code style for CI

* Reformat codes and docs for CI test

* Reformat codes and docs for CI test

* Remove unused attributes for CI test

* Fix style for CI test

* List MLCD in flash_attn doc

* Fix: typos, modulars, refactors from suggestions

* Refactoring convert_mlcd_weights_to_hf.py from suggestions

* Fix: docs conflicts

* Fix error for CI test

* Fix style for CI test

* Add integration test for MLCD

* Refactoring by class inheritance

* Fix: refactor attention interface, adjust codes

* Fix: merging conflicts

* Fix: merging conflicts

* Fix: style for CI test

* Fix: style for CI test

* Fix: set test_resize_embeddings to be False

* Fix: initializer for CI test

* Fix: conflicts, CI test, warning and refactoring

* Fix: merging conflicts

* Refactor

* Update docs

* Fix mistakes

* Remove unused args and fix multi-gpu error

* Revert position_embeddings

* Solve conflicts

* Solve conflicts

* Remove dummy

* Update _init_weights

* Update _init_weights

* Update _init_weights for CI test
2025-04-15 11:33:09 +01:00
d6ac923ad9 Change default value of attn_temperature_tuning (#37501)
fix: change default value of `attn_temperature_tuning`
2025-04-15 12:10:38 +02:00
c8e0e603de Detect and use device context manager or global device in from_pretrained (#37216)
* Update modeling_utils.py

* improve

* Update modeling_utils.py

* Update test_modeling_common.py

* Update test_modeling_timm_backbone.py

* Update test_modeling_common.py

* Update test_modeling_common.py

* Update test_modeling_common.py

* Update test_modeling_common.py

* CIs
2025-04-15 09:59:20 +02:00
4e63a1747c Don't auto-assign reviewers when the author is in HF (#37500)
* Don't auto-assign reviewers when the author is in HF

* Trigger tests
2025-04-14 18:17:38 +01:00
8ab296501a Remove deprecation warning for num_logits_to_keep (#37149)
* remove everything

* style
2025-04-14 19:08:45 +02:00
20ceaca228 Add Fast owlvit Processor (#37164)
* Add Fast Owlvit Processor

* Update image_processing_owlvit_fast.py

* Update image_processing_owlvit_fast.py

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-14 17:58:09 +02:00
cb39f7dd5b [qwen-omni] fix processor (#37493)
* fix

* delete print

* accept kwargs in overriden models as well

* remove duplicate
2025-04-14 17:30:31 +02:00
d228f50acc Fixing gated repo issues (#37463)
using unsloth model
2025-04-14 17:19:10 +02:00
a5dfb98977 Fix wrong argparse type in modular checker script (#37472)
fix(util): wrong argparse type in modular checker script
2025-04-14 16:11:29 +01:00
a53a63c9c2 Add Fast Mobilenet-V2 Processor (#37113)
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-14 17:08:47 +02:00
4774a39d05 Add ImageProcessorFast to BiT processor (#37180)
* Add ImageProcessorFast to BiT processor

* propose a fast processor and add tests

* all tests pass except one

* run make

* remove useless print

* use same test as clip

* apply make

* Update src/transformers/models/bit/image_processing_bit_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update setup.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/bit/image_processing_bit_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* apply review comment

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-14 17:07:48 +02:00
e43f168eb3 Add Fast LeViT Processor (#37154)
* Add Fast LeViT Processor

* Update levit.md

* Update src/transformers/models/levit/image_processing_levit_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* ruff check

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-14 17:07:36 +02:00
1efcfa9ca4 Fix mask handling for flex attention in llama/gemma2/mistral/qwen2 (#37381)
* fix BlockMask handling when using flex_attention for llama/mistral/gemma2

* fix attention_mask types

* revert type hints and fixup

* remove unnecessary assertion
2025-04-14 15:53:27 +01:00
86064035f0 [bug] deprecated deta load_cuda_kernel, MultiScaleDeformableAttention (#37443)
* Update modeling_deta.py

* variable initialization
2025-04-14 15:44:30 +01:00
7cc9e61a3a Add Fast Image Processor for Donut (#37081)
* add donut fast image processor support

* run make style

* Update src/transformers/models/donut/image_processing_donut_fast.py

Co-authored-by: Parteek <parteekkamboj112@gmail.com>

* update test, remove none default values

* add do_align_axis = True test, fix bug in slow image processor

* run make style

* remove np usage

* make style

* Apply suggestions from code review

* Update src/transformers/models/donut/image_processing_donut_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* add size revert in preprocess

* make style

* fix copies

* add test for preprocess with kwargs

* make style

* handle None input_data_format in align_long_axis

---------

Co-authored-by: Parteek <parteekkamboj112@gmail.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-14 16:24:01 +02:00
4e53840920 Detect and fix most _init_weights() issues - make it work for composite models (#37070)
* Update test_modeling_common.py

* Fix Llama and its modular children

* Update test_modeling_common.py

* qwen3

* first try at prioritizing models

* Update test_modeling_common.py

* Update test_modeling_common.py

* Update test_modeling_common.py

* test

* fix

* fix

* more models

* more

* more

* more

* smarter init for composite models!

* fix post rebase

* smol

* fix missing args

* more

* typo

* Super elegant and efficient init for submodels

* Update modeling_utils.py

* style

* last fixes

* cleanup

* finalize cleanup

* CIs

* improve docstring

* Update modeling_utils.py

* llama4

* style

* CIs

* style

* add dpt

* granite speech

* qwen 2.5 omni

* better fix

* Parse the config file instead

* CIs
2025-04-14 16:19:04 +02:00
1897a02d83 Add Fast Image Processor for LayoutLMv3 (#37201)
* support fast image processor layoutlmv3

* make style

* add warning and update test

* make style

* Update src/transformers/models/layoutlmv3/image_processing_layoutlmv3_fast.py

* Update image_processing_auto.py

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-14 15:42:11 +02:00
7bff4bdcf6 Fixed broken links (#37466)
* Update broken link

* Update broken link
2025-04-14 14:16:07 +01:00
e16775d103 Add Fast Image Processor for LayoutLMv2 (#37203)
* add support layoutlmv2

* make style

* Apply suggestions from code review

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* add warning and clean up

* make style

* Update src/transformers/models/layoutlmv2/image_processing_layoutlmv2_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-14 15:06:41 +02:00
49b9a69a36 Add Fast Image Processor for Flava (#37135)
* support flava fast image processor

* run style and quality

* update test

* update according to reviews

* make style

* update comment on BICUBIC

* make style

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-14 15:05:31 +02:00
a5079a2c84 [ci] fix doc builder (#37489)
happy doc ci
2025-04-14 13:49:31 +02:00
e7f5724efd Add Fast Image Processor for Perceiver (#37176)
* add test and fast image processor

* make style

* Update src/transformers/models/perceiver/image_processing_perceiver_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* make style

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-04-14 13:49:13 +02:00
4b8c6d4cf8 Add Qwen2.5-Omni (#36752)
* Add qwen2.5-omni

* Remove einops dependency

* Add torchdiffeq dependency

* Sort init

* Add torchdiffeq to extras['diffeq']

* Fix repo consistency

* use cached_file

* del odeint

* renew pytest

* format

* Remove torchdiffeq

* format

* fixed batch infer bug

* Change positional_embedding to parameter

* Change default speaker

* Config revision

* Use modular & code clean

* code clean

* decouple padding with model & code cleaning

* sort init

* fix

* fix

* Second code review

* fix

* fix

* rename vars to full name + some comments

* update pytest

* Code clean & fix

* fix

* style

* more clean up

* fixup

* smaller vision model in tests

* fix processor test

* deflake the tests a bit (still flaky though)

* de-flake tests finally + add generation mixin

* final nits i hope

* make sure processor tests are complete

* replace with Qwen2_5OmniForConditionalGeneration

* fix tests after updating ckpt

* fix typos when cleaning, also we can't change ckpt

* fixup

* images and videos kwargs for processor

* thinker and talker loadable from hub ckpt

* address comments and update tests after rebase

* fixup

* skip for now

* fixup

* fixup

* remove torch dependency in processors

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.con>
Co-authored-by: feizi.wx <feizi.wx@alibaba-inc.com>
Co-authored-by: raushan <raushan@huggingface.co>
2025-04-14 12:36:41 +02:00
ac1df5fccd Fix tests failed with gated repos. (#37484)
* fix

* slow

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-14 12:08:13 +02:00
1ef64710d2 Remove fsspec dependency which isn't directly used by transformers (#37318)
Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-14 12:02:28 +02:00
47b9f06aa2 make test_snowman_image_captioning pass on XPU, by sharing same atol w/ ROCM (#37480)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-14 11:39:45 +02:00
78cea3e22c fix: (llama4) fix no_split_modules to be picked up for fsdpv1 and v2 sharding (#37462)
fix: fix no_split_modules to be picked up for fsdpv1 and v2 sharding

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2025-04-14 10:44:32 +02:00
953196a43d Fix typing issues with SigLip2 (#37356)
* Fix issues

* Fix comment

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-04-11 22:24:23 +01:00
aaf129cdae [agents] remove agents 🧹 (#37368) 2025-04-11 18:42:37 +01:00
69e6ddf27f Delete hubconf.py (#37455)
* Delete hubconf.py

* Trigger tests
2025-04-11 18:12:45 +01:00
623d395aff Add Granite Speech Support (#36801)
* First pass at speech granite

Add encoder / projector, rename things

* Combine into one model file with causal lm outputs for forward

* Add loss calc

* Fix config loading

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* Split new / old loading logic

* Use transformers integration for loading peft adapters

* Add generation wrapper for selective lora enablement

* Add note for qformer encoder automodel

* Guard torch/audio imports in feature extractor

* Handle granite speech autoclasses

* Handle optional deps in package structure for granite speech

* Add granite pretrained model def for init

* Add dummy objects for torch/torchaudio

* Add tests for granite speech processor

* Minor formatting fixes and refactoring

* Add options for falling back to config in forward

* Tentative model docstrings for granite speech

* Fix config type

* Remove legacy load

* Allow non-lora variants for granite speech

* Override weight tying for llm

* Use text config instead of llm config

* Add output embeddings getter to fix weight tying

* Fix relative imports

* computing the number of audio features, based on the raw audio sequence.

* collating audio inputs, and keeping the original lengths.

* asserted we have text. otherwise we can't specify the audio special token.

* asserting the number of audio symbols/audios matches correctly.
running _get_validated_audios only when audio is present

* indentation bugfix + supporting different feature lengths when expanding audio.

* redundant, done in _get_validated_text

* adapting the tests:
- we must have text (not either audio or text)
- _get_num_audio_features takes a list of raw lengths, provided it instead.

* Minor cleanup, remove unused import

* Add more tests for batch feature processing

* Allow setting offset in rel position embeddings

* Add config option for warning if peft is not installed w/ lora

* Port blip2 qformer code into granite speech

* Add sad test for numpy arr processing

* Allow numpy arrays / tuples in granite speech processor

* Fix config type for projector

* - pad instead of creating a zeros tensor, to keep the original dtype/device (supports bfloat16); see the sketch after this entry
- cast input_features to the model dtype (supports bfloat16)

* merge Blip2QFormerConfig to GraniteSpeechProjectorConfig

* prevent a crash when re-saving/loading the model (line 109)

* consider additional edge cases during preprocessing.

* consider additional edge cases during preprocessing.

* add features mask for batched inference (bugfix)

* Minor refactor, remove multiaudio processor tests

* Add set input/output embeddings for granite speech

* Fix feature dim check in processor test

* Pop input features in embed test for granite speech

* Small fixes for test edge cases

Add granite speech to seq2seq causal lm mapping names

* Add small tests for granite speech model

* Fix data parallelism test

* Standardize model class names

* Fix check for copies

* Fix misaligned init check

* Skip granite speech in checkpoint check

* Use default for tie_word_embeddings in granite speech

* Fix non documentation granite speech repo issues

* Fix comments and docstring checks

* Add placeholder docs for granite speech

* Fix test naming collision

* Code formatting

* Rerun torch dummy obj regen

* Fix save pretrained for granite speech

* Import sorting

* Fix tests typo

* Remove offset hack

* Pass args through encoder config

* Remove unused prune heads from blip2

* remove einsum, replacing it with explicit multiplication (relative positional encodings) and SDPA attention.

* remove Sequential from ConformerFeedForward and ConformerConvModule + fix for SDPA attention

* remove GraniteSpeechConformerScale

* rename to hidden_states

* rename conformer layers to self.layers, remove the first linear from the list to keep the list homogeneous.

* move pre-norm to the attention/feedforward blocks (avoid complex module wrapping)

* adding pre_norm into forward

* feature extractor refactoring to resemble how it's done in phi4multimodal.

* rename feature_extractor to audio_processor

* bugfix: input_feature_mask fix to get the exact number of tokens.

* Fix pytest decorator in processor test

* Add (disabled) integration tests for granite speech

* Fix handling of optional feature masking

* Loosen validation in processing for vLLM compatibility

* Formatting fixes

* Update init structure to mirror llama

* Make granite speech projector generic

* Update test config to reflect generic projector

* Formatting fixes

* Fix typos, add license

* Fix undefined var in input processing

* Cleanup and expose ctc encoder

* Add missing config docstrings

* Better var names, type hints, etc

* Set attn context size in init

* Add max pos emb to encoder config

* Cleanup feature extractor

* Add granite speech architecture details

* Remove granite speech qformer ref

* Add paper link, explicit calc for qkv

* Calculate padding directly in depthwise conv1d init

* Raise value error instead of asserting

* Reorder class defs (classes used at top)

* Precompute relpos distances

* Run formatting

* Pass attention distances through forward

* Apply suggestions from code review

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>

* Add todo for using common batch feature extraction

* Rename audios/features

* Ensure chat template may be provided to processor

* Move granite speech docs to audio models

* Add todos for input proc refactoring

* Fix import order

* Guard torch import

* Use relative imports

* Require torch backend for processor in granite speech

* Add backend guards in feature extractor

---------

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Avihu Dekel <avihu.dekel@ibm.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-04-11 18:52:00 +02:00
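A hedged sketch of the bfloat16-friendly collation mentioned in the bullets above; the shapes and the `collate_audio` helper are assumptions, not the actual feature-extractor code. The point is that `F.pad` inherits dtype and device from its input, whereas a fresh zeros buffer would need both re-specified by hand.

```python
import torch
import torch.nn.functional as F


def collate_audio(features):
    # features: list of (length_i, dim) tensors; dtype/device may vary (e.g. bfloat16).
    max_len = max(f.shape[0] for f in features)
    # Padding the tensor itself keeps its original dtype and device.
    return torch.stack([F.pad(f, (0, 0, 0, max_len - f.shape[0])) for f in features])


feats = [torch.randn(3, 4, dtype=torch.bfloat16), torch.randn(5, 4, dtype=torch.bfloat16)]
batch = collate_audio(feats)
print(batch.shape, batch.dtype)  # torch.Size([2, 5, 4]) torch.bfloat16
```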
435f88f1db nit: typing use Llama4TextConfig instead of Llama4Config (#37430)
nit: typing to text config

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2025-04-11 17:29:34 +01:00
954f31cd81 Add XPU case to is_torch_bf16_gpu_available (#37132)
* Add xpu case to is_torch_bf16_gpu_available

Signed-off-by: cyy <cyyever@outlook.com>

* Refine error messages

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-11 17:28:47 +01:00
28eae8b4bd Add weights_only=True to torch.load (#37062) 2025-04-11 17:18:41 +01:00
bf46e44878 🚨 🚨 Allow saving and loading multiple "raw" chat template files (#36588)
* Add saving in the new format (but no loading yet!)

* Add saving in the new format (but no loading yet!)

* A new approach to template files!

* make fixup

* make fixup, set correct dir

* Some progress but need to rework for cached_file

* Rework loading handling again

* Small fixes

* Looks like it's working now!

* make fixup

* Working!

* make fixup

* make fixup

* Add TODO so I don't miss it

* Cleaner control flow with one less indent

* Copy the new logic to processing_utils as well

* Proper support for dicts of templates

* make fixup

* define the file/dir names in a single place

* Update the processor chat template reload test as well

* Add processor loading of multiple templates

* Flatten correctly to match tokenizers

* Better support when files are empty sometimes

* Stop creating those empty templates

* Revert changes now we don't have empty templates

* Revert changes now we don't have empty templates

* Don't support separate template files on the legacy path

* Rework/simplify loading code

* Make sure it's always a chat_template key in chat_template.json

* Update processor handling of multiple templates

* Add a full save-loading test to the tokenizer tests as well

* Correct un-flattening

* New test was incorrect

* Correct error/offline handling

* Better exception handling

* More error handling cleanup

* Add skips for test failing on main

* Reorder to fix errors

* make fixup

* clarify legacy processor file docs and location

* Update src/transformers/processing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Update src/transformers/processing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Update src/transformers/processing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Update src/transformers/processing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Rename to _jinja and _legacy

* Stop saving multiple templates in the legacy format

* Cleanup the processing code

* Cleanup the processing code more

* make fixup

* make fixup

* correct reformatting

* Use correct dir name

* Fix import location

* Use save_jinja_files instead of save_raw_chat_template_files

* Correct the test for saving multiple processor templates

* Fix type hint

* Update src/transformers/utils/hub.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Patch llava_onevision test

* Update src/transformers/processing_utils.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Refactor chat template saving out into a separate function

* Update tests for the new default

* Don't do chat template saving logic when chat template isn't there

* Ensure save_jinja_files is propagated to tokenizer correctly

* Trigger tests

* Update more tests to new default

* Trigger tests

---------

Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
2025-04-11 16:37:23 +01:00
897874748b Disable kernels for quantization (#37446)
fix
2025-04-11 16:35:38 +02:00
6a75528cbc prevent creating a view/leaf param for low rank optimizers w FSDP (#37379)
prevent creating a view/leaf param for low rank optimizers
2025-04-11 14:36:29 +02:00
6cef03ba66 [Regression] Fix Quark quantized model loading after refactorization (#37407) 2025-04-11 13:43:36 +02:00
a563999a02 [processor] clean up multimodal tests (#37362)
* clean up multimodal processor tests

* fixup

* fix tests

* fix one last test

* forgot
2025-04-11 13:32:19 +02:00
3c39c07939 Remove triton mlp kernel, not compiling for some models (#37449)
* remove mlp for now

* disable on docker
2025-04-11 12:47:13 +02:00
f797e3d98a Fix the test fetcher (#37452)
Test fetcher
2025-04-11 12:19:27 +02:00
442d356aa5 Add moe kernels (#37376)
* the fix that did not get in

* add kernels

* full graph does not work

* simpler is better

* Update src/transformers/integrations/hub_kernels.py

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* Update src/transformers/integrations/fbgemm_fp8.py

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* Update src/transformers/integrations/hub_kernels.py

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* fixup

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2025-04-11 11:56:22 +02:00
7e9b57ce62 Update-kernel-pin (#37448)
* update `kernels`

* oups

* new pinned version
2025-04-11 11:19:21 +02:00
54a123f068 Simplify soft dependencies and update the dummy-creation process (#36827)
* Reverse dependency map shouldn't be created when test_all is set

* [test_all] Remove dummies

* Modular fixes

* Update utils/check_repo.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* [test_all] Better docs

* [test_all] Update src/transformers/commands/chat.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* [test_all] Remove deprecated AdaptiveEmbeddings from the tests

* [test_all] Doc builder

* [test_all] is_dummy

* [test_all] Import utils

* [test_all] Doc building should not require all deps

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-11 11:08:36 +02:00
931126b929 Fixes: Corrects file path for CUDA kernels (#37438)
Corrects the file path used to locate the CUDA kernels
for the Deformable Attention module. This ensures that
the kernels are loaded correctly, resolving potential
errors during module initialization and usage.
2025-04-11 09:41:46 +01:00
c7064cdba1 enhance require_deterministic_for_xpu (#37437)
* enhance require_deterministic_for_xpu

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-11 08:06:08 +02:00
371c44d0ef Remove old code for PyTorch, Accelerator and tokenizers (#37234)
* Remove unneeded library version checks

Signed-off-by: cyy <cyyever@outlook.com>

* Remove PyTorch condition

Signed-off-by: cyy <cyyever@outlook.com>

* Remove PyTorch condition

Signed-off-by: cyy <cyyever@outlook.com>

* Fix ROCm get_device_capability

Signed-off-by: cyy <cyyever@outlook.com>

* Revert "Fix ROCm get_device_capability"

This reverts commit 0e756434bd7e74ffd73de5500476072b096570a6.

* Remove unnecessary check

Signed-off-by: cyy <cyyever@outlook.com>

* Revert changes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-04-10 20:54:21 +02:00
7ff896c0f2 [Feat] Support npu in modeling models (#37369) 2025-04-10 19:00:58 +02:00
10907e2846 Adding to self_comment_ci.yml (#37426)
add myself
2025-04-10 17:46:56 +02:00
7d76876498 (Part 2) feat: allow for tp_size attr for tplizing the model (#37054)
* feat: custom tp_size, new transformers tp interface

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: review cmt - error when tp_plan not set for tp_size

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: nit in docs

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-04-10 17:44:09 +02:00
dac443414e fix: use mtime by default in Trainer._rotate_checkpoints with automatic fallback (#37260)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-10 17:42:06 +02:00
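A hedged sketch of the rotation policy named in the title; the helper name and the exact fallback trigger are assumptions, not Trainer's actual implementation: sort checkpoint directories by modification time first, and fall back to the step number parsed from the directory name when mtime cannot be read.

```python
import os
import re
import tempfile


def sorted_checkpoints(folder, prefix="checkpoint"):
    paths = [
        os.path.join(folder, d)
        for d in os.listdir(folder)
        if d.startswith(prefix) and os.path.isdir(os.path.join(folder, d))
    ]
    try:
        # Primary ordering: filesystem modification time.
        return sorted(paths, key=os.path.getmtime)
    except OSError:
        # Automatic fallback: order by the step encoded in "checkpoint-<step>".
        def step(path):
            match = re.search(rf"{prefix}-(\d+)", path)
            return int(match.group(1)) if match else -1

        return sorted(paths, key=step)


with tempfile.TemporaryDirectory() as tmp:
    for s in (500, 100, 900):
        os.mkdir(os.path.join(tmp, f"checkpoint-{s}"))
    print([os.path.basename(p) for p in sorted_checkpoints(tmp)])
```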
6daec12d0b Add GGUF support to Gemma3 Text backbone (#37424)
* add gemma3 gguf support

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix typo and add gguf limit

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix a typo

Signed-off-by: Isotr0py <2037008807@qq.com>

* add vision conversion test

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix typos

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-10 17:15:43 +02:00
0ea1151222 Llama Kernel integration (#37092)
* initial commit

* style

* update

* change approach attention

* clean up

* fix import

* update

* update

* fix style

* change method

* attention

* add mlp back

* change name

* update name

* fix copies

* fix config

* fix
2025-04-10 17:13:25 +02:00
9c0c323e12 Fix require_read_token (#37422)
* nit

* fix

* fix
2025-04-10 17:01:40 +02:00
bde41d69b4 Correctly drop tokens in SwitchTransformer (#37123)
Previously, the identity function was used for dropped tokens, with an expert weight
that was never applied to the hidden states.
This was misleading, because dropping means the expert weight is zero.
Instead of trying to fix the weight, we take the easier approach of initializing with zeros.

Fixes issue https://github.com/huggingface/transformers/issues/37017
2025-04-10 16:58:57 +02:00
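A minimal sketch of the fix described above, on a generic token-dropping MoE layer rather than the actual SwitchTransformer code: the output buffer starts as zeros, so a dropped token contributes nothing instead of passing through as an identity.

```python
import torch
from torch import nn


def moe_forward(hidden_states, expert_index, experts, router_probs):
    # hidden_states: (tokens, dim); expert_index: (tokens,), -1 marks a dropped token.
    out = torch.zeros_like(hidden_states)  # dropped tokens keep an all-zero output
    for i, expert in enumerate(experts):
        mask = expert_index == i
        if mask.any():
            out[mask] = router_probs[mask].unsqueeze(-1) * expert(hidden_states[mask])
    return out


experts = [nn.Linear(8, 8) for _ in range(2)]
hidden = torch.randn(4, 8)
index = torch.tensor([0, 1, -1, 0])  # token 2 was dropped at capacity
probs = torch.rand(4)
print(moe_forward(hidden, index, experts, probs)[2])  # all zeros: dropped weight is zero
```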
7ecc5b88c0 Add image classifier donut & update loss calculation for all swins (#37224)
* add classifier head to donut

* add to transformers __init__

* add to auto model

* fix typo

* add loss for image classification

* add checkpoint

* remove no needed import

* reoder import

* format

* consistency

* add test of classifier

* add doc

* try ignore

* update loss for all swin models
2025-04-10 15:00:42 +02:00
5ae9b2cac0 Quark Quantization gated repo (#37412)
* fix

* empty commit

* empty

* nit

* fix maybe ?
2025-04-10 14:57:15 +02:00
d9e76656ae Fix new failure reports not including anything other than tests/models/ (#37415)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-10 14:47:23 +02:00
1ae8d54b04 [chat-template] Unify tests and clean up 🧼 (#37275)
* fix tests and some clean up

* make one general test for each modality

* remove redundant merging of kwargs

* edge cases

* dont enforce slow when reloading

* fix gemma3 tests

* has to adapt llama 4 after rebase

* remove also from overriden tests

* should be green now
2025-04-10 14:42:32 +02:00
10144ff116 use rms_norm_eps for the L2Norm for Llama4 (#37418)
use `rms_norm_eps`
2025-04-10 13:33:50 +02:00
aa478567f8 Allow rocm systems to run these tests (#37278)
* Allow rocm systems to run these tests

* Fix skipTest logic

* Use get_device_properties to check system capabilities
2025-04-10 13:33:01 +02:00
ae5ce22664 from_pretrained should handle xpu case (#37382)
* from_pretrained should handle xpu case

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fmt

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-04-10 13:23:17 +02:00
4f139f5a50 Send trainer/fsdp/deepspeed CI job reports to a single channel (#37411)
* send trainer/fsdp/deepspeed channel

* update

* change name

* no .

* final

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-10 13:17:31 +02:00
a2c2fb0108 update kernels to 0.4.3 (#37419)
* update `kernels`

* oups
2025-04-10 12:14:22 +02:00
0ddad2d655 mark llama4 as not supported with fa2 (#37416) 2025-04-10 11:48:46 +02:00
fbb2054ed5 Offloaded hybrid cache for Llama4 (#37401)
* first try (maybe race condition)

* Update cache_utils.py

* cannot avoid the race condition -> use 2 layers

* Update cache_utils.py

* Update cache_utils.py
2025-04-10 11:44:34 +02:00
6d8b0b3378 Fix Llama4 offset (#37414)
* add +1

* Update modeling_llama4.py
2025-04-10 11:40:58 +02:00
f5865d32a2 Restrict & Explain tp_plan for FBgemm (#37404)
* explain tp_plan

* add llama4 check

* add clarification
2025-04-10 11:33:33 +02:00
e39c732644 Handle torch ver in flexattn (#37400)
* Handle torch ver in flexattn

* update
2025-04-10 11:27:54 +02:00
bc0150bb04 Add warning when failed to acquire other user's lock at model download (#37395) 2025-04-10 11:18:27 +02:00
9cda4265d6 handle torch version edge cases (#37399) 2025-04-09 21:49:57 +02:00
e032d12e8a the fix that did not get in (#37370)
* debugging improvements

* add debugging details

* add more debugging details

* debug more

* the fix that did not get in

* First fix flex

* fix query offset

* fix flex first

* fix device mask creation for speed

* small mask creation sdpa

* Update flex_attention.py

* remove chunked prefill from HybridChunkedCache

* never seen such a messed-up merge

* clean up layers + output

* add summary json file

* Efficient general cache

* Update cache_utils.py

* cleanup

* fix?

* fix!

* oups typo

* not everywhere

* more fixes

* revert unrelated changes

* Fix but ugly for now -> should use pad instead

* oups

* re-initialize the cache

* Use pad to simplify

* style

* correct slicing

---------

Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-04-09 20:15:33 +02:00
f834ca2c19 Attention Quantization with FBGemm & TP (#37384)
* fix

* keep fused

* contiguous

* rm print

* update

* update

* rm print
2025-04-09 18:45:42 +02:00
c5c648dd74 Fix some failing AWQ tests (#37383)
* update AwqQuantizer

* fix style

* add an arg to get_modules_to_not_convert to add get_keys_to_not_convert(model)
2025-04-09 18:24:57 +02:00
71b35387fd Apply torchfix to replace deprecated functions: _pytree._register_pytree_node and torch.cpu.amp.autocast (#37372)
fix: apply torchfix
2025-04-09 16:11:18 +01:00
ad340908e4 Fix warning message for PEFT models in text-generation pipeline #36783 (#36887)
* add peft model in constant

* add test

* fix formating

* make fixup execute

* change code

* check by self.task

* add test

* fixup test code

* fix minor typo

* fix pipeline test

* apply maintainers reqests
2025-04-09 15:36:52 +01:00
2527f71a47 Add "selecting a quantization method" doc (#37159)
* initial draft

* make documentation simpler

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/selecting.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* turn pros and cons into tables

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add links to each quant method page

* separate calibration vs no calibration methods

* add calibration time estimates

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-09 15:51:37 +02:00
7ae0be722e update deepspeed docker (#37371)
* update

* create docker image

* 03

* uninstall pytest as it conflicts with transformers

* wrong one

* better

* see which package depends on pytest

* up

* reinstall

* fix

* deepspeedddddddd

* deepspeedddddddd

* deepspeedddddddd

* deepspeedddddddd

* deepspeedddddddd

* deepspeedddddddd

* deepspeedddddddd

* deepspeedddddddd

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-09 14:54:06 +02:00
e3eda6d188 Add glm4 (#37388)
* add changed

* Revert "add changed"

This reverts commit 0a0166a1fe80556115a49fbf0c2132de0f4f85c9.

* update with NEW MODEL class called GLM4

* update

* Update glm4.md

* Name

* style

* fix copies

* fixup test

---------

Co-authored-by: Yuxuan Zhang <2448370773@qq.com>
2025-04-09 14:02:04 +02:00
1e6ff5fd55 fix: llama4 conversion script no_rope_layers (#37359)
fix conversion script no_rope_layers

`no_rope_layers` should either be a list of NoPE layers or None, in which case it is created in the config from the `no_rope_layer_interval`

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2025-04-09 13:02:15 +02:00
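A hedged sketch of deriving the NoPE layer list from the interval, as the fix describes; the exact representation the Llama4 config uses may differ, and here we assume every `no_rope_layer_interval`-th layer skips RoPE.

```python
def build_no_rope_layers(num_hidden_layers, no_rope_layer_interval):
    # Assumed convention: layers at positions interval, 2*interval, ... are NoPE.
    return [
        layer_idx
        for layer_idx in range(num_hidden_layers)
        if (layer_idx + 1) % no_rope_layer_interval == 0
    ]


# no_rope_layers=None in the config would then be filled in from the interval:
print(build_no_rope_layers(12, 4))  # [3, 7, 11]
```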
6f4058aee3 Update composition flag usage (#36263)
* update composition flag usage

* remove print

* fix tests

* actually fix

* oh c'mon

* now should be fixed right?

* fix copies
2025-04-09 11:48:49 +02:00
08e3217baf Preserve requires_grad in pre quantized model (#37354)
* Preserve requires_grad in pre quantized model

Summary:
discovered this when running lm-eval for some models; the current
code always sets requires_grad to True

Test Plan:
lm_eval --model hf --model_args pretrained=jerryzh168/phi4-torchao-gguf-q4_k --tasks hellaswag --device cuda:0 --batch_size 8

Reviewers:

Subscribers:

Tasks:

Tags:

* ruff format

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-08 18:41:30 +02:00
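A minimal sketch of the bug and fix described in the summary above; the `replace_param` helper is hypothetical, not the quantizer's real API. Wrapping a tensor in `nn.Parameter(...)` defaults `requires_grad` to True, so the original flag has to be copied over explicitly.

```python
import torch
from torch import nn


def replace_param(module, name, new_data):
    old = getattr(module, name)
    # Preserve the flag; nn.Parameter(new_data) alone would reset it to True.
    setattr(module, name, nn.Parameter(new_data, requires_grad=old.requires_grad))


layer = nn.Linear(4, 4)
layer.weight.requires_grad_(False)  # e.g. a frozen pre-quantized weight
replace_param(layer, "weight", torch.zeros(4, 4))
print(layer.weight.requires_grad)  # False, as preserved
```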
4d0de5f73a 🚨 🚨 Setup -> setupclass conversion (#37282)
* More limited setup -> setupclass conversion

* make fixup

* Trigger tests

* Fixup UDOP

* Missed a spot

* tearDown -> tearDownClass where appropriate

* Couple more class fixes

* Fixups for UDOP and VisionTextDualEncoder

* Ignore errors when removing the tmpdir, in case it already got cleaned up somewhere

* CLIP fixes

* More correct classmethods

* Wav2Vec2Bert fixes

* More methods become static

* More class methods

* More class methods

* Revert changes for integration tests / modeling files

* Use a different tempdir for tests that actually write to it

* Remove addClassCleanup and just use teardownclass

* Remove changes in modeling files

* Cleanup get_processor_dict() for got_ocr2

* Fix regression on Wav2Vec2BERT test that was masked by this before

* Rework tests that modify the tmpdir

* make fix-copies

* revert clvp modeling test changes

* Fix CLIP processor test

* make fix-copies
2025-04-08 17:15:37 +01:00
c15a7adb28 fix(qwen): fix shape error when using tp (#36947)
* fix(qwen): fix shape error when using tp

* Update modeling_qwen2_vl.py

---------

Co-authored-by: shidongxing <shidongxing@pjlab.org.cn>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-04-08 17:47:30 +02:00
121f91d36c prune LM Head for USD (#36695)
* initial commit

* fix

* fix style

* set default to prune

* add tests

* comment

* remove prune flag from generate

* address Joao's comments

* deprecate_kwarg

* add doc

* fix target_vocab_size

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* fix deprecated argument assistant_model_device

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-08 16:44:10 +01:00
4321b0648c [core] remove GenerationMixin inheritance by default in PreTrainedModel (#37173) 2025-04-08 16:42:05 +01:00
aab0878327 Skip non-selected experts for mixtral and qwen2_moe (#32429)
* Skip non-selected experts for mixtral and qwen2_moe

* Fix: tensor tolist()

* WIP: tokenization test

* fix modular source of truth

* nits

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-04-08 17:41:28 +02:00
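A hedged sketch of the optimization, on a generic top-k MoE rather than the exact Mixtral/Qwen2-MoE code: only the experts that actually appear in the router's selection are run, instead of looping over all of them.

```python
import torch
from torch import nn


def sparse_moe(hidden, experts, router_logits, top_k=2):
    probs = router_logits.softmax(dim=-1)
    topk_probs, topk_idx = probs.topk(top_k, dim=-1)  # (tokens, top_k)
    out = torch.zeros_like(hidden)
    for expert_id in topk_idx.unique().tolist():  # visit selected experts only
        rows, slots = (topk_idx == expert_id).nonzero(as_tuple=True)
        weight = topk_probs[rows, slots].unsqueeze(-1)
        out[rows] += weight * experts[expert_id](hidden[rows])
    return out


experts = nn.ModuleList(nn.Linear(8, 8) for _ in range(8))
hidden = torch.randn(5, 8)
logits = torch.randn(5, 8)
print(sparse_moe(hidden, experts, logits).shape)  # torch.Size([5, 8])
```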
35f0f5b5da [llama 4] dynamic rope decorator (#37365)
l4 + dynamic rope decorator
2025-04-08 15:56:31 +01:00
530322ccb6 Set vision config to None for Gemma 1B conversion (#37366)
* Set vision config to None for Gemma 1B conversion

* Trigger tests

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2025-04-08 14:22:32 +01:00
8064cd9b4f fix deepspeed job (#37284)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-08 15:19:33 +02:00
cdfb018d03 A bit of cleaning 🧹🧹 (#37215)
* cleaning

* CIs
2025-04-08 14:33:58 +02:00
1e6b546ea6 Use Python 3.9 syntax in tests (#37343)
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-08 14:12:08 +02:00
0fc683d1cd convert float for yarn related arguments in rope_scaling (#37139)
* convert float for yarn related arguments in rope_scaling

* sort keys alphabetically

---------

Co-authored-by: ryan.agile <ryan.agile@kakaobrain.com>
2025-04-08 13:58:22 +02:00
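A small sketch of the coercion this change describes; the key set is an assumption, not the exact list transformers validates. YaRN-related `rope_scaling` entries that arrive from JSON as ints or strings are normalized to float before any rotary math runs.

```python
YARN_FLOAT_KEYS = ("attention_factor", "beta_fast", "beta_slow", "factor")  # assumed set


def normalize_rope_scaling(rope_scaling):
    normalized = dict(rope_scaling)
    for key in YARN_FLOAT_KEYS:
        if normalized.get(key) is not None:
            normalized[key] = float(normalized[key])
    return normalized


print(normalize_rope_scaling({"rope_type": "yarn", "factor": 8, "beta_fast": "32"}))
# {'rope_type': 'yarn', 'factor': 8.0, 'beta_fast': 32.0}
```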
2515a5a290 Expose blip2qformer (#37254)
* Expose blip2qformer

* Add missing args to blip2 config
2025-04-08 12:04:33 +02:00
2da82e432d Multiple llama4 fixe (#37353)
* update for fixes

* more fixes

* fix dynamic cache?

* style

* fix both training and generating. Eager seems alright

* dynamic does not work

* fix most cases, use_cache or not, eager or not, no default cache (ex: not training but you want to get cache states)

* should be final fixes

* fix more stuff no cat

* style

* fix

* style

* final style

* quality

* fix

* revert
2025-04-08 11:14:49 +02:00
794fde7b1c Fixing flex attention for torch=2.6.0 (#37285)
* adding compile kwarg for torch 2.6

* fixing dynamic

* addressing comment

* typo

* Update src/transformers/integrations/flex_attention.py

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-04-07 23:04:46 +02:00
b54c2f4689 more fixes for post-training llama4 (#37329)
* more fixes for post-training llama4

* use target_length instead of guearded past_key_values
2025-04-07 21:20:23 +02:00
754a370bca Remove unnecessary attr assignment (#36837)
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-04-07 20:19:54 +01:00
31a62c2eb8 Updated Model-card for donut (#37290)
* Updated documentation for Donut model

* Update docs/source/en/model_doc/donut.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/donut.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/donut.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/donut.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated code suggestions

* Update docs/source/en/model_doc/donut.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated code suggestion to Align with the AutoModel example

* Update docs/source/en/model_doc/donut.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated notes section included code examples

* close hfoption block and indent

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-07 11:54:47 -07:00
f830105183 Add bnb to the list of supported quantization methods for LLama4 (#37348)
* add bnb

* style

* update

* add pre_quantized check
2025-04-07 20:34:06 +02:00
e2b0224d94 Update Model Card for Jamba (#37152)
* Update model card for jamba

* Apply the suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review-2

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update model page.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update as per code review.

* Update docs/source/en/model_doc/jamba.md as per code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/jamba.md as per code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update as per code review.

* fixes

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-07 11:02:59 -07:00
6cc109c354 Improvements in Gemma2 model card (#37076)
* Improved Model card for Gemma2

* Made changes in gemma2 as suggested

* Made more changes in the doc (adding image, notes, closing hfoptions)

* minor fixes

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-07 10:51:26 -07:00
8bbcdf5409 Clean up the compressed-tensors integration (#37349)
clean up
2025-04-07 19:26:45 +02:00
3a826a45ca Update Model card for GPT2 (#37101)
* Update Model card for gpt2

* Update link for gpt2 space

* fixes docs based on suggestions

* Add transformers-cli and quantization example for GPT-2

* Remove resources and flash attention docs and fix typos
2025-04-07 10:15:28 -07:00
5e855095a2 Update falcon mamba card (#37253)
* feat: edit falcon mamba card

* fix: edit statement on falconmamba arch

* Update docs/source/en/model_doc/falcon_mamba.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/falcon_mamba.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/falcon_mamba.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: add right indent for tags

* fix: remove notas

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-07 10:12:44 -07:00
416b5a875d Update model-card for DINOv2 (#37104)
[docs] Update model-card for DINOv2
2025-04-07 10:11:08 -07:00
f8a16805c5 updated model card for Mistral (#37156)
* model card for Mistral

* Update docs/source/en/model_doc/mistral.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mistral.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mistral.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mistral.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mistral.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggestions

* fix typo

* updated with comments

* updated with comments

* updated with comments

* remove hfoption block

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-07 10:05:36 -07:00
48e179857c Remove HQQ from caching allocator warmup (#37347)
Update modeling_utils.py
2025-04-07 18:33:48 +02:00
832cb684a0 Update translation template (#37294) 2025-04-07 09:29:37 -07:00
22065bd645 fix derived berts _init_weights (#37341)
* fix derived berts

* more

* roformer
2025-04-07 18:25:07 +02:00
f789f960c8 Avoid build crashes when torch.version.xpu doesn't exist and fix Llama4 processor tests (#37346)
* Avoid build crashes when torch.version.xpu doesn't exist

* Trigger tests

* Fix image token and skip inappropriate test

* Remove ignore_errors=True

* Add another skip
2025-04-07 17:05:54 +01:00
12bf24d6ae enable 2 llama UT cases on xpu (#37126)
* enable tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits and tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits_bf16 on xpu

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* switch to use Expectations

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* extract gen bits from architecture and use it

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* add cross reference

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-07 16:02:14 +02:00
e7ad077012 byebye torch 2.0 (#37277)
* bump minimum Torch to 2.1; drop the broken `torch.compile` compatibility path

* dep table

* remove usage of is_torch_greater_or_equal_than_2_1

* remove usage of is_torch_greater_or_equal_than_2_1

* remove if is_torch_greater_or_equal("2.1.0")

* remove torch >= "2.1.0"

* deal with 2.0.0

* PyTorch 2.0+ --> PyTorch 2.1+

* ruff 1

* difficult ruff

* address comment

* address comment

---------

Co-authored-by: Jirka B <j.borovec+github@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-07 15:19:47 +02:00
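The commit above raises the minimum supported PyTorch to 2.1. A minimal sketch of the kind of guard such a bump implies (illustrative only, not the library's exact check), assuming `torch` and `packaging` are installed:

```python
from packaging import version
import torch

# Fail fast on the now-unsupported torch 2.0.x.
if version.parse(torch.__version__) < version.parse("2.1.0"):
    raise ImportError(f"torch >= 2.1.0 is required; found {torch.__version__}")
```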
99f9f1042f Fix torchao usage (#37034)
* fix load path

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix path

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix torchao usage

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert useless change

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert fp8 test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix fp8 test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix fp8 test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix torch dtype

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-07 14:50:48 +02:00
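For context, a hedged sketch of the torchao quantization path this fix touches; `TorchAoConfig` is the documented entry point, and the checkpoint id is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig

# int4 weight-only quantization via torchao.
quant_config = TorchAoConfig("int4_weight_only", group_size=128)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # placeholder checkpoint
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
)
```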
0fb8d49e88 Use Python 3.9 syntax in examples (#37279)
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-07 12:52:21 +01:00
08f36771b3 Fix init empty weights without accelerate (#37337)
* add the integration

* Update accelerate.py

* Update accelerate.py

* add find_tied_params as well

* Update accelerate.py

* add where copied from

* simplify

* add error
2025-04-07 11:37:29 +02:00
9db31ea585 Fix deepspeed with quantization (#37324)
* Update modeling_utils.py

* Update modeling_utils.py
2025-04-07 11:36:44 +02:00
debfe904c9 fix llama4 training (#37319) 2025-04-07 09:24:44 +02:00
54538ebee3 fix flex attn when optional args aren't passed (#37327) 2025-04-07 09:12:21 +02:00
d1b92369ca v4.52.0.dev0 2025-04-05 22:04:21 +02:00
25b7f27234 Add llama4 (#37307)
* remove one of the last deps

* update fast image processor after refactor

* styling

* more quality of life improvements

* nit

* update

* cleanups

* some cleanups

* vllm updates

* update fake image token

* [convert] Fix typo

* [convert] Strip extraneous bytes from shards

* [convert] Minor fixes

* [convert] Use num_experts

* multi-image fixes in modeling + processor

* fixup size

* 128 experts

* Use default rope

* Unfuse mlp

* simplify a lot inputs embeds merging

* remove .item() 👀

* fix from review

* Address feedback

* Use None "default" for rope_scaling. Add eot.

* set seed

* return aspect ratios and bug fixes

* Moe 128 rebased (#8)

* 128 experts

* Use default rope

* Unfuse mlp

* Address feedback

* Use None "default" for rope_scaling. Add eot.

* Meta/llama quant compat (#7)

* add quant compatible model & conversion code for llama4

* fix a few issues

* fix a few issues

* minor type mapping fix

---------

Co-authored-by: Lu Fang <fanglu@fb.com>

* use a new config parameter to determine which model definition to use for MoE

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Lu Fang <fanglu@fb.com>

* un-comment write_tokenizer from converting script

* remove un-used imports

* [llama4] Pop aspect_ratios from image processor output in Llama4Processor

Signed-off-by: Jon Swenson <jmswen@gmail.com>

* Fix parameter_count name

* Update src/transformers/models/llama4/configuration_llama4.py

* nit

* Add changes for no_rope, moe_layers, chunked attention. Just need to test all

* Update src/transformers/models/llama4/image_processing_llama4_fast.py

* nit

* fix post merge with main

* support flex attention

* fixes

* fix

* add layer

* small updates

* rebase and delete llm_compressor

* nit

* [llama4/mm] Add back <|image|> token that delimits global tile

* [llama4/mm] Fix Llama 4 image processing unit tests

* add explicit dtype

Signed-off-by: Jon Swenson <jmswen@gmail.com>

* sdpa works

* comment todo small

* fix model loading

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* revert

* nits

* small fix for TP on 1 node

* Read new params from config

* Add <|eom|>

* lol don't know how this got here

* adding fp8

* Save processor, fix chat template

* style

* Add boi/eoi tokens

We don't use them.

* fixes for now flex seems to work :)

* updates

* nits

* updates

* missing keys

* add context parallel

* update

* update

* fix

* nits

* add worldsize and make eager attn work for vision

* Ignore new key present in base models

* add tp_plan

* fix nope

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* minor fix

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* Clean up Llama4 vision model

* current updates

* add support for `attn_temperature_tuning`

* add floor scale

* add missing attn scales

* push what works, dirty trick for the device sync

* oups

* Fix pad_token_id

See
https://huggingface.co/ll-re/Llama-4-Scout-17B-16E/discussions/2/files
Confirmed in the original codebase.

* fix causallml loading

* rm

* fix tied-weights

* fix sdpa

* push current version

* should work with both short and long

* add compressed_tensors & fix fbgemm tp

* Fix flex impl

* style

* chunking

* try to revert the potentially breaking change

* fix auto factory

* fix shapes in general

* rm processing

* commit cache utils cleanup

* Fix context length

* fix

* allocate

* update tp_plan

* fix SDPA!

* Add support for sparse `Llama4TextMoe` layer from the kernel hub

* cleanup

* better merge

* update

* still broken fixing now

* nits

* revert print

* Write max_position_embeddings and max_model_length

* Update modeling_llama4.py

* Save attention_chunk_size

* Sync eos terminators

* Read initializer_range

* style

* remove `dict`

* fix

* eager should use `chunked_attention_mask`

* revert

* fixup

* fix config

* Revert "Merge pull request #36 from huggingface/sparse-llama4-moe"

This reverts commit ccda19f050867dd42ea143c5de60f3dec81375f0, reversing
changes made to a515579aed8c0fe9bf529b6c40446a289406d5d6.

* Fix typo and remove warning with compiled flex and chunked prefill

* Fix MoE vs FF (#41)

* fix

* Use correct no_rope_layers if provided one is empty list

* update tests

* fix

* skipping some tests

* fix fp8 loading

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* fix text generation pipeline

Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>

* eager needs 4D mask

* fix

* Some cleanup

* fix

* update

* fix

* replace correctly module

* patch

* modulelist

* update

* update

* clean up

* Don't move to `cuda:0` in distributed mode

* restrict to compressed tensors for now

* rm print

* Docs!

* Fixes

* Update docs/source/en/model_doc/llama4.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Fixes

* cuda graph fix

* revert some stuff

* fixup

* styling

* Update src/transformers/models/llama4/modeling_llama4.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* commit licence, cleanup here and there and style

* more styling changes

* fix dummies

* fix and clean docstrings

* remove comment

* remove warning

* Only fast image processor is supported

* nit

* trigger CI

* fix issue with flex encoder

* fix dynamic cache

* Code quality

* Code quality

* fix more tests for now

* Code quality

* Code quality

* Nuke bunch of failing stuff

* Code quality

* Code quality

* cleanup removal of slow image processor

* ruff fix fast image processor

* fix

* fix styling

* Docs

* Repo consistency

* Repo consistency

* fix sliding window issue

* separate llama cache

* styling

* Repo consistency

* Repo consistency

* push what works

* L4 Repo consistency

* Docs

* fix last remaining issues

---------

Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Keyun Tong <tongkeyun@gmail.com>
Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
Co-authored-by: Jon Swenson <jmswen@gmail.com>
Co-authored-by: jmswen <jmswen@users.noreply.github.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
Co-authored-by: Yong Hoon Shin <yhshin@meta.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: drisspg <drisspguessous@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Daniël de Kok <me@danieldk.eu>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-05 22:02:22 +02:00
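A hedged sketch of text-only chat with the newly added Llama 4; the checkpoint id is an assumption, and the model is large enough that `device_map="auto"` is used to spread it across available accelerators:

```python
import torch
from transformers import AutoTokenizer, Llama4ForConditionalGeneration

ckpt = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed checkpoint id
tok = AutoTokenizer.from_pretrained(ckpt)
model = Llama4ForConditionalGeneration.from_pretrained(
    ckpt, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Explain chunked attention briefly."}]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
print(tok.decode(model.generate(input_ids, max_new_tokens=64)[0]))
```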
aa40fda346 Hf Xet extra (#37305)
* Hf Xet extra

* Hf Xet extra
2025-04-05 21:06:05 +02:00
e94571580b Fix deepspeed loading (part 2) (#37306)
* fix

* Update modeling_utils.py

* Update modeling_utils.py

* oups remove print
2025-04-05 20:41:42 +02:00
84aa13dd85 Fix deepspeed loading (#37281)
* Update modeling_utils.py

* Update modeling_utils.py

* fix and remove all imports

* Update modeling_utils.py

* Update modeling_utils.py

* style

* Update modeling_utils.py
2025-04-05 17:05:45 +02:00
0ef339ff1b Update OpenAI GPT model card (#37255)
* Update OpenAI GPT model card

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update OpenAI GPT model card: add usage examples and notes section

* Add API autodoc tags after Notes section for OpenAI GPT model

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/openai-gpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Added missing badges

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-04 15:25:16 -07:00
46d73910d5 Updated T5 model card with standardized format (#37261)
* Updated T5 model card with standardized format

* Updated T5 model card with standardized format, fixed typo

* Update docs/source/en/model_doc/t5.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/t5.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/t5.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/t5.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/t5.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/t5.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/t5.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/t5.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/t5.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/t5.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply reviewer suggestions

* Update docs/source/en/model_doc/t5.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-04 15:23:09 -07:00
579135a2f6 Updated model card for distilbert (#37157)
* Updated model card for distilbert

* Updated the distilbert model card

* Updated model card for distilbert

* Updated the distilbert model card

* Addressed code review comments

* Addressed review comments

* fix pipeline

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-04 15:22:46 -07:00
8cd57eb731 mobilebert model card update (#37256)
* mobilebert model card update

* Updates to model card mobilebert

---------

Co-authored-by: Reshan Gomis <reshang@verdentra.com>
2025-04-04 14:28:35 -07:00
ebe47ce3e9 Fix: Unexpected Keys, Improve run_compressed, Rename Test Folder (#37077) 2025-04-04 21:30:11 +02:00
531e4fcf0e Update model card for Depth Anything (#37065)
[docs] Update model card for Depth Anything
2025-04-04 11:36:05 -07:00
a4e55fcff8 Disable delay_optimizer_creation in Trainer to support fsdp2 (#37147)
* github why you do this

* fix

* make fixup

* disable cpu offload test

* fixup

* tmp reworks

* git branch movement

* make fixup

* add require_fsdp_v2_version

* dep issues

* update ruff and fixup
2025-04-04 20:11:37 +02:00
878562b68d fix test device spec relative path importing issue (#37190)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-04 18:22:55 +02:00
8ebc435267 Fix llava_onevision tests (#37280)
* Fix llava_onevision tests

* Trigger tests
2025-04-04 15:03:38 +01:00
ad3d157188 [RoPE] abstract dynamic RoPE update under a decorator (#37249)
* dynamic rope decorator

* longrope; shorter fwd pass

* proper docstring

* make fixup
2025-04-04 14:27:28 +01:00
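A generic sketch of the decorator pattern this commit introduces (not the library's actual implementation): wrap the RoPE forward so inverse frequencies are recomputed whenever a longer sequence than previously seen arrives; `compute_inv_freq` is an assumed helper.

```python
import functools

def dynamic_rope_update(rope_forward):
    @functools.wraps(rope_forward)
    def wrapper(self, x, position_ids):
        seen_len = int(position_ids.max()) + 1
        if seen_len > getattr(self, "max_seen_len", 0):
            self.max_seen_len = seen_len
            self.inv_freq = self.compute_inv_freq(seen_len)  # assumed helper
        return rope_forward(self, x, position_ids)
    return wrapper
```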
3d40bda30e Hugging Face Hub pin to v0.30.0 for Xet (#37166) 2025-04-04 14:58:22 +02:00
acbcb5d07d [Tests] flaky test_constrained_beam_search_generate_dict_output (#37276) 2025-04-04 13:38:42 +01:00
4ba0989eab Clarify error message to ensure min 28x28 image supplied for Qwen 2.5 VL (#37264)
fix: clarify error message for min 28x28 images

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-04-04 12:53:38 +01:00
352ec8ef22 pin specific natten version in docker file (#37274)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-04 13:47:16 +02:00
edd345b52e Fix deprecated PT functions (#37237)
* Fix deprecated PT functions

Signed-off-by: cyy <cyyever@outlook.com>

* Revert some changes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-04-04 12:31:11 +01:00
b016de1ae4 Fix utils/check_bad_commit.py (#37272)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-04 12:18:20 +02:00
f74d7da836 Introduce modular files for speech models (#35902)
* WAV_2_VEC_2 to WAV2VEC2

* added modular files for hubert, wavlm, wav2vec2_bert, data2vec_audio

* remove unnecessary definitions in modulars

* added modular files for UniSpeech, UniSpeechSat, Wav2Vec2Conformer

* docstring fix for UniSpeechForCTC

* removed unnecessary re-definition of modular classes

* reverted lazy imports change on modular_model_converter, type-alias for Wav2Vec2BaseModelOutput

* top-level import of deepspeed in seamless_m4t, speecht5

* avoid tracking imports inside classes, relocate lazy deepspeed, peft imports in their original locations

* convert modular

* tiny modular typing fixes

* some more modular fixes

* make style

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-04-04 11:46:27 +02:00
d130cd0e16 update error msg (#37207) 2025-04-04 10:21:30 +02:00
41b9b92b52 [qwen-vl] fix image processor (#37258)
* fix

* add test
2025-04-03 19:48:56 +02:00
8dd0a2b89c Update model card for electra (#37063)
* Update ELECTRA model card with new format

* Update ELECTRA model card with new format

* Update docs/source/en/model_doc/electra.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/electra.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/electra.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/electra.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/electra.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/electra.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/electra.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/electra.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/electra.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* close hfoption block

---------

Co-authored-by: Wun0 <f20191221@hyderabad.bits-pilani.ac.in>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-03 10:45:35 -07:00
15ac2b6ac5 Update Model Card for ModernBERT (#37052)
* Modify Model Card for ModernBERT.

* Update as per code review.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update model card.

* Update model card.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-03 10:14:02 -07:00
b552708694 chore: Update model doc for code_llama (#37115)
* Update code_llama.md

aims to handle https://github.com/huggingface/transformers/issues/36979#issuecomment-2758560598

sub part of https://github.com/huggingface/transformers/issues/36979

* Update docs/source/en/model_doc/code_llama.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/code_llama.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/code_llama.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* make changes as per code review

* chore: make the function smaller for attention mask visualizer

* chore[docs]: update code_llama.md with some more suggested changes

* Update docs/source/en/model_doc/code_llama.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* chore[docs]: Update code_llama.md with indentation changes

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-03 10:09:41 -07:00
2b84831a93 Update model card for Cohere (#37056)
* Update Cohere model card to follow standard template

* Update docs/source/en/model_doc/cohere.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/cohere.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/cohere.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/cohere.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/cohere.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/cohere.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update cohere.md

Update code snippet for AutoModel, quantization, and transformers-cli

* Update cohere.md

* Update docs/source/en/model_doc/cohere.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-03 09:51:40 -07:00
2d46a08b63 Purge unused ModelTester code (#37085)
* Purge correctly this time

* Remove more methods from recent PRs

* make fixup
2025-04-03 17:48:35 +01:00
1b29409d89 feat: updated model card for qwen_2.5_vl (#37099)
* feat: updated model card for qwen_2.5_vl

* applied suggested change 1

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* applied suggested change 2

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* applied suggested change 3

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: made requested changes for quantization and notes

* suggested model card change 4

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated model card with suggested change 5

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated model card with suggested change 6

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated model card with suggested change 7

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* feat: applied requested changes

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-03 09:13:26 -07:00
8a828a747e Add Optional to types (#37163)
Signed-off-by: cyy <cyyever@outlook.com>
2025-04-03 16:38:01 +01:00
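The annotation fix this commit applies repo-wide, in one line: a parameter that defaults to `None` should be annotated `Optional` (or `X | None`), not with the bare type.

```python
from typing import Optional

def resize(image: list, size: Optional[int] = None) -> list:  # not `size: int = None`
    return image if size is None else image[:size]
```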
3f6af96732 Adding links to ShieldGemma 2 technical report (#37247) 2025-04-03 16:26:29 +01:00
9a1c1fe7ed [CI] green llama tests (#37244)
* green llama tests

* use cleanup instead

* better test comment; cleanup upgrade

* better test comment; cleanup upgrade
2025-04-03 14:15:53 +01:00
782d7d945d Allow flexible generation params arg when checking pipeline specs (#37211)
* Allow flexible generation params arg

* Trigger tests

* Add docstring and rename js_generate to hub_generate
2025-04-03 13:29:36 +01:00
afafb84b59 Add support for fast image processing in image-pretraining example (#37021)
* Add support for fast image processing in image-pretraining example

Fix typo: correct tuple formatting in IMAGE_PROCESSOR_MAPPING_NAMES

Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com>

* Use fast image processor by default

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com>

---------

Signed-off-by: jafraustro <jaime.fraustro.valdez@intel.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-04-03 13:26:46 +01:00
34ccfebf32 Fix AST parsing when looking for remote code imports (#37245)
* Not all Call.func nodes have id because they can be methods

* Trigger tests

* Trigger tests
2025-04-03 13:00:51 +01:00
f697b3f824 enable 2 types of case on XPU (#37198)
enable 2 types of cases on XPU: 1. test_resize_tokens_embeddings_with_deepspeed_multi_gpu 2. test_resize_embeddings_untied_with_deepspeed_multi_gpu

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-03 11:37:55 +02:00
2099287a59 [CI] lazy loading external datasets (#37218) 2025-04-03 09:57:45 +01:00
a0803a9555 [tests] fix mamba integration simple inference precision issue (#37193)
* fix precision issue

* use float32
2025-04-03 10:38:03 +02:00
6ce238fe7a Fix test (#37213)
* Update test_modeling_common.py

* style
2025-04-03 10:24:34 +02:00
12048990a9 Add new dim to num_items_in_batch if necessary (#36967)
* Add new dim to `num_items_in_batch` if necessary

* Unsqueeze only in the DP case

---------

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-03 09:57:03 +02:00
98601cc818 [Phi4] add multimodal chat template (#36996)
* phi4 chat template

* remove from valid kwargs
2025-04-03 09:52:09 +02:00
c9302c0983 Fix static cache export (#37229)
Co-authored-by: Guang Yang <guangyang@fb.com>
2025-04-03 07:05:57 +02:00
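Not the export path itself, but a hedged sketch of the static-cache generation API the fix keeps exportable; the checkpoint id is a placeholder for any static-cache-capable model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "meta-llama/Llama-3.2-1B"  # placeholder checkpoint
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)

inputs = tok("Static caches preallocate KV memory,", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=16,
                     cache_implementation="static")
print(tok.decode(out[0], skip_special_tokens=True))
```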
2056287940 Updated model card for Qwen2 (#37192)
* Update qwen2.md

* Update qwen2.md

* Update qwen2.md

* Update qwen2.md

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update qwen2.md

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-02 18:10:41 -07:00
3e96a0c32b Update falcon model card (#37184)
* feat: updated model card for falcon

* fix: rewrite model description

* fix: add link to conversion script

* Update docs/source/en/model_doc/falcon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/falcon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/falcon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/falcon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/falcon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/falcon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/falcon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/falcon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: Add suggested changes

* fix: typo in link for quantization

* Update docs/source/en/model_doc/falcon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/falcon.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: fix indent and close ticks

* fix: add indent

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-02 17:30:37 -07:00
199d7adf10 Updated the model card for CLIP (#37040)
* Update clip.md

* Update docs/source/en/model_doc/clip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/clip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/clip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Incorporated suggested changes

* Update docs/source/en/model_doc/clip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/clip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/clip.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-02 14:57:38 -07:00
126abe3461 More ReDOS fixes! (#36964)
* More ReDOS fixes!

* Slight regex cleanup

* Cleanup regex replacement

* Drop that regex entirely too

* The regex didn't match config.json, let's make sure we don't either

* Cleanup allowed_value_chars a little

* Cleanup the import search

* Catch multi-condition blocks too

* Trigger tests

* Trigger tests
2025-04-02 18:46:14 +01:00
3d133cc557 Stop DOSing the Hub in the CI (#37209)
* As the title suggests, stop hammering the same files

* make fixup

* Use shutil instead of pathlib
2025-04-02 17:19:33 +01:00
e90d55ebcc [Tests] add min_new_tokens to prevent flaky length checks (#37175) 2025-04-02 15:24:00 +01:00
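A sketch of the stabilization the tests adopt: pinning both ends of the generated length so length assertions cannot be flaky. `gpt2` stands in as a small placeholder model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Deterministic output length:", return_tensors="pt")
out = model.generate(**inputs, min_new_tokens=8, max_new_tokens=8,
                     pad_token_id=tok.eos_token_id)
# min == max forces exactly 8 new tokens, so this check can't flake.
assert out.shape[1] == inputs["input_ids"].shape[1] + 8
```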
cbfa14823b No more dtype_byte_size() (#37144)
* No more dtype_byte_size()

* Remove function once again

* Fix rebase cruft

* Trigger tests
2025-04-02 14:58:38 +01:00
7613cf1a45 Add py.typed (#37022) 2025-04-02 14:17:27 +01:00
32c12aaec3 [3/N] Use pyupgrade --py39-plus to improve code (#36936)
Use pyupgrade --py39-plus to improve code

Signed-off-by: cyy <cyyever@outlook.com>
2025-04-02 14:16:06 +01:00
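Examples of the rewrites `pyupgrade --py39-plus` performs: builtin generics replace their `typing.*` spellings, which is valid syntax on Python 3.9+.

```python
sizes: list[int] = [1, 2, 3]        # was: typing.List[int]
table: dict[str, int] = {"a": 1}    # was: typing.Dict[str, int]
pair: tuple[int, str] = (0, "x")    # was: typing.Tuple[int, str]
```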
764ab0d46a Merge tensor operations with device transfer operations (#37097)
* Merge operations with to

Signed-off-by: cyy <cyyever@outlook.com>

* Use dtype

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-04-02 14:15:23 +01:00
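The pattern this commit applies throughout: fuse dtype conversion and device transfer into one `.to(...)` call instead of chaining two, which avoids an intermediate copy.

```python
import torch

x = torch.randn(4, 4)
# before: x = x.to("cpu").to(torch.float16)   # two materializations
x = x.to(device="cpu", dtype=torch.float16)   # one fused call
```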
c94c6ed397 Fix some code annotation typos. (#37102)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-04-02 14:00:41 +01:00
e94d607c8b fix: Add 'image-text-to-text' to TASK_MAPPING (#37107)
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-02 14:51:03 +02:00
adfc91cd46 Try to avoid/reduce some remaining CI job failures (#37202)
* try

* try

* Update tests/pipelines/test_pipelines_video_classification.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-04-02 14:39:57 +02:00
6f5dc9c82e Fixes DynamicCache export issues due to control flow and inplace modifications (#36652)
* Remove unnecessary masked_fill in deberta models

* Enable some code when exporting but not compiling

* add missing import

* style

* replace if by torch.cond

* style

* use numel

* style

* add unit tests

* style

* change empty value for dynamic cache

* replace != [] by numel()

* fix import issue

* style
2025-04-02 12:04:40 +01:00
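One of the patterns the commit switches to, sketched under the assumption of an empty per-layer cache slot: a shape-based `numel()` check rather than comparing a tensor against `[]`, which export tracing handles far better.

```python
import torch

key_states = torch.empty(0)  # an empty cache slot

# before (breaks torch.export tracing): if key_states != []: ...
if key_states.numel() > 0:
    print("cache slot is populated")
```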
a165458901 Add device workaround for int4 weight only quantization after API update (#36980)
* merge

* fix import

* format

* reformat

* reformat

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-02 12:42:22 +02:00
ed95493ce0 Skip code 307 in RequestCounter (#36953)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-02 11:35:46 +02:00
211e4dc9a4 [chat-template] fix video loading (#37146)
* fix

* add video

* trigger

* push new images

* fix tests

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-02 11:27:50 +02:00
800510c67b [doc] Fix link for Quark quantization page (#37179)
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-04-01 20:57:38 +02:00
41f5c3216c Revert #37031 (#37178)
Update modeling_utils.py
2025-04-01 19:48:15 +02:00
bc2dea3f54 Fix meta state dict loading with quantizers (#37136)
Update modeling_utils.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-01 18:45:58 +02:00
35253076f4 Avoid pipeline test failing related to Hub call (#37170)
* cls

* cls

* cls

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-04-01 18:22:45 +02:00
bf41e54fc8 Fixes the inconsistency of the optionality of attention_mask (#37153)
* debugging issue 36758

* debugging issue 36758

* debugging issue 36758

* updated attn_mask type specification in _flash_attention_forward

* removed pdb

* added a blank line

* removed indentation
2025-04-01 15:31:10 +01:00
3249c5dc15 Refactor attention for SigLIP based models (#36981)
* Update Siglip attention implementation

* Update tests for Siglip

* Remove one level of indentation

* Update test to be more specific

* Fixup

* Idefics2

* Idefics3

* Emu3

* SmolVLM

* Phi4 (just init small update)

* Idefics2 (test fix)

* Update siglip2 tests

* Update eager

* trigger

* Clean up

* Transfer inputs to device in test

* Fixing test

* Fixing test

* Revert contiguous

* Remove unused is_flash_attn_2_available

* Move flaky to specific models
2025-04-01 15:37:25 +02:00
24e311f42b fix XPU UT error case brought by RNG difference between XPU and CUDA (#37121)
* fix XPU UT error case brought by RNG difference between XPU and CUDA

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* enable tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits and tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits_bf16 on xpu

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Revert "enable tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits and tests/models/llama/test_modeling_llama.py::LlamaIntegrationTest::test_model_7b_logits_bf16 on xpu"

This reverts commit 3ef83a4f0204642daa45fda56e8aca1afed24b4f.

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-01 13:52:55 +01:00
897ff9af0e [ModernBERT] Never save 'reference_compile' config; should be set based on end user (#36305)
* Never save 'reference_compile' config; should be set based on end user

* Reformat (I ran 'make style' from the wrong env)

* Use pop instead of del

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Use pop instead of del

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-04-01 14:14:39 +02:00
c0bd8048a5 Make canine model exportable by removing unncessary complicated logic (#37124) 2025-04-01 12:31:12 +01:00
60b75d99b6 Only count num items in batch when needed (#36867)
only count num items when needed
2025-04-01 12:30:39 +02:00
fac70ff3c0 Convert _VALID_DICT_FIELDS to class attribute for shared dict parsing in subclasses (#36736)
* make _VALID_DICT_FIELDS a class attribute

* fix test case about TrainingArguments
2025-04-01 12:29:12 +02:00
ae34bd75fd Use public export API on torch 2.5 and later (#36781)
Co-authored-by: Guang Yang <guangyang@fb.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-04-01 10:47:38 +01:00
8f6b27eb5c enable test_assisted_decoding_in_different_gpu test on XPU (#37120)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-04-01 11:22:59 +02:00
737cbd2109 Fix llava xpu tests. (#37130)
* fix llava 4bit xpu test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix llava 4bit xpu test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-01 11:10:13 +02:00
3a6ab46a0b add gpt2 test on XPU (#37028)
* add gpt2 test on XPU

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* auto dtype has been fixed

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* convert model to train mode

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-04-01 11:09:29 +02:00
4b13a02920 Fix std initialization in Idefics variants (#37100)
* Nit 😅

* Another one

* fix

* run ci

* revert change
2025-04-01 09:18:54 +02:00
786d9c5ed9 Fix more inefficient PT operations (#37060)
* Fix inefficient operations

* Remove cpu() call

* Reorder detach()

* Reorder detach()

* tolist without detach

* item without detach

* Update src/transformers/models/rag/modeling_rag.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update tests/models/encodec/test_modeling_encodec.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Use detach().cpu().numpy

* Revert some numpy operations

* More fixes

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-31 16:31:24 +01:00
a1e389e637 Refactor return_dict logic to remove complicated if/else paths (#36794)
* SAM

* CLIP

* SigLIP

* GOT-OCR2 (depends on SAM)

* SigLIP2 (depends on SigLIP)

* trigger tests

* Fix SAM

* Fix missed indexing, use named attributes

* Llama

* Aria

* Bamba

* Update llama: missed outputs return type

* (fixup) Aria

* DiffLlama

* Emu3

* Gemma

* Gemma2

* Paligemma

* Fix paligemma

* Gemma3

* GLM

* Helium

* JetMoe

* Jamba

* Mistral

* Mistral

* Mixtral

* Nemotron

* Olmo

* Olmo2

* Persimmon

* Phi

* Phi3

* PhiMoe

* Qwen2

* Qwen2_moe

* StableLM

* Starcoder2

* Add return_dict decorator

* SAM

* Update decorator: compile, export, trace - friendly

* Llama (decorator)

* SAM (decorator)

* Add decorator `can_return_tuple`

* Llama

* Update to decorator

* Update CLIP

* Update decorator to store `_is_top_level_module` in self

* Update decorator to correctly handle compile/export

* Remove is_torchdynamo_compiling constraint, all work fine with self attribute assignment

* Typing

* GPT NeoX

* Fixup

* Fix attribute Granite

* Fix return type mixtral

* Update Gemma3

* Fix Cohere amd Cohere2

* Fixup

* Fix corner case for Phi4, when activation is shared

* (fix-copies) deepseekv3, phi4

* Fixup

* Apply to qwen3/qwen3_moe

* Fix
2025-03-31 16:23:37 +01:00
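A generic sketch of the `can_return_tuple` idea introduced above (not the library's exact code): the forward always builds a `ModelOutput`, and the decorator converts it to a tuple at the top level only when the caller asked for one.

```python
import functools

def can_return_tuple(forward):
    @functools.wraps(forward)
    def wrapper(self, *args, return_dict=None, **kwargs):
        output = forward(self, *args, **kwargs)  # always a ModelOutput
        use_dict = return_dict if return_dict is not None else getattr(
            self.config, "use_return_dict", True)
        return output if use_dict else output.to_tuple()
    return wrapper
```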
f304318f5f Remove low_cpu_mem_usage and _fast_init (#36963)
* Remove low_cpu_mem_usage and _fast_init

* Update deepspeed.py

* Update modeling_utils.py

* remove the first 2 tests everywhere

* Update test_modeling_common.py

* remove what was remaining about fast_init

* fix logic and simplify

* mismatched keys logic update

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* fix 2 models init_weights

* extend to others

* remove grad

* Update modeling_fsmt.py

* init weights in tests

* style

* Update test_modeling_fsmt.py

* more old models

* fix more init_weights

* copies

* fix

* style

* Update modeling_lxmert.py

* fix inits

* more and more

* more

* should finalize

* style

* Update modeling_dinov2_with_registers.py

* fix

* Update modeling_encoder_decoder.py

* fix

* style

* Update modeling_lxmert.py

* post rebase cleanup

* Update modeling_informer.py

* back to start for device

* fix

* add test to detect all failing cases correctly

* Update test_modeling_common.py

* fix

* fix

* sam

* style

* Update modeling_maskformer_swin.py

* CIs

* CIs

* remove test - will add it on separate PR

* fix

* fix

* Update modeling_sam.py

* CIs

* CIs

* CIs

* convnext

* suggestions

* CIs

* fix copies after merge

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-31 17:18:43 +02:00
8805600406 [qwen3] fix generation tests (#37142)
* do not skip tests

* fix qwen3-moe as well

* fixup

* fixup
2025-03-31 16:33:41 +02:00
e686fed635 [Feature] Support using FlashAttention2 on Ascend NPU (#36696)
* [Feature] Support using flash-attention on Ascend NPU

* Fix qwen3 and qwen3_moe moduler conversion mismatch
2025-03-31 16:12:58 +02:00
a03cee7a1d skip (#37141)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-31 15:38:40 +02:00
3b07ca78bb Export T5 (encoder-decoder) to ExecuTorch (#36486)
Co-authored-by: Guang Yang <guangyang@fb.com>
2025-03-31 12:10:26 +02:00
475664e2c6 [tests] remove cuda-only test marker in AwqConfigTest (#37032)
* enable on xpu

* add xpu support

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-31 11:53:02 +02:00
0710e9b1e8 Create and Expose SamVisionModel as public for better accessibility (#36493)
* move encoder below

* auto modeling

* write SamVisionTester

* fix vision attention shape

* fix SamVisionTest

* minor changes to SamVisionTest

* Revert "fix vision attention shape"

This reverts commit d2a4083ae5704716e33351aed03af8f3cc45f3ae.

* fix attention output shape in new tests

* remove encoder examples

* run modular on got_ocr2

* code formatting

* fix got_ocr2

* ruff fixes

* code quality

* add sam_vision in auto modeling and auto configuration

* remove composite test

* updated index.md

* add TFSamVisionEncoder to __init__

* fix public TFSamVisionEncoder

* remove outdated todo comment

* set test_torch_exportable

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* rename: VisionEncoder -> VisionModel

* bring back original SamVisionEncoder

* rename back: VisionEncoderOutput -> VisionModelOutput

* undo changes in SamModelTester

* reuse SamVisionEncoder in SamVisionModel

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-03-31 11:45:07 +02:00
f99c279d20 Remove deprecated code (#37059)
* Remove deprecated code

* fix get_loading_attributes

* fix error

* skip test

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-31 11:15:35 +02:00
d1efaf0318 RWKV: fix mask warning typo (#37114)
rwkv: fix mask warning typo
2025-03-31 11:07:51 +02:00
19919689b2 Fix Gemma3 embedding scaling (#37109)
fix gemma3 embedding
2025-03-31 11:04:02 +02:00
d0b65bb479 [MLU] Fix FA2 check error, remove deepspeed-mlu deps. (#36159)
* add Cambricon MLUs support

* fix mlu device rng state

* up for quality check

* up mlu to support fp16

* fix mlu device dependency error

* fix mlu device dependency error

* enable mlu device for bf16

* fix mlu device memory tracker

* Cambricon support SDPA and flash_attn

* MLU devices: checks if `mlu` is available via a `cndev`-based check which won't trigger the drivers and leave mlu

* Fix mlu FA2 check. Remove deepspeed-mlu check. add mlu tests support.

* fix testing errors.

* Merge branch 'hf/main' into main

* fix get_device_count error.

* fix mlu testing utils.

* fix code quality and style.

* switch to @require_torch_multi_accelerator
2025-03-31 11:02:49 +02:00
ad63d20dff fix whisper re-compile (#36712)
* fix whisper re-compile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix copy

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix copies

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert useless changes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-31 11:01:51 +02:00
286393fbb1 enable tp on CPU (#36299)
* enable tp on CPU

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* get rank from cpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable TP tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm print

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix model id

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix conflict

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix index and add doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-03-31 10:55:47 +02:00
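A hedged sketch of the tensor-parallel entry point this commit extends to CPU backends; `tp_plan` is the documented knob, the checkpoint id is a placeholder, and a process group must exist (e.g. launch with `torchrun`):

```python
from transformers import AutoModelForCausalLM

# Run with e.g.: torchrun --nproc-per-node 4 this_script.py
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # placeholder checkpoint
    tp_plan="auto",             # shard weights across the process group
)
```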
4705b04c74 Fix 4090/ada not detected as having FP8 support (#37067)
fix 4090/ada not detected as having FP8 support

Signed-off-by: Qubitium <qubitium@modelcloud.ai>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-31 10:53:48 +02:00
2b4734bd49 Support passing flash_attn_kwargs when gradient_checkpointing is enabled (#37037)
* support passing flash_attn_kwargs when gradient_checkpointing is enabled

* make modeling_deepspeek_v3.py consistent with modular_deepseek_v3.py
2025-03-31 10:53:02 +02:00
bd41b9c1ac Gaudi: Fix the pipeline failure with hpu device (#36990)
* Gaudi: fix the issue of is_torch_hpu_available() returning false

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Fix make fixup

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add comments for the implicit behavior of import

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Update src/transformers/utils/import_utils.py

* Update src/transformers/utils/import_utils.py

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-03-31 10:23:47 +02:00
6acd5aecb3 Adding Qwen3 and Qwen3MoE (#36878)
* Initial commit for Qwen3

* fix and add tests for qwen3 & qwen3_moe

* rename models for tests.

* fix

* fix

* fix and add docs.

* fix model name in docs.

* simplify modular and fix configuration issues

* Fix the red CI: ruff was updated

* revert ruff, version was wrong

* fix qwen3moe.

* fix

* make sure MOE can load

* fix copies

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-03-31 09:50:49 +02:00
0d6a60fe55 🌐 [i18n-KO] Translated qwen2_vl.md to Korean (#36750)
* fix: manual edits

* fix: resolve suggestions

* Update toctree.yml
2025-03-30 15:00:27 -07:00
b7fc2daf8b Kenlm (#37091)
* kenlm

* kenlm

* kenlm

* kenlm

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 21:42:54 +01:00
bab605dd04 [Cache] rename dtype attribute 🚨 🚨 (#37044)
* yoink

* same pattern in all cache
2025-03-28 19:08:02 +01:00
9fd9476005 [generate] beam search -- fix output cropping (#37080)
* handle jagged beams

* better comment

* bart -- beam search tests print special tokens

* more bart test updates

* more tests!

* better comment
2025-03-28 18:57:51 +01:00
257bc670fb fixed typo. (#37057)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-28 17:12:14 +00:00
2bea6bf24e Fix AttentionInterface following feedback (#37010)
* up

* typo

* update doc

* Update attention_interface.md
2025-03-28 18:00:35 +01:00
a86dad56bc Fix state_dict map location when quantized (#37086)
* Update modeling_utils.py

* Update modeling_utils.py
2025-03-28 17:57:16 +01:00
d6064754ea Update w/ new account (#37084)
* Update w/ new account

* DS
2025-03-28 12:43:00 -04:00
581cf96e0c fix tied weigths issue (#37031)
* fix

* comment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 16:36:44 +01:00
eca74d1367 [WIP] add deepseek-v3 (#35926)
* init commit

* style

* take comments into account

* add deepseekv3 modeling

* remove redundant code

* apply make style

* apply fix-copies

* make format

* add init files

* rename deepseekv3 into deepseek_v3 based on its model_type

* rename deepseekv3 into deepseek_v3 based on its model_type

* deepseek-v3 not deepseek_v3

* set model_type as deepseek_v3

* use default docs

* apply make

* fill type and docstring

* add rope_config_validation

* use custom DeepseekV3MLP

* hold code only for checkpoints configuration; remove redundant

* revise rope yarn for DeepSeek variation

* rename DeepSeek-V3

* some refactoring

* revise load_hook to work properly; make moe func trainable; use llama instead of mixtral

* fix attention forward

* use -1 for the unchanged dim when using expand

* refactor DeepseekV3TopkRouter

* use reshape_for_rope instead of load_hook; revise attention forward for TP; rename q_head_dim to qk_head_dim

* register pre_hook and hook both

* make style

* use n_shared_experts

* Update src/transformers/models/deepseek_v3/configuration_deepseek_v3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add test file

* update modeling_file according to modular file

* make style

* add mapping for DeepseekV3ForSequenceClassification

* remove aux_loss_alpha

* add deepseek_v3 for perf

* add deepseek_v3

* rename test as deepseekv3

* use tiny-deepseek-v3

* remove DeepseekV3ForSequenceClassification

* cache before padding

* remote output_router_logits

* Revert "remote output_router_logits"

This reverts commit f264f800d04950390db8413b9efb24cef8186330.

* remove output_router_logits

* make e_score_correction_bias as buffer

* skip tests not compatible

* make style

* make e_score_correction_bias as buffer

* use rope_interleave instead of load_hook

* skip tests not compatible with MLA

* add doc for rope_interleave

* fix typo

* remove torch.no_grad for selecting topk

* fix post merge issue

* merge with main and simplify

* nits

* final

* small fixes

* fix

* support TP better

* stash

* changes currently requires

* remove synch

* more fixes for TP

* temp fix for TP : some attention layers's FP8 scales are too small + shared is local colwise and anything is local if FP8 because weights are used

* updates to have generation work!

* push most of the changes

* reorder functions + call for contributions!

* update readme

* nits

* update

* ruff was updated on main

* merge with main and fix copies

* revert unrelated changes

* route all tokens to all experts when testing to avoid no-gradient issues

* finish fixing all tests

* fixup

* nit

* clean config

* last readme changes

* nit

* do nit

* typo

* last nit

* one more one more

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-165-131.ec2.internal>
2025-03-28 15:56:59 +01:00
52cc204dd7 [blip-2] Fix dtype mismatch when keep in fp32 (#37068)
* fix fp32 BLIP2

* no need to reorder that

* check for `Noneness` as well before casting dtype
2025-03-28 15:52:11 +01:00
aa3778afc2 Change deprecated PT functions (#37041)
Change deprecated functions
2025-03-28 14:26:22 +00:00
c90e6e9625 Fix some typos about benchmark scripts. (#37027)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-28 14:10:20 +00:00
1fcaad6df9 Use lru_cache for tokenization tests (#36818)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-28 15:09:35 +01:00
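A sketch of the test speed-up above: memoize tokenizer construction with `functools.lru_cache` so repeated test methods share one instance instead of re-reading files each time.

```python
import functools
from transformers import AutoTokenizer

@functools.lru_cache(maxsize=None)
def cached_tokenizer(name: str):
    # Construct once per checkpoint name; later calls reuse the instance.
    return AutoTokenizer.from_pretrained(name)

tok_a = cached_tokenizer("gpt2")
tok_b = cached_tokenizer("gpt2")
assert tok_a is tok_b
```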
3af425d4c6 fix: AttributeError: 'LlavaProcessor' object has no attribute 'image_token_id' (#37026)
* Add image_token_id and video_token_id handling in Llava processors

* fix: image to video

* fix: correct image and video token ID handling in Llava processors

* fix: improve image and video token ID handling in Llava processors
2025-03-28 10:46:24 +01:00
064cd7cdac Fix SDPA implementation in Qwen2-VL (issues with torch==2.6.0) (#36891)
* fix sdpa implementation

* ruff

* also modify 2_5 for consistency
2025-03-28 09:54:21 +01:00
348f3285c5 fix: Fully remove legacy cache from Llama (#36958)
* bug: fully remove legacy cache from Llama

* bug: fix CI issues

* bug: update jetmoe model

* bug: apply =check_modular_conversion.py= fix

* bug: apply make fix-copies

* bug: fix ruff

* PR suggestions

* Remove trailing commas in auto-gen files

* Trivial new line removal
2025-03-27 17:22:44 +00:00
d6b3c7486b fixed typo (#37036) 2025-03-27 15:37:53 +00:00
6cc9c8d7d1 Remove deprecated batch_size parameter (#37007) 2025-03-27 15:01:56 +00:00
4cc65e990f Replace default split function with jnp.split() in flax models (#37001)
Replace split with jnp's split function for flax models (#36854)
2025-03-27 14:59:57 +00:00
41a0e58e5b Set weights_only in torch.load (#36991) 2025-03-27 14:55:50 +00:00
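The hardening this commit applies, in one line: restrict `torch.load` to tensor deserialization so an untrusted checkpoint cannot execute arbitrary pickled code. The file path here is a placeholder.

```python
import torch

state_dict = torch.load("model.bin", map_location="cpu", weights_only=True)
```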
de77f5b1ec Fix typing for None valued variables (#37004)
Fix typing for None-able variables
2025-03-27 14:46:32 +00:00
8c5e29bad5 Avoid unnecessary device operations in loss computing (#36950)
* Avoid unnecessary tensor copy in loss computing

* Add type
2025-03-27 14:45:14 +00:00
471cf1de63 clean pipeline question_answering. (#36986)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-27 14:35:33 +00:00
29f322d04d [generate, cache] handle more complex device maps (#37014) 2025-03-27 14:33:20 +00:00
fb8e6c50e4 [audio utils] fix fft_bin_width computation (#36603)
* fix fft_bin_width computation

* update docstring + enforce correct params

* update test with correct value

* update test

* update feature extractors for concerned models

* update

* make

* update docstring

* update docstring
2025-03-27 15:20:02 +01:00
e97c760006 [chat templates] support loading audio from video (#36955)
* add audio from video

* typos

* delete print

* comments
2025-03-27 14:46:11 +01:00
c7bc79bd2a Fixup for distill_any_depth conversion script (#37043)
* Fixup

* trigger
2025-03-27 13:29:25 +00:00
d1eafe8d4e Optimize to_py_obj for python-native numeric lists and scalars (#36885)
* Optimize to_py_obj for python-native numeric lists and scalars

* Fix bug that tuple is not converted to list

* Try np.array for more robust type checking

* Apply review and add tests for to_py_obj
2025-03-27 14:16:46 +01:00
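A rough sketch of the optimization described above, not the exact implementation: python-native scalars and flat numeric lists bypass the numpy round-trip, tuples are converted to lists, and `np.array` remains the robust fallback for everything else.

```python
import numpy as np


def to_py_obj_sketch(obj):
    # Fast path: native scalars need no conversion at all.
    if isinstance(obj, (bool, int, float)):
        return obj
    # Fast path: flat lists/tuples of native scalars; tuples become lists.
    if isinstance(obj, (list, tuple)) and all(isinstance(x, (bool, int, float)) for x in obj):
        return list(obj)
    # Fallback: numpy handles tensors, arrays, and nested structures robustly.
    return np.array(obj).tolist()
```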
0e56fb69a2 fix pegasus init weights and other copied models (#36844)
* fix pegasus init weights

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix the rest of models

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix informer init

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* init weight before checking

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix roformer tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix roformer tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-03-27 14:14:30 +01:00
7e813f9cf0 Add Distill Any Depth (#36614)
* Added conversion Script

* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Updated Conversion Script

* Update src/transformers/models/depth_anything/convert_distill_any_depth_to_hf.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-03-27 13:10:03 +00:00
92429057d9 Skip FP8 linear tests for device capability < 9.0 (#37008)
* skip fp8 linear

* add capability check

* format
2025-03-27 12:38:37 +01:00
279c2e302a remove redundant code in trainer (#36994)
* Update optimization.py

* Update optimization.py
2025-03-27 11:35:15 +01:00
d13c390d01 Mark 2 tests as flaky for now (#37038)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-27 10:59:47 +01:00
d6d930a64b [Modeling] Load FP8 safetensors such as DeepSeek (#36828)
support loading fp8

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-03-27 10:47:10 +01:00
927ce1d39f Fix PixtralProcessor patch_size when spatial_merge_size is used (#37019) 2025-03-27 10:46:23 +01:00
49b5ab6a27 Support QuestionAnswering Module for ModernBert based models. (#35566)
* push ModernBertForQuestionAnswering

* update ModernBertForQuestionAnswering

* update __init__ loading

* set imports for ModernBertForQuestionAnswering

* update ModernBertForQuestionAnswering

* remove debugging logs

* update init_weights method

* remove custom initialization for ModernBertForQuestionAnswering

* apply make fix-copies

* apply make style

* apply make fix-copies

* append ModernBertForQuestionAnswering to the pipeline supported models

* remove unused file

* remove invalid autoload value

* update en/model_doc/modernbert.md

* apply make fixup command

* make fixup

* Update dummies

* update usage tips for ModernBertForQuestionAnswering

* update usage tips for ModernBertForQuestionAnswering

* add init

* add lint

* add consistency

* update init test

* change text to trigger stuck text

* use self.loss_function instead of custom loss

By @Cyrilvallez

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update modeling_modernbert.py

make comparable commit to even it out

* Match whitespace

* whitespace

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Orion Weller <wellerorion@gmail.com>
Co-authored-by: Orion Weller <31665361+orionw@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-03-26 21:24:18 +01:00
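A hedged usage sketch for the new head. The checkpoint shown is the base ModernBERT model and is assumed here for illustration only; a QA fine-tune is needed before the predicted spans are meaningful.

```python
from transformers import AutoTokenizer, ModernBertForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = ModernBertForQuestionAnswering.from_pretrained("answerdotai/ModernBERT-base")

inputs = tokenizer(
    "What does the head predict?",  # question
    "The QA head predicts start and end logits over the context tokens.",  # context
    return_tensors="pt",
)
outputs = model(**inputs)
print(outputs.start_logits.shape, outputs.end_logits.shape)
```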
5b08db8844 fix transformers_cli import relative path issue (#36989)
* fix transformers_cli relative import path issue

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-26 18:45:56 +00:00
3a8ec8c467 [docs] Attention mask image (#36970)
add image
2025-03-26 10:11:34 -07:00
2b550c47b2 Remove deprecated training arguments (#36946)
* Remove deprecated training arguments

* More fixes

* More fixes

* More fixes
2025-03-26 16:44:48 +00:00
44715225e3 fix typos in the code comments and error messages (#36993)
* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments

* chore: enhance code comments
2025-03-26 16:09:48 +00:00
79d6f9fd70 Log the correct learning rate (#36973)
* fix learning rate log

* fix lr log

* add lr
2025-03-26 16:52:00 +01:00
13d36e89fe Fix device_map check for ggml files (#37003)
fix
2025-03-26 16:24:57 +01:00
021006e1b0 Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support. (#36975)
* Fix removing "cpu" from frozenset in bitsandbytes.py to allow better ROCm support.

Related to https://github.com/bitsandbytes-foundation/bitsandbytes/issues/1573 and https://github.com/huggingface/transformers/issues/36949 , this resolves a bug in allowing ROCm/HIP support in bitsandbytes.

* Related to bitsandbytes-foundation/bitsandbytes#1573 and huggingface#36949, this resolves a bug in the bitsandbytes integration, allowing ROCm/HIP support in bitsandbytes.

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-26 16:18:08 +01:00
788e1092e9 Allow easy registration of custom attention functions (#36889)
* Update modeling_utils.py

* style

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* add to init

* Update modeling_utils.py

* style

* update

* Update modeling_utils.py

* Update modeling_utils.py

* style

* Add some doc

* Update _toctree.yml

* readd it for tgi/vllm compat

* CIs

* CIs
2025-03-26 16:15:06 +01:00
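A sketch of what registering a custom attention function could look like after the change above; the `AttentionInterface` name and the callback signature are assumptions to verify against the merged docs, and the toy implementation skips details real backends handle (grouped-query heads, dropout, output layout).

```python
import torch

from transformers import AttentionInterface, AutoModelForCausalLM


def my_eager_attention(module, query, key, value, attention_mask, **kwargs):
    # Toy scaled dot-product attention with an optional additive mask.
    scores = query @ key.transpose(-2, -1) / (query.size(-1) ** 0.5)
    if attention_mask is not None:
        scores = scores + attention_mask
    probs = torch.softmax(scores, dim=-1)
    # Real implementations also transpose the output back to
    # (batch, seq, heads, dim); omitted here for brevity.
    return probs @ value, probs


AttentionInterface.register("my_eager", my_eager_attention)
model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="my_eager")
```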
ad5d40de9c Fix get_device_properties (#36997)
Fix: remove remnant self from get_device_properties

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-26 15:46:34 +01:00
8084b26294 Fix Optional type annotation (#36841)
* Fix annotation

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-26 13:53:44 +00:00
b56d8f07e4 Install networkx==3.2.1 manually in some CircleCI jobs after #36957 (#37000)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-26 14:49:09 +01:00
78afa1c537 Use torch.expm1 (#36995) 2025-03-26 13:06:33 +00:00
181d453069 byebye CircleCI TF jobs (#36998)
* byebye tf jobs

* byebye tf jobs

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-26 12:49:50 +01:00
e7139d06f5 Fix tensor dtype mismatch (#36985)
* Fix tensor dtype mismatch

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-26 10:37:46 +01:00
be37d34f44 🚨Deprecate legacy argument for image-text-to-text models and adopt new behavior by default (#36307)
* deprecate legacy argument and adopt new behavior by default

* revert back modification git
2025-03-25 17:32:17 -04:00
ab4656f6b7 update bot comment again (#36974)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 19:42:09 +01:00
ba531278ca Add ruff target-version (#36971) 2025-03-25 19:41:25 +01:00
a844297088 [docs] Fix image link (#36869)
* fix image link

* fix

* update

* fix
2025-03-25 11:34:21 -07:00
d68a91aebf Remove extra tensor clone in PyTorch code (#36748)
* Use detach().clone()

* Eliminate contiguous()

* Merge clone and other calls with to

* Merge clone and other calls with to
2025-03-25 17:42:15 +00:00
121830ab47 update examples after ruff being updated (#36972)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 18:15:47 +01:00
a41677a68b Updated docker files to use uv for installing packages (#36957)
* Updated docker files to use uv pip install as uv is blazingly fast.

* Removed -y flag for uv pip uninstall.

* Passed --no-build-isolation flag

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-25 18:12:51 +01:00
3dce98a437 typo fixed in README_fr.md (#36951) 2025-03-25 09:29:36 -07:00
ebd2029483 Change GPUS to GPUs (#36945)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-25 17:25:39 +01:00
69632aadb7 Update after #36962 (#36965)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 16:16:06 +01:00
c6814b4ee8 Update ruff to 0.11.2 (#36962)
* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-25 16:00:11 +01:00
bc1c90a755 [Utils] torch version checks optionally accept dev versions (#36847) 2025-03-25 10:58:58 +00:00
80b4c5dcc9 Fix cuda index issue in cache allocator (#36937)
fix
2025-03-25 11:51:41 +01:00
0f733110a6 Support return_tensors in audio chat templates (#34601)
* add audio chat templates

* update

* update

* nit

* green ci

* we dont care about the order anymore

* clean up after rebase

* overridden tests rename

* rename shieldgemma also

* one more rename

* require_read_token

* remove images/videos

* retrigger CI flaky
2025-03-25 11:08:47 +01:00
19085c28da fix typos in the tests directory (#36932)
* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: fix typos in test codes

* chore: format codes
2025-03-25 10:49:24 +01:00
69bcb86c58 Export for Phi4-mini (#36780)
* Export for Phi4-mini

* Update tests/models/phi3/test_modeling_phi3.py

---------

Co-authored-by: Guang Yang <guangyang@fb.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-25 10:46:38 +01:00
be2c0e7bff Fixing _pre_quantization_dtype when torch_dtype is None (#36930)
fix
2025-03-25 10:43:27 +01:00
4303d88c09 Add Phi4 multimodal (#36939)
* raw start

* update

* update

* add to imports

* update

* up

* simplify configs

* clean configs

* style

* typos

* Update convert_phi4_multimodal_weights_to_hf.py

* Update convert_phi4_multimodal_weights_to_hf.py

* fix

* up

* up

* up

* Update convert_phi4_multimodal_weights_to_hf.py

* Update convert_phi4_multimodal_weights_to_hf.py

* up

* up

* up

* Update feature_extraction_phi4_multimodal.py

* up

* up

* up

* up

* up

* simplify configs

* typo

* cut code

* typo

* typo

* typo

* re

* typo

* up

* up

* up

* add tests

* fix

* fix

* Update test_modeling_phi4_multimodal.py

* up

* Update test_modeling_phi4_multimodal.py

* doc

* fix

* up

* up

* up

* up

* up

* up

* simplify

* up

* simplify

* config docstrings

* cleanup

* clean

* typo

* typo

* fix

* Update phi4_multimodal.md

* fix

* fix

* Update test_modeling_phi4_multimodal.py

* update

* simplify reshapes and permutes

* up

* simplify special tokens

* simplify processor a lot

* Update processing_phi4_multimodal.py

* Update processing_phi4_multimodal.py

* switch to fast processor

* image processor

* Update image_processing_phi4_multimodal_fast.py

* add lora extraction to converter

* Update convert_phi4_multimodal_weights_to_hf.py

* Update __init__.py

* add AudioInput type in audio_utils

* rewrite feature_extraction: support torch batched FFT

* input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values

* test update

* not mono channel warning update

* remove auto maps from processor

* kargs dispatch in processor

* simplify kwargs dispatch

* simplify merging

* remove default sampling rate

* style

* Update test_modeling_phi4_multimodal.py

* update doc

* doc

* torch only feature extractor

* make fake tokens adjustable

* Update feature_extraction_phi4_multimodal.py

* fix

* Update processing_phi4_multimodal.py

* simplify mask

* last touch

* fix copies

* style

* Update audio_utils.py

* style

* Update feature_extraction_phi4_multimodal.py

* Update __init__.py

* docstrings

* copies

* fix all checks

* back to fix-copies

* trigger CIs

* Update feature_extraction_phi4_multimodal.py

* improve tests with multimodal inputs

* trigger CIs

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-03-25 09:55:21 +01:00
47e5432805 Deprecate #36741 and map Causal to Conditional (#36917)
* deprecate the prev fix

* reword warning and update docs

* reword warning

* tests

* dont bloat `get_text_config()`
2025-03-25 09:13:56 +01:00
2b8a15cc3f Disallow Offload to disk for gguf files (#36933)
update

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-24 19:30:01 +01:00
91455c1825 Fix processor kwargs qwen2 vl (#36890)
* Fix qwen2_vl and qwen2_5_vl processors custom images kwargs

* change version warning
2025-03-24 13:19:26 -04:00
48385aa4f4 Added support for seed in DataCollatorForWholeWordMask (#36903)
* Added support for seed in `DataCollatorForWholeWordMask`, and also wrote tests.

Also fixed bugs where the code hardcoded values for mask replacement probability and random replacement probability, instead of using the values passed by the user.

* formatting issues

* Used better way to generate seed in TF. Made tests more consistent.
2025-03-24 16:57:17 +00:00
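A hedged usage sketch: `seed` is the new argument, and the two replacement probabilities are the previously hardcoded values the commit says are now honored (exact argument names should be checked against the merged signature).

```python
from transformers import AutoTokenizer, DataCollatorForWholeWordMask

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorForWholeWordMask(
    tokenizer=tokenizer,
    mlm_probability=0.15,
    mask_replace_prob=0.8,    # previously hardcoded, now user-controlled
    random_replace_prob=0.1,  # previously hardcoded, now user-controlled
    seed=42,                  # new: makes the whole-word masking reproducible
)
batch = collator([tokenizer("whole word masking demo")])
```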
5932606d8e More precise comment (#36935)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-24 17:03:09 +01:00
2be2984462 Fix pytorch deform attn path (#36923)
* Fix pytorch path for DeformableAttention

* Apply for GroundingDino
2025-03-24 15:58:51 +00:00
00d077267a [2/N] Use pyupgrade --py39-plus to improve code (#36857)
Use pyupgrade --py39-plus to improve code
2025-03-24 15:42:25 +00:00
a6ecb54159 Update trainer_pt_utils.py docstrings for consistency (#36912)
* Update trainer_pt_utils.py

* update docstrings trainer_pt_utils.py for consistency

* Update src/transformers/trainer_pt_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-03-24 14:46:41 +00:00
cbf924b76c Fix typos (#36910)
* fix typos

* fix typos

* fix typos

* fix typos
2025-03-24 14:08:29 +00:00
340500b1a9 Use another repo. for Mistral3 processor testing (#36925)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-24 14:36:05 +01:00
9e125d9a2e Fix Compressed tensors to_dict_diff (#36922)
fix
2025-03-24 13:06:33 +01:00
57f551c78d [chameleon] fix num image token check (#36918)
* [chameleon] fix num image token check

* embed after merging image token

* skip this also

* mistral require_read_token
2025-03-24 12:36:08 +01:00
a41e08aa19 tests: fix asyncio.wait() usage for python>=3.11 (#36898)
tests: fix asyncio.wait() usage for python>=3.7

Passing coroutines directly to `asyncio.wait()` is deprecated since
python 3.8 and removed starting from python 3.11. Instead, it's required
to explicitly wrap each coroutine in a task with `asyncio.create_task()`,
which first appeared in python 3.7.

We run into this issue with the following Transformers tests on a
system with python 3.11 or later (for example, Ubuntu 24.04 has python 3.12):

* `tests/trainer/test_trainer_distributed.py`
* `tests/extended/test_trainer_ext.py`

The error will be:
```
src/transformers/testing_utils.py:2380: in execute_subprocess_async
    result = loop.run_until_complete(
/usr/lib/python3.12/asyncio/base_events.py:687: in run_until_complete
    return future.result()
src/transformers/testing_utils.py:2368: in _stream_subprocess
    await asyncio.wait(
...
E           TypeError: Passing coroutines is forbidden, use tasks explicitly.

```

See: https://docs.python.org/3.10/library/asyncio-task.html#asyncio.wait
See: https://docs.python.org/3.7/library/asyncio-task.html#asyncio.create_task

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-24 11:53:59 +01:00
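The fix pattern, distilled from the message above (`read_stream` and `p` are illustrative stand-ins for the helpers in `testing_utils`):

```python
import asyncio


async def read_stream(stream):
    ...  # illustrative stand-in


async def stream_subprocess(p):
    # Before (TypeError on Python >= 3.11): coroutines passed directly.
    #   await asyncio.wait([read_stream(p.stdout), read_stream(p.stderr)])
    # After: wrap each coroutine in a task explicitly; create_task() has
    # existed since Python 3.7, so this is safe on all supported versions.
    await asyncio.wait(
        [
            asyncio.create_task(read_stream(p.stdout)),
            asyncio.create_task(read_stream(p.stderr)),
        ]
    )
```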
e28be7a692 [Fix] Add original_max_position_embeddings to YARN rope_scaling optional keys (#36877)
[fix] Update optional keys in _validate_yarn_parameters to include original_max_position_embeddings
2025-03-24 11:05:19 +01:00
48da44be24 Fix torch version guard at import (#36907)
fix
2025-03-24 10:33:33 +01:00
fe4ca2f4a7 fix Gemma3 Config (#36893)
* fix Gemma3 Config

* fix config in modular gemm3
2025-03-24 10:05:44 +01:00
c9d1e5238a Update installation.md (#36826)
* Update installation.md

* Update README.md
2025-03-21 16:32:02 -07:00
d253de6d58 [docs] Model docs (#36469)
* initial

* fix

* fix

* update

* fix

* fixes

* quantization

* attention mask visualizer

* multimodal

* small changes

* fix code samples
2025-03-21 15:35:22 -07:00
beb9b5b022 Fix Pan and Scan on batched images Gemma3 (#36864)
* process flattened images in fast image proc

* process flattened images in low proc and add tests

* remove print

* add unbalanced batch test for PaS image proc

* fix integration tests
2025-03-21 13:56:00 -04:00
dd3933dd65 Simplify keep_in_fp32_modules logic (#36722)
* better regex everywhere

* fix

* Update test_modeling_instructblip.py

* BC with explanations this time otherwise it makes no sense at all

* Update test_modeling_instructblip.py

* style

* CIs

* update _keep_in_fp32_modules in blip2

* Update modeling_utils.py

* Update modeling_utils.py

* style

* CIs

* add check

* trigger CIs

* Update modeling_utils.py

* trigger CIs
2025-03-21 16:12:59 +01:00
90e2df5d55 fix: loss computation after embeddings resize - mllama (#36840)
* move loss to generation class

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* code cleanup

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* test for resize and loss computation

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix tests

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix:test for resize and loss

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix resize embedding mllama test

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* review changes

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

---------

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
2025-03-21 14:47:59 +01:00
4542b8fb27 push v4.51.0.dev0 2025-03-21 13:45:25 +01:00
523f6e743c Fix: dtype cannot be str (#36262)
* fix

* this wasn't supposed to be here, revert

* refine tests a bit more
2025-03-21 13:27:47 +01:00
3f9ff19b4e Minor Gemma 3 fixes (#36884)
fix attention mask dtype + outputs type
2025-03-21 13:15:22 +01:00
f94b0c59f2 Use deformable_detr kernel from the Hub (#36853)
* Use `deformable_detr` kernel from the Hub

Remove the `deformable_detr` kernel from `kernels/` and use the
pre-built kernel from the Hub instead.

* Add license header

* Add `kernels` as an extra `hub-kernels`

Also add it to `testing`, so that the kernel replacement gets tested
when using CUDA in CI.
2025-03-21 13:08:47 +01:00
2638d54e78 Gemma 3 tests expect greedy decoding (#36882)
tests expect greedy decoding
2025-03-21 12:36:39 +01:00
b8aadc31d5 🔴 🔴 🔴 supersede paligemma forward to shift pos id indexing (#36859)
* supersede paligemma forward to shift pos id indexing

* fix prepare_inputs_ as well

* fix modular error

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-03-21 12:36:27 +01:00
6321876b5b add eustlb as an actor 2025-03-21 12:32:12 +01:00
94f487626a [generate] model defaults being inherited only happens for newer models (#36881) 2025-03-21 11:01:09 +00:00
f19d018bff Revert "Update deprecated Jax calls (#35919)" (#36880)
* Revert "Update deprecated Jax calls (#35919)"

This reverts commit f0d5b2ff04e1354d32beac70984adcc8100352a0.

* Revert "Update deprecated Jax calls (#35919)"

This reverts commit f0d5b2ff04e1354d32beac70984adcc8100352a0.

* update
2025-03-21 11:01:44 +01:00
62116c967f Make ViTPooler configurable (#36517)
* Make ViT Pooler configurable, so that it is possible to pick the activation function and the number of channels in the output

* Add documentation and allow functions as activations (instead of just string)

* formatting change

* Use ACT2FN

* Formatting change

* Formatting changes

* force pooler_act to be string

* force pooler_act to be string

* Add configs to OBJECTS_TO_IGNORE to make check_docstrings happy

* Making the same change in ijepa to make check_modular_conversion happy

* Add IJepaConfig to make CI happy

* rename pooler_size to pooler_output_size as defined in the config

* typo

* revert change to ignore variable

* Ran utils/check_docstrings.py --fix_and_overwrite

* revert unrelated change

* remove redundant defaults

* rename self.act -> self.activation

* tanh activation function in mapping
2025-03-21 11:01:07 +01:00
26c83490d2 chore: fix typos in the tests directory (#36813)
* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* chore: fix typos in the tests

* fix: format codes

* chore: fix copy mismatch issue

* fix: format codes

* chore: fix copy mismatch issue

* chore: fix copy mismatch issue

* chore: fix copy mismatch issue

* chore: restore previous words

* chore: revert unexpected changes
2025-03-21 10:20:05 +01:00
0adbc873d0 Remove call to .item in get_batch_samples (#36861) 2025-03-21 10:14:26 +01:00
6bb8565f0c FIX FSDP plugin update for QLoRA (#36720)
The _fsdp_qlora_plugin_updates checks for LoraConfig but other PEFT
methods can also support quantized models, e.g. VeRA. Therefore, the
isinstance check is now looking for PeftConfig in general.

Moreover, the fsdp_plugin variable may be undefined in the 2nd if
condition, leading to an `UnboundLocalError` error. This is fixed by not
assigning the variable at all.

I checked for tests that may need updating but only found
test_fsdp_config_transformers_auto_wrap associated with this change.
AFAICT, this test does not cover the changed code, since the test does
not start the training loop. Therefore, I haven't updated any tests. LMK
if/how this fix should be tested.

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-21 10:11:47 +01:00
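A schematic of the two changes described above; the surrounding function and attribute paths are simplified for illustration, not the actual Trainer code.

```python
from peft import PeftConfig  # was: a check against LoraConfig only


def _fsdp_qlora_plugin_updates(trainer):
    # 1) Accept any PEFT method that supports quantized models (e.g. VeRA),
    #    not just LoRA, by checking for the PeftConfig base class.
    if isinstance(getattr(trainer.model, "active_peft_config", None), PeftConfig):
        # 2) Read the plugin inline instead of assigning it up front, so the
        #    name can never be referenced while unbound (the UnboundLocalError).
        if getattr(trainer.accelerator.state, "fsdp_plugin", None) is not None:
            ...  # apply the QLoRA-specific FSDP plugin updates
```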
949cca4061 [CI] doc builder without custom image (#36862)
* no image

* test

* revert jax version updates

* make fixup

* update autodoc path for model_addition_debugger

* shieldgemma2

* add missing pages to toctree
2025-03-21 09:10:27 +00:00
97d2f9d8ae Mllama: raise better error (#35934)
* fix mllama

* update test

* fix test
2025-03-21 09:35:37 +01:00
6a2627918d Refactor Aya Vision with modular (#36688)
* refactor aya_vision with modular (incorrect docstring)

* Fix docstrings

* Fix other modulars

* fix docstring

* revert changes

* add tie_weights and resize_token_embeddings
2025-03-20 15:34:56 -04:00
9e771bf402 Add support for seed in DataCollatorForLanguageModeling (#36497)
Add support for `seed` in `DataCollatorForLanguageModeling`. Also wrote tests for verifying behaviour.
2025-03-20 18:27:43 +00:00
ecd60d01c3 [CI] fix update metadata job (#36850)
fix update_metadata job
2025-03-20 17:17:36 +00:00
42c489f2ae Gemma3: fix test (#36820)
* fix test

* require_read_token and public repo ids

* flash-attn test uncomment

* fix torchscript
2025-03-20 18:14:53 +01:00
068b663f90 [torchao] revert to get_apply_tensor_subclass (#36849)
* revert to old name

* empty commit

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-20 18:00:13 +01:00
1d3f35f30a Add model visual debugger (#36798)
* draft of model tracer visualiser

* add context manager in addition to decorator

* add debug utils to init

* move model debugging utils to dedicated file

* add documentation

* protect some imports

* format

* move and protect imports

* format

* doc: improve errors in case of broken dummy imports.

* format

* use automatic torch backend

* update doc

* fix backend

* (TEMP) move to dummies while backend wait

* update documentation

* doc
2025-03-20 17:37:29 +01:00
6515c25953 Add Prompt Depth Anything Model (#35401)
* add prompt depth anything model by modular transformer

* add prompt depth anything docs and imports

* update code style according transformers doc

* update code style: import order issue is fixed by custom_init_isort

* fix depth shape from B,1,H,W to B,H,W, which is the same as Depth Anything

* move prompt depth anything to vision models in _toctree.yml

* update backbone test; there is no need for resnet18 backbone test

* update init file & pass RUN_SLOW tests

* update len(prompt_depth) to prompt_depth.shape[0]

Co-authored-by: Joshua Lochner <admin@xenova.com>

* fix torch_int/model_doc

* fix typo

* update PromptDepthAnythingImageProcessor

* fix typo

* fix typo for prompt depth anything doc

* update promptda overview image link of huggingface repo

* fix some typos in promptda doc

* Update image processing to include pad_image, prompt depth position, and related explanations for better clarity and functionality.

* add copy disclaimer for prompt depth anything image processing

* fix some format typos in image processing and conversion scripts

* fix nn.ReLU(False) to nn.ReLU()

* rename residual layer as it's a sequential layer

* move size compute to a separate line/variable for easier debug in modular prompt depth anything

* fix modular format for prompt depth anything

* update modular prompt depth anything

* fix scale to meter and some internal funcs warp

* fix code style in image_processing_prompt_depth_anything.py

* fix issues in image_processing_prompt_depth_anything.py

* fix issues in image_processing_prompt_depth_anything.py

* fix issues in prompt depth anything

* update conversion script, similar to mllama

* update testing for modeling prompt depth anything

* update testing for image_processing_prompt_depth_anything

* fix assertion in image_processing_prompt_depth_anything

* Update src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update docs/source/en/model_doc/prompt_depth_anything.md

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update docs/source/en/model_doc/prompt_depth_anything.md

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* update some testing

* fix testing

* fix

* add return doc for forward of prompt depth anything

* Update src/transformers/models/prompt_depth_anything/modular_prompt_depth_anything.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update tests/models/prompt_depth_anything/test_modeling_prompt_depth_anything.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix prompt depth order

* fix format for testing prompt depth anything

* fix minor issues in prompt depth anything doc

* fix format for modular prompt depth anything

* revert format for modular prompt depth anything

* revert format for modular prompt depth anything

* update format for modular prompt depth anything

* fix parallel testing errors

* fix doc for prompt depth anything

* Add header

* Fix imports

* Licence header

---------

Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-03-20 16:12:44 +00:00
66291778dd Refactor Attention implementation for ViT-based models (#36545)
* Refactor vit attention

* Refactor ViT-based models

* 🚨🚨🚨 Fix prefix for DPT

* Update params order

* trigger tests

* Fix Dinov2 attention

* Fix DPT attention impl propagation for backbone config

* Common test fix: config is modif. inplace - avoid it

* view->reshape

* Fixup

* Fixup

* Enable IJepa FA2

* Add FA2 in corresponding model docs
2025-03-20 15:15:01 +00:00
730d2a52e7 DeepSpeed tensor parallel+ZeRO (#36825)
add ds tp change
2025-03-20 16:12:01 +01:00
1a374799ce Support loading Quark quantized models in Transformers (#36372)
* add quark quantizer

* add quark doc

* clean up doc

* fix tests

* make style

* more style fixes

* cleanup imports

* cleaning

* precise install

* Update docs/source/en/quantization/quark.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update tests/quantization/quark_integration/test_quark.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* remove import guard as suggested

* update copyright headers

* add quark to transformers-quantization-latest-gpu Dockerfile

* make tests pass on transformers main + quark==0.7

* add missing F8_E4M3 and F8_E5M2 keys from str_to_torch_dtype

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-03-20 15:40:51 +01:00
ce091b1bda Use pyupgrade --py39-plus to improve code (#36843) 2025-03-20 14:39:44 +00:00
3e8f0fbf44 Fix hqq skipped modules and dynamic quant (#36821)
* Fix hqq skip_modules and dynamic_quant

* fix skipped modules loading

* add dynamic/skip HqqConfig test
2025-03-20 15:31:49 +01:00
055afdb6bb Fix ONNX export for sequence classification head (#36332)
* set dtype to int32

* fix style
2025-03-20 14:22:48 +00:00
487dab1b2b Shieldgemma2 (#36678)
* single commit

* correct config

* fixup

* dummy pt

* Use ShieldGemma2Config in conversion script

* Update src/transformers/models/shieldgemma2/configuration_shieldgemma2.py

* Adding shieldgemma2 to models.__init__.py

* Adding ShieldGemma2 to main __init__.py

* Update shieldgemma2.md

* Update shieldgemma2.md

* Adding tests. Addressing review feedback.

* Minor docs update

* Fixing code quality feedback from CI

* Fixing empty messages bug reported by ghunkins

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Ren Pang <ain-soph@live.com>
2025-03-20 15:14:38 +01:00
a63e92e2f0 Fix: remove the redundant snippet of _whole_word_mask (#36759)
remove the redundant snippet of _whole_word_mask
2025-03-20 14:10:43 +00:00
8124a234ca Gemma 3: Adding explicit GenerationConfig and refactoring conversion … (#36833)
Gemma 3: Adding explicit GenerationConfig and refactoring conversion script
2025-03-20 15:03:32 +01:00
cf8091c017 Fix import for torch 2.0, 2.1 - guard typehint for "device_mesh" (#36768)
* Fix device_mesh

* Remove rebase leftover
2025-03-20 11:55:47 +00:00
388e6659bf Update min safetensors bis (#36823)
* update setup.py

* style
2025-03-20 12:50:07 +01:00
b47d9b2f8a [generate] clarify docstrings: when to inherit GenerationMixin (#36605) 2025-03-20 10:58:54 +00:00
8e97b44087 [modular] Sort modular skips (#36304) 2025-03-20 10:55:12 +00:00
63380b77d4 Pass state dict (#35234)
* Pass state_dict argument to get_peft_model_state_dict

* Style fix

* Change arguments order
2025-03-20 11:54:59 +01:00
957b05b413 [qwen2 audio] remove redundant code and update docs (#36282) 2025-03-20 10:54:51 +00:00
f0d5b2ff04 Update deprecated Jax calls (#35919)
* Remove deprecated arguments for jax.numpy.clip.

* Remove deprecated arguments for jax.numpy.clip.

* Update jax version to 0.4.27 to 0.4.38.

* Avoid use of deprecated xla_bridge.get_backend().platform

Co-authored-by: Jake Vanderplas <jakevdp@google.com>

---------

Co-authored-by: Jake Vanderplas <jakevdp@google.com>
2025-03-20 11:51:51 +01:00
1ddb64937c Fix fp16 ONNX export for RT-DETR and RT-DETRv2 (#36460)
* Fix FP16 ONNX export

* Fix typo

* Sync omdet-turbo

* Refactor encoder for better readability

* Fix _no_split_modules

* Fix int -> torch_int

* Fix rt_detr

* Apply to rt-detr-v2

* Fixup

* Fix copies
2025-03-20 10:43:51 +00:00
e7337ee7be Pass num_items_in_batch directly to loss computation (#36753)
* Pass num_items_in_batch directly to loss computation

* use self loss instead

* fix loss kwrgs

* fix vocab size
2025-03-20 10:35:35 +00:00
8b479e39bb Saving Trainer.collator.tokenizer in when Trainer.processing_class is None (#36552)
* feat: Saving tokenizer in collator when processing_class is None

* chore: Style issue

* chore: Typo

* dbg: Check why test failed

* dbg: Remove logics and another test failed which successed before, so should be the stablibility issue

* test: Init unit-test

* chore: Style

* chore: Add err log

* fix: Case

* Update tests/trainer/test_trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* chore: Try to use get_regression_trainer

* fix: Impl and style

* fix: Style

* fix: Case

* fix: Import err

* fix: Missed import

* fix: Import block un-sorted problem

* fix: Try another tokenizer

* fix: Test logic

* chore: Light updates

* chore: Reformat

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-20 11:27:47 +01:00
3f03c379d2 fix tiktoken convert to pass AddedToken to Tokenizer (#36566)
* pass AddedToken to Tokenizer

* ruff

* handle dict for special tokens

* option: test tokenizer from tiktoken same as fast

* ruff

* ruff
2025-03-20 11:26:49 +01:00
8f64b177f6 [ForCausalLMLoss] allow users to pass shifted labels (#36607)
* [ForCausalLMLoss] allow users to pass shifted labels

Signed-off-by: Stas Bekman <stas@stason.org>

* style

Signed-off-by: Stas Bekman <stas@stason.org>

---------

Signed-off-by: Stas Bekman <stas@stason.org>
2025-03-20 11:25:22 +01:00
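A hedged sketch of what passing pre-shifted labels means here: the causal LM loss normally shifts internally so that position t predicts token t+1; this change lets callers perform that shift themselves and hand in the result. The exact keyword accepted by `ForCausalLMLoss` should be confirmed against the merged signature.

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([[10, 11, 12, 13]])

# The internal shift: pad with ignore_index (-100) on the right, then drop
# the first position, so logits[..., t, :] lines up with labels[..., t + 1].
shift_labels = F.pad(labels, (0, 1), value=-100)[..., 1:]

# Hypothetical call shape after this change:
# loss = ForCausalLMLoss(logits, labels=None, vocab_size=V, shift_labels=shift_labels)
```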
94555437e2 Disable inductor config setter by default (#36608)
* Disable inductor config setter by default

This is hard to debug and should be off by default

* remove default settings in autoquant too

* Add info to torchao.md about recommended settings

* satisfying Ruff format

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-20 11:23:14 +01:00
8733297b41 Fix swanlab global step (#36728)
* fix

* global step
2025-03-20 11:13:37 +01:00
b815fae359 Move the warning to the documentation for DataCollatorWithFlattening (#36707)
Remove init warning
2025-03-20 11:09:57 +01:00
9be4728af8 Just import torch AdamW instead (#36177)
* Just import torch AdamW instead

* Update docs too

* Make AdamW undocumented

* make fixup

* Add a basic wrapper class

* Add it back to the docs

* Just remove AdamW entirely

* Remove some AdamW references

* Drop AdamW from the public init

* make fix-copies

* Cleanup some references

* make fixup

* Delete lots of transformers.AdamW references

* Remove extra references to adamw_hf
2025-03-19 18:29:40 +00:00
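For downstream code, the migration is a one-line import change:

```python
import torch
from torch.optim import AdamW  # replaces the removed transformers.AdamW

model = torch.nn.Linear(8, 2)
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
```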
51bd0ceb9e Update configuration_qwen2.py (#36735)
* Update configuration_qwen2_moe.py

* Update modeling_qwen2_moe.py

* ruff fmt

* docstring add qkv_bias
2025-03-19 18:15:54 +00:00
107fedc1e2 quick fix fast_image_processor register error (#36716)
* fix fast_image_processor register error

* update error message

* remove redundant import

* fix format
2025-03-19 18:05:45 +00:00
258dd9cc69 Add Space to Bitsandbytes doc (#36834)
* add space

* address review
2025-03-19 18:56:07 +01:00
f39f4960f3 Support tracable dynamicKVcache (#36311)
* Support tracable dynamicKVcache

* Fix lint

* More fine grained test

* Lint

* Update

* Update

* Fix up

* Apply suggestions from code review

* Update src/transformers/cache_utils.py

* Update tests/utils/test_cache_utils.py

* Apply suggestions from code review

* Update

* Change error message

* Rename

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

---------

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-19 16:52:30 +00:00
63c3116530 One more fix for reviewer assignment (#36829)
* one more fix

* one more fix

* Trigger tests
2025-03-19 16:25:24 +00:00
7c233980f4 [gemma 3] multimodal checkpoints + AutoModelForCausalLM (#36741) 2025-03-19 15:04:19 +00:00
b11050d6a2 enable OffloadedCache on XPU from PyTorch 2.7 (#36654)
* fix "Cannot copy out of meta tensor; no data!" issue for BartForConditionalGeneration model

* follow Marc's suggestion to use _tie_weights to fix

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* enable OffloadedCache on XPU since PyTorch 2.7

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* don't change bart

Signed-off-by: root <root@a4bf01945cfe.jf.intel.com>

* make code more concise per review comments

Signed-off-by: N <matrix.yao@intel.com>

* fix review comments

Signed-off-by: root <root@a4bf01945cfe.jf.intel.com>

* Revert "fix review comments"

This reverts commit acf1484b86c7cc58b2dee69e7008c0eeb4c97b1b.

* fix review comments

Signed-off-by: root <root@a4bf01945cfe.jf.intel.com>

* fix style

Signed-off-by: root <root@a4bf01945cfe.jf.intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: root <root@a4bf01945cfe.jf.intel.com>
Signed-off-by: N <matrix.yao@intel.com>
Co-authored-by: root <root@a4bf01945cfe.jf.intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-19 15:15:52 +01:00
e8d960329e Add option for ao base configs (#36526) 2025-03-19 14:59:47 +01:00
fef8b7f8e9 Add attention visualization tool (#36630)
* add utils file

* style

* nits

* nits

* update

* updates

* update

* fix init issues

* big updates

* nits

* nits?

* small updates

* nits

* there were still some models left

* style

* fixes

* updates

* nits _ fixes

* push changes

* update

* update

* update

* Apply suggestions from code review

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* style

* styling and return a string for testing

* small updates

* always bidirectional for now

* update

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2025-03-19 13:58:46 +01:00
0fe0bae0a8 [Generation] remove leftover code from end-to-end compilation (#36685) 2025-03-19 11:28:33 +00:00
a861db01e5 Fix Device map for bitsandbytes tests (#36800)
fix
2025-03-19 11:57:13 +01:00
b9374a0763 Remove "dist": "loadfile" for pytest in CircleCI jobs (#36811)
* fasterrrrr

* avoid crash in example jobs

* avoid crash in TF example jobs

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-19 11:15:09 +01:00
4fa91b1be5 fix "Cannot copy out of meta tensor; no data!" issue for BartForConditionalGeneration model (#36572)
* fix "Cannot copy out of meta tensor; no data!" issue for BartForConditionalGeneration model

* follow Marc's suggestion to use _tie_weights to fix

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix review comments.

Signed-off-by: N <matrix.yao@intel.com>

* fix quality

Signed-off-by: N <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: N <matrix.yao@intel.com>
2025-03-19 10:48:47 +01:00
706703bba6 Expectations test utils (#36569)
* Add expectation classes + tests

* Use typing Union instead of |

* Use bits to track score in properties cmp method

* Add exceptions and tests + comments

* Remove compute cap minor as it is not needed currently

* Simplify. Remove Properties class

* Add example Exceptions usage

* Expectations as dict subclass

* Update example Exceptions usage

* Refactor. Improve type name. Document score fn.

* Rename to DeviceProperties.
2025-03-18 23:39:50 +01:00
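A hedged sketch of the resulting utility, pieced together from the bullets above (a dict subclass keyed by device properties, with a score function picking the best-matching entry); the key layout and method names are assumptions to verify against `testing_utils`.

```python
from transformers.testing_utils import Expectations

# Keys are assumed to be (device_type, major_version); None is a wildcard.
expectations = Expectations(
    {
        ("cuda", 8): 0.9123,
        ("cuda", 7): 0.9120,
        ("rocm", None): 0.9117,
        (None, None): 0.9100,  # default fallback for any other device
    }
)
expected = expectations.get_expectation()  # best match for the current device
```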
179d02ffb8 [generate] vectorized beam search (#35802) 2025-03-18 18:39:36 +00:00
12f2ebef63 Support custom dosctrings in modular (#36726)
* Override docstrings in modular if not none

* Update doc
2025-03-18 14:00:54 -04:00
Gar
00915d3041 Fix chameleon's TypeError because inputs_embeds may None (#36673)
* fix chameleon TypeError when inputs_embeds is None

* reformat

* hotfix
2025-03-18 18:59:30 +01:00
14b597f518 Fix casting dtype for quantization (#36799)
* fix

* remove print
2025-03-18 18:46:03 +01:00
30580f035b Fix Mistral3 tests (#36797)
* fix processor tests

* fix modeling tests

* fix test processor chat template

* revert modeling test changes
2025-03-18 13:08:12 -04:00
db1d4c5a0b Loading optimizations (#36742)
* improvements

* Update modeling_utils.py

* add some doc about loading

* Update modeling_utils.py
2025-03-18 16:38:44 +01:00
7baf00089a Update SHA for tj-actions/changed-files (#36795)
* trigger

* trigger

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-18 16:19:39 +01:00
3017536ebf fix hqq due to recent modeling changes (#36771)
* fix-hqq

* style

* test
2025-03-18 12:20:27 +01:00
e959530b8f Add Mistral3 (#36790)
* initial start

* style and dummies

* Create convert_mistral3_weights_to_hf.py

* update

* typo

* typo

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* up

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* update

* update

* Update image_processing_mistral3.py

* Update convert_mistral3_weights_to_hf.py

* fix patch merger

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* up

* update modular to fit

* style

* Update convert_mistral3_weights_to_hf.py

* typo

* Update modular_mistral3.py

* simplify a lot all shape shenanigans

* simplify

* add working test processor

* Add partially working common modeling tests

* All tests working and remove mistral3 image processors

* add docs and fixup

* fix inference with image size >1540

* 🚨fix test image proc pixtral

* Remove vision_feature_select_strategy

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* Update convert_mistral3_weights_to_hf.py

* clean

* fix test checkpoints

* Update test_modeling_mistral3.py

* Update test_modeling_mistral3.py

* style

* Use Pixtral processor

* up

* finish cleaning processor to use pixtral directly

* Update __init__.py

* Update processing_pixtral.py

* doc

* Update __init__.py

* Update mistral3.md

* Update _toctree.yml

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: yonigozlan <yoni.gozlan10@gmail.com>
2025-03-18 12:04:42 +01:00
bd92073692 Fix gemma3_text tokenizer in mapping (#36793) 2025-03-18 11:50:22 +01:00
7426d02ea8 Fixing typo in gemma3 image_processor_fast and adding a small test (#36776)
Co-authored-by: zebz13 <zeb@fedora>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-18 11:35:06 +01:00
19b9d8ae13 chore: fix typos in tests directory (#36785)
* chore: fix typos in tests directory

* chore: fix typos in tests directory

* chore: fix typos in tests directory

* chore: fix typos in tests directory

* chore: fix typos in tests directory

* chore: fix typos in tests directory

* chore: fix typos in tests directory
2025-03-18 10:31:13 +01:00
7f5077e536 fix typos in the tests directory (#36717) 2025-03-17 17:45:57 +00:00
cbfb8d7b27 doc: Clarify is_decoder usage in PretrainedConfig documentation (#36724)
* fix: clarify decoder usage in PretrainedConfig documentation

* Apply suggestions from code review

updated doc

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-03-17 09:40:25 -07:00
ac1a1b66b9 [docs] Update README (#36265)
* update

* feedback

* feedback

* update versions
2025-03-17 09:37:19 -07:00
cff4caa0c1 [CI] remove redundant checks in test_eager_matches_sdpa_inference (#36740) 2025-03-17 16:29:18 +00:00
e3af4fec91 [MINOR:TYPO] Update hubert.md (#36733)
* [MINOR:TYPO] Update hubert.md

- typo fix (wave2vec instead of hubert)
- make code snippet copiable and runnable

* Run tests
2025-03-17 09:07:51 -07:00
c8a2b25f91 Fix TrainingArguments.torch_empty_cache_steps post_init check (#36734)
Fixed a mistaken use of De Morgan's law: changed the "not (X or Y)"
check to the correct "not (X and Y)" so that a ValueError is raised.

Added corresponding test to check "positive int or None" condition.

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-03-17 16:09:46 +01:00
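A generic reconstruction of the corrected check, not the literal diff. By De Morgan's law, `not (A or B)` equals `(not A) and (not B)`, so negating "is None or is a positive int" with the wrong connective silently rejects valid values and accepts invalid ones.

```python
def validate_torch_empty_cache_steps(steps):
    # Valid values: None, or a positive int.
    is_valid = steps is None or (isinstance(steps, int) and steps > 0)
    if not is_valid:
        raise ValueError("torch_empty_cache_steps must be None or a positive integer")


validate_torch_empty_cache_steps(None)  # ok
validate_torch_empty_cache_steps(10)    # ok
# validate_torch_empty_cache_steps(0)   # raises ValueError
```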
8e67230860 Fix test isolation for clear_import_cache utility (#36345)
* test fixup

* test fixup

* fixing tests for unused imports

* style fixes

* fix

* style fixes

* style fix

* remove isolated module cache

* rm custom subprocess definition

* run using existing fn

* style fixup

* make fixup

* remove redundant comments

* rm redundant skipif + style changes
2025-03-17 16:09:09 +01:00
27361bd218 fix xpu tests (#36656)
* fix awq xpu tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix llava next video bnb tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-17 15:57:49 +01:00
da7d64f4ff Allow ray datasets to be used with trainer (#36699)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-17 15:44:47 +01:00
2256875a77 fix can_generate (#36570)
* fix can_generate

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix can generate for speecht5 and blip

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix speecht5 tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-03-17 14:56:18 +01:00
9e94801146 enable/disable compile for quants methods (#36519)
* disable compile for most quants methods

* fix

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update tests/quantization/bnb/test_mixed_int8.py

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* changes from joao suggestions

---------

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-17 11:38:21 +01:00
c53d53da89 🚨🚨🚨 Fix sdpa in SAM and refactor relative position embeddings (#36422)
* fall back to eager if output_attentions

* improve relative position embeddings

* run modular on got_ocr2

* run-slow: sam

* fix run-length encoding

* fix tf processor errors

* update tf_sam

* fix compile error

* re-run tests
2025-03-17 09:39:52 +00:00
fc8764c9a6 [Generation, Gemma 3] When passing a custom generation_config, overwrite default values with the model's base generation_config (#36684) 2025-03-15 12:40:09 +00:00
f263e88dcf Update self-push-caller.yml 2025-03-15 11:32:04 +01:00
6f3e0b68e0 Fix grad accum arbitrary value (#36691) 2025-03-14 22:03:01 +01:00
2c2495cc7b Fix post_init() code duplication (#36727)
* Update modeling_utils.py

* CIs
2025-03-14 17:36:02 +01:00
25992b493c 🌐 [i18n-KO] Translated codegen.md to Korean (#36698)
* Initial translation

* Add _toctree.yml
2025-03-14 09:31:18 -07:00
42ebb6c23e [tests] Parameterized test_eager_matches_sdpa_inference (#36650) 2025-03-14 14:41:27 +00:00
9215cc62d4 Try working around the processor registration bugs (#36184)
* Try working around the processor registration bugs

* oops

* Update error message

* Clarify error

* Docstring docstring docstring

* The extra content is indexed by config class, so let's grab some values out of there

* Commit my confusion as a TODO

* Resolve my confusion

* Cleanup and mostly revert to the original

* Better autoclass fallback

* Don't nest f-strings you lunatic

* Clearer error message

* Less getattr()

* Revert a lot of changes to try a different approach!

* Try the global registry

* Check the dynamic list as well as the transformers root

* Move the dynamic list somewhere safer

* Move the dynamic list somewhere even safer

* More import cleanup

* Simplify all the register_for_auto_class methods

* Set _auto_class in the register() methods

* Stop setting the cls attribute in register()

* Restore specifying the model class for Model derivatives only

* Fix accidentally taking the .__class__ of a class

* Revert register_for_auto_class changes

* Fix get_possibly_dynamic_module

* No more ALL_CUSTOM_CLASSES

* Fix up get_possibly_dynamic_module as well

* Revert unnecessary formatting changes

* Trigger tests
2025-03-14 13:56:21 +00:00
691d1b52c3 Fix/best model checkpoint fix (#35885)
* Set best_model_checkpoint only when ckpt exists.

Rather than setting it explicitly without checking whether the checkpoint directory even exists, as before, the setting logic now lives inside _save_checkpoint and only sets best_model_checkpoint if the checkpoint exists.

* Added best_global_step to TrainerState.

* Added tests for best_model_checkpoint.

* Fixed hard-coded values in test to prevent fail.

* Added helper func and removed hard-coded best_step.

* Added side effect patch generator for _eval.

* Added evaluate side effect func.

* Removed erroneous patching.

* Fixed minor bug.

* Applied Ruff.

* Fixed Ruff problem in make style.

* Used Trainer.set_initial_training_values.
2025-03-14 14:24:53 +01:00
3bd1a0ddf1 [model loading] don't gc.collect() if only 1 shard is used (#36721)
* don't gc collect if 1 shard is used

* delete state dict anyways
2025-03-14 12:56:56 +00:00
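A schematic of the loading-loop change; the function and its arguments are illustrative, not the actual `modeling_utils` code.

```python
import gc


def load_shards(shard_files, load_one, assign_into_model):
    for shard_file in shard_files:
        state_dict = load_one(shard_file)
        assign_into_model(state_dict)
        del state_dict  # always drop the reference ("delete state dict anyways")
        if len(shard_files) > 1:
            # Only pay for a full collection when several shards must pass
            # through memory; single-shard checkpoints skip the costly gc call.
            gc.collect()
```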
8cb522b419 Cleanup the regex used for doc preprocessing (#36648)
* Cleanup the regex used for doc preprocessing

* Run tests
2025-03-14 12:18:49 +00:00
72861e11eb Make the flaky list a little more general (#36704)
* Make the flaky list a little more general

* Trigger tests

* Make the flaky list a little more general
2025-03-14 12:15:32 +00:00
53742b11f5 Gemma3 processor typo (#36710)
* fix typo when  is on

* tiny

* add test and remove 'text_crops'

* lint
2025-03-14 13:07:55 +01:00
69bc848480 Add support for fast image processors in add-new-model-like CLI (#36313)
* add support for fast image processors in add-new-model-like

* fix header not found add-fast-image-processor-cli

* Encourage adding fast image processor

* nit

* start improve doc

* update docs

* make requested modifs
2025-03-13 14:16:37 -04:00
48ef468c74 Final CI cleanup (#36703)
* make fixup

* make fixup

* Correct skip decorator

* Add TODOs

* add is_flaky() parentheses
2025-03-13 17:26:09 +00:00
b070025aa6 Add GGUF support to T5-Encoder (#36700)
* add gguf support to t5encoder

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

* remove gguf from model_kwargs

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-13 17:57:33 +01:00
4a60bae8e2 Handling an exception related to HQQ quantization in modeling (#36702)
* adding exception

* style

* add types
2025-03-13 17:53:36 +01:00
09a309d273 fix: fsdp sharded state dict wont work for save_only_model knob (#36627)
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-13 17:17:35 +01:00
2a004f9ff1 Add loading speed test (#36671)
* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* trigger CIs

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* better error messages

* Update test_modeling_utils.py

* Update test_modeling_utils.py
2025-03-13 17:07:30 +01:00
a3201cea14 [CI] Automatic rerun of certain test failures (#36694) 2025-03-13 15:40:23 +00:00
d84569387f chore: fix typos in utils module (#36668)
* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module
2025-03-13 15:12:44 +00:00
32c95bd847 Fix dtype for params without tp_plan (#36681)
* Update tensor_parallel.py

* CIs
2025-03-13 15:28:14 +01:00
bb965d8e87 fix type annotation for ALL_ATTENTION_FUNCTIONS (#36690)
Corrects the type annotation to match actual usage. The variable was typed as
Dict[str, Dict[str, Callable]] but is actually used as Dict[str, Callable]
where keys are attention mechanism names and values are the corresponding
attention functions directly. This change makes the type annotation consistent
with how the dictionary is used in the codebase.
2025-03-13 14:27:50 +00:00
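The corrected annotation in isolation, matching the usage described above:

```python
from typing import Callable, Dict


def sdpa_attention_forward(*args, **kwargs):
    ...  # placeholder body for the sketch


# Keys are attention mechanism names; values are the attention functions
# themselves, not nested dicts (the old annotation wrongly said
# Dict[str, Dict[str, Callable]]).
ALL_ATTENTION_FUNCTIONS: Dict[str, Callable] = {"sdpa": sdpa_attention_forward}

attn_fn = ALL_ATTENTION_FUNCTIONS["sdpa"]  # a Callable, as the annotation now says
```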
1c287aecfc Change Qwen2_VL image processors to have init and call accept the same kwargs (#36207)
Change qwen2VL image processors to have init and call accept the same kwargs
2025-03-13 10:15:17 -04:00
65b8e38aac Upgrading torch version and cuda version in quantization docker (#36264)
* update

* small update

* no spqr quant

* testing

* testing

* test nightly

* gptqmodel

* flute

* fix hadamard

* running tests

* new docker

* fix docker

* run tests

* testing new docker

* new docker

* run tests

* new docker

* run tests

* final test

* update

* update

* run tests

* new docker

* launch tests

* test_docker

* running tests

* add comments

* fixing yml

* revert
2025-03-13 12:39:16 +01:00
87b30c3589 fix wandb hp search unable to resume from sweep_id (#35883)
* fix wandb hp search unable to resume from sweep_id

* format styles

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-03-13 12:32:26 +01:00
47cc4da351 Changing the test model in Quanto kv cache (#36670)
changing model
2025-03-13 12:23:34 +01:00
bc3d5781e7 Fix slicing for 0-dim param (#36580)
* fix

* switch to ellipsis instead

* Add co-author
Co-authored-by: fxmarty-amd <fxmarty-amd@users.noreply.github.com>

* Add co-author second try
Co-authored-by: fxmarty-amd <felmarty@amd.com>
2025-03-13 12:16:13 +01:00
fbb18ce68b Update config.torch_dtype correctly (#36679)
* fix

* style

* new test
2025-03-13 12:08:02 +01:00
c4161238bd [Cache] Don't initialize the cache on meta device (#36543) 2025-03-13 10:13:29 +00:00
79254c9b61 Fix rescale normalize inconsistencies in fast image processors (#36388)
* fix fused rescale normalize inconsistencies

* fix siglip2 fast image processor

* refactor kwargs validation and fused normalize rescale

* cleanup kwargs handling in preprocess

* update new procs after refactor
2025-03-12 23:18:34 -04:00
48292a9848 Refactor siglip2 fast image processor (#36406)
* refactor siglip2 fast image processor, add unused_kwargs in base fast image processor

* nits

* change unused_kwargs default to None

* update siglip2 fast image proc
2025-03-12 20:28:27 -04:00
ea219ed164 Remove differences between init and preprocess kwargs for fast image processors (#36186)
* Remove differences between init and preprocess kwargs in fast image processors

* make modifs got_ocr2

* update gemma3
2025-03-12 19:44:05 -04:00
cc3a361b46 [quants] refactor logic for modules_to_not_convert (#36672) 2025-03-12 23:43:30 +01:00
bc3253f076 Remove hardcoded slow image processor class in processors supporting fast ones (#36266)
* Add fast image processor class to processors supporting them

* fix test kosmos2
2025-03-12 18:39:25 -04:00
0013ba61e5 Fix Failing GPTQ tests (#36666)
fix tests
2025-03-12 20:03:02 +01:00
c7eb95581a Don't accidentally mutate the base_model_tp_plan (#36677)
* Don't accidentally mutate the base_model_tp_plan

* Co-authored by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Trigger tests

* Marking grad accum test as slow

* Add a flaky decorator

* Add a flaky decorator

* Use cyril's codeblock

* Don't copy() when it's None

* Use cyril's new codeblock

* make fixup
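
A hedged sketch of the defensive-copy idea (the plan contents are hypothetical):

```python
# Copy the class-level plan before mutating it per instance -- but only when
# it exists, since calling .copy() on None would fail.
base_model_tp_plan = {"layers.*.self_attn": "colwise"}  # hypothetical plan
tp_plan = base_model_tp_plan.copy() if base_model_tp_plan is not None else None
```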
2025-03-12 18:59:13 +00:00
071a161d3e [core] Large/full refactor of from_pretrained (#36033)
* squash everything together
start to simplify inner logic

Update modeling_utils.py

Update modeling_utils.py

Update modeling_utils.py

Update modeling_utils.py

continue refactor

fix

small fixes

add type hints/docstring

Update modeling_utils.py

remove _fast_init

keep improving

Update modeling_utils.py

Update modeling_utils.py

new first tp loading version

style

fix weird in-place op

trigger CIs

Update modeling_utils.py

much clearer renaming of keys

fix

update

Update test_modeling_common.py

trigger CIs

update

update

style

Update modeling_utils.py

Update modeling_utils.py

Update modeling_utils.py

fix

fast download first prototype

remove old function

remove old functions

Remove unused function and move back _get_tp_registry

fix tp plan registry

simplify

CIs

Update hub.py

Update modeling_utils.py

simplify

simplify renaming logic

remove unused check

add sanity check back (a test depends on it)

Update modeling_utils.py

finalize sound renaming logic

style

add forgotten check

Update modeling_utils.py

add key_mapping keyword

style

Update modeling_utils.py

add comment

minor updates

minor change for clarity

fix small prefix issue and simplify

style

trigger CIs

typo fix

Post rebase fix

post rebase cleanup

simplify tp

typo

oupsi

typo

correctly escape

improvements based on Marc's review

finalize Marc's review comments

 squash everything

* improve

* Update modeling_utils.py

* Update modeling_utils.py

* fix

* Update modeling_utils.py

* Update modeling_utils.py

* style

* Update modeling_utils.py

* simplify

* style

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* fix dtype issue

* Update modeling_utils.py

* style

* remove test that does not make sense

* style

* small fixes

* style

* fix

* cleanup after rebase

* style

* typo

* escape

* tp for task specific top modules

* Update modeling_utils.py

* Update modeling_utils.py

* fix allocation

* CIs

* CIs

* CIs

* improve docstring

* CIs

* Update modeling_utils.py

* fix
2025-03-12 13:39:25 +01:00
7652804d23 Fix bnb regression due to empty state dict (#36663)
fix
2025-03-12 11:40:46 +01:00
994cad2790 [CI] gemma 3 make fix-copies (#36664)
* make fixup

* trigger ci
2025-03-12 10:35:13 +00:00
2829013d2d fix block mask typing (#36661)
* fix block mask typing

* updated

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* gemma

* fix

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-03-12 11:29:11 +01:00
89f6956015 HPU support (#36424)
* test

* fix

* fix

* skip some and run some first

* test fsdp

* fix

* patches for generate

* test distributed

* copy

* don't test distributed loss for hpu

* require fp16 and run first

* changes from marc's PR fixing zero3

* better alternative

* return True when fp16 is supported on Gaudi without creating a bridge

* fix

* fix tested dtype in deepspeed inference test

* test

* fix

* test

* fix

* skip

* require fp16

* run first fsdp

* Apply suggestions from code review

* address comments

* address comments and refactor test

* reduce precision

* avoid doing Gaudi1-specific stuff in the generation loop

* document test_gradient_accumulation_loss_alignment_with_model_loss test a bit more
2025-03-12 09:08:12 +01:00
50d3530aa0 Gemma3 (#36658)
* Fix converter

* [Broken] Adds Gemma 3 to Hugging Face Transformers

* Consolidating Config and Processor params across impls

* Sorting out configuration parameters. Adds qk_norm before RoPE. Still not sure if RoPE is right.

* Additional plumbing for CausalLM and ConditionalGeneration variants

* incomplete draft of Orbax conversion script

* More complete checkpoint conversion

* Supporting Gemma 3 1B checkpoints

* Updating RoPE for multiple frequencies

* Adjustments to rotary embedder

* Proof of life for text-only operation

* Updating the conversion script to handle multimodal projection weights

* Fixing text-only conversions

* Cleaner conversion script with multimodal support and a simpler processor

* Additional refactors to the Gemma3Processor

* Simplified Processor to work over text representations

* Updated conversion script to join text and vision embeddings at conversion time

* Logging for debugging

* Update src/transformers/models/gemma2/modeling_gemma2.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* Removed extraneous Config params

* Switching to fast tokenizer for checkpoint conversions

* isolating siglip for performance testing

* Minor changes for debugging tests against baselines

* Adding average pooling for soft tokens

* Updating processor code to enable simpler embedding interleaving for an arbitrary number of images in prompts

* Updating conversion script for ShieldGemma 2 conversion compatibility

* Allow disable_compile to be provided as a kwarg

* Refresh from modular

* Updated conversion script and corrected sliding window

* Fix type mismatch in cache_position (#4)

* Fix dtype (#5)

* Fix type mismatch in cache_position

* Actually fix in the modular file

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

---------

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* fixes for embedding table overflow and missing image_soft_token_mask from Gemma3Processor

* Adding 2D pooling for image embeddings

* Revert "Adding 2D pooling for image embeddings"

This reverts commit 65350cf531296f050b2078a5b8e46f61642b2648.

* Gemma3 average pooling changed from 1D to 2D

* Major refactor to Gemma3MultimodalInputProjection

* Updating Gemma 3 Auto* registrations

* Add option to save Gemma 3 chat template with tokenizer during weights conversion

* Removing unused imports

* Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditionalGeneration

* Removing duplicate config property

* Removing final logit softcapping and 1-indexing of position ids

* Fixing image processor config and none --> None typo

* Fixing sliding window size for 1B

* Updating image_mean and image_std in Image Processor

* Attention masking changed to lower triangular

* Moving image special tokens to conversion script

* Mirror image processor defaults from conversion script into Gemma3ProcessorKwargs

* Remove special token variables from symbol space

* Moving image soft token mask computation from Gemma3Processor to Gemma3ForConditionalGeneration

* tie lm_head and embedding weights

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Correct tied weights in Gemma3CausalLM

* iterative bidirectional attention

* resolving merge conflicts

* Reverting to Gemma 2 HybridCache with sliding window support and a sliding_window_pattern of 6

* Correcting RoPE scaling

* clean up first pass, dummy model generation works

* final clean up before fixing tests

* causal lm test works, so fine

* Fix conversion

* Update src/transformers/models/gemma3/processing_gemma3.py

* model tests are happy

* processor tests are happy

* image processing tests added

* fixup

* Fix pre-processing in conversion

* Inputs merging

* Do not normalize vision embeddings

* Apply Ryan's (and team) changes to attention

* token type ids + mask

* template

* move embed scale, add rope scale, fix tests

* Add chat template to tokenizer

* Use prefix for causal model loading

* use existing code for sliding mask from gemma2

* self.embed_tokens already normalizes

* Correcting Gemma3TextConfig parameters in conversion script

* typo, modular overwrites my fixes

* enable device map for text model

* Conversion updates

* ultra nit: no einsums

* update image token

* copy deepcopy config + some docs

* add some test, still WIP

* Refactoring --include_chat_template logic in converter

* Update src/transformers/models/gemma3/modular_gemma3.py

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Add eos tokens for instruct models

* dump so i can work on dgx

* Removing add_bos by default

* dump

* add fast im proc

* docs for PaS + fixup

* another fixup

* one more fixup

* fix tests

* Inverting prior BOS change

* ultra nit

* Reverting to Tokenizer saved with add_bos_token=True and chat template starting with BOS

* resize embeds, remove sqrt, add slow test outputs

* FA2 but quality is meh

* nit

* skip FA2, no idea what happened

* last bit for green CI

* please, green CI for docs

* T_T

* Fix for Gemma3 logits

* Support both options for system prompt

* Update src/transformers/models/gemma3/image_processing_gemma3_fast.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Docs updates now that assets are live

* Style fixes

---------

Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Lysandre <hi@lysand.re>
2025-03-12 09:06:17 +01:00
81aa9b2e07 fix typos in the docs directory (#36639)
* chore: fix typos in the docs directory

* chore: fix typos in the docs directory

* chore: fix typos in the docs directory
2025-03-11 09:41:41 -07:00
cb384dcd7a Fix gguf docs (#36601)
* update

* doc

* update

* Update docs/source/en/gguf.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-03-11 15:29:14 +01:00
1e4286fd59 Remove research projects (#36645)
* Remove research projects

* Add new README to explain where the projects went

* Trigger tests

* Cleanup all references to research_projects
2025-03-11 13:47:38 +00:00
ed1807bab3 [docs] Update docs dependency (#36635)
update
2025-03-11 13:42:49 +00:00
b80b3ec529 Stop warnings from unnecessary torch.tensor() overuse (#36538) 2025-03-11 13:41:13 +00:00
556d2c23c6 Remove remote code warning (#36285)
* Remove redundant pipeline warning

* Remove redundant pipeline warning
2025-03-11 13:29:15 +00:00
b1a51ea464 Fix AriaForConditionalGeneration flex attn test (#36604)
AriaForConditionalGeneration depends on idefics3 vision transformer which does not support flex attn
2025-03-11 11:05:49 +01:00
d126f35427 Proper_flex (#36643)
* proper performant flex attention implementation

* wrapper for flex attention to compile only when triggered

* wrapper for flex attention to compile only when triggered

* attention mask type detection

* Update src/transformers/integrations/flex_attention.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* nit

* nit

* nit

* nit

* gemma2 support

* add citation for torchtune

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update flex_attention.py

* nit

* nit

* nit

* reset gemma2 modifications

* nit

* nit

* nit

* licencing

* apply changes to other models

* safe import

---------

Co-authored-by: Sung Ching Liu <sunny19981005@outlook.com>
Co-authored-by: Sung Ching Liu <22844540+bursteratom@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-03-11 10:24:12 +01:00
d8663cb8c5 Fix bugs in mllama image processing (#36156)
* fix: handle input_channel_dim == channels_last

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

* fix: default PIL images to channels_last

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fixup from review batch

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

* test: add 1x1 PIL image to ambiguous channel test

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

* fix(mllama): avoid 0 dimension for image with impractical aspect ratio

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-03-11 10:22:48 +01:00
1c4b62b219 Refactor some core stuff (#36539)
* some config changes

* update

* current state

* update

* update

* updates and cleanup

* something that works

* fixup

* fixes

* nits

* nit

* nits and fix

* Update src/transformers/integrations/tensor_parallel.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update src/transformers/integrations/tensor_parallel.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* cleanup

* style

* safe import

* fix

* updates

* rename stuff an clean

* style

* small updates

* ups

* oups

* nit

* protect imports

* update tp

* rodfl

* arf

* turbo nit on init

* fix import error

* frumble gumbgle

* try to fix the import error

* should fix the non model test

* update keep in float32

* update

* fix

* nits

* fix subconfigs

* test was weird

* nit

* fix failing test

* fix instruct blip

* fixes

* style

* x.com

* fix overwrite

* ok last bit of failing test

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2025-03-11 09:26:28 +01:00
e9756cdbc7 [docs] Serving LLMs (#36522)
* initial

* fix

* model-impl
2025-03-10 13:14:19 -07:00
af9b2eaa54 chore: fix typos in language models (#36586)
* chore: fix typos in language models

* chore: fix typos in mistral model

* chore: fix model copy from issue

* chore: fix model copy from issue

* chore: fix model copy from issue

* chore: fix model copy from issue

* chore: fix model copy from issue
2025-03-10 15:54:49 +00:00
a929c466d0 Fix auto-assign reviewers (#36631)
* Fix auto-assign reviewers

* Clean up endanchor a bit

* We don't actually need the end anchor at all
2025-03-10 15:52:13 +00:00
858545047c [HybridCache] disable automatic compilation (#36620) 2025-03-10 09:24:26 +00:00
94ae1ba5b5 Fix check for XPU. PyTorch >= 2.6 no longer needs ipex. (#36593) 2025-03-07 14:09:35 +00:00
a1cf9f3390 Fixed datatype related issues in DataCollatorForLanguageModeling (#36457)
Fixed 2 issues regarding `tests/trainer/test_data_collator.py::TFDataCollatorIntegrationTest::test_all_mask_replacement`:
1. I got the error `RuntimeError: "bernoulli_tensor_cpu_p_" not implemented for 'Long'`. This is because `mask_replacement_prob=1` arrives as a `torch.long` dtype, which `torch.bernoulli` doesn't accept. I fixed this by manually casting the probability arguments in the `__post_init__` function of `DataCollatorForLanguageModeling`.
2. I also got the error `tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute Equal as input #1(zero-based) was expected to be a int64 tensor but is a int32 tensor [Op:Equal]` due to the line `tf.reduce_all((batch["input_ids"] == inputs) | (batch["input_ids"] == tokenizer.mask_token_id))` in `test_data_collator.py`. This occurs because the type of the `inputs` variable is `tf.int32`. Solved this by manually casting it to `tf.int64` in the test, as the expected return type of `batch["input_ids"]` is `tf.int64`.
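
A minimal sketch of the two casts described above (shapes and values are illustrative):

```python
import torch
import tensorflow as tf

# (1) torch.bernoulli needs a floating-point probability tensor, so an
#     integer mask_replacement_prob=1 must be cast before sampling.
mask_replacement_prob = 1
prob = torch.full((8,), float(mask_replacement_prob))  # float32 probabilities
mask = torch.bernoulli(prob)

# (2) the TF equality check needs both sides as int64, matching input_ids.
inputs = tf.cast(tf.constant([[101, 102]], dtype=tf.int32), tf.int64)
```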
2025-03-07 14:09:27 +00:00
4fce7a0f0f Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/decision_transformer (#36582)
Bump jinja2 in /examples/research_projects/decision_transformer

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.5 to 3.1.6.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.5...3.1.6)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-07 13:35:59 +00:00
f2fb41948e Update "who to tag" / "who can review" (#36394)
update who to tag
2025-03-07 13:09:31 +00:00
1b9978c360 Update chat_extras.md with content correction (#36599)
Update chat_extras.md - content

Fixed a typo in the content that may confuse readers.
2025-03-07 13:09:02 +00:00
f2e197c30a Github action for auto-assigning reviewers (#35846)
* First draft of github action on PR opening for auto-assigning reviewers

* fix missing import

* Don't reassign reviewers if we already have them

* Temporarily comment out the opened line so we can test the script

* Correct path for codeowners file

* Update workflow permissions

* Update workflow permissions

* Update debug logs

* Strip inline comments

* Remove prefix

* Request reviews instead of assigning

* Request reviews instead of assigning

* Add TODO

* Use pull-request-target instead

* Update the script

* Set back to pull_request for testing

* Set to pull_request_target, testing works!

* Add licence

* Tighten up one of the globs

* Refactor things to be a bit less convoluted

* Only assign reviewers when marked ready for review
2025-03-07 12:18:49 +00:00
8a16edce67 Export base streamer. (#36500)
* Export base streamer. 

Previously, the base streamer class was not exported, so the set of available streamers was fixed to 3 streamer classes. 

This change makes it so that customers may extend the default base streamer class.
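
A toy custom streamer, assuming `BaseStreamer` is importable from `transformers.generation` after this change (the counting logic is illustrative):

```python
from transformers.generation import BaseStreamer

class TokenCountingStreamer(BaseStreamer):
    """Counts streamed tokens instead of printing them."""

    def __init__(self):
        self.n_tokens = 0

    def put(self, value):
        # `value` is a tensor of newly generated token ids.
        self.n_tokens += value.numel()

    def end(self):
        print(f"generation finished after {self.n_tokens} tokens")
```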

* make fixup

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2025-03-07 11:16:09 +00:00
6f775970c7 avoid errors when the size of input_ids passed to PrefixConstrainedLogitsProcessor is zero (#36489)
* avoid errors when the size of `input_ids` passed to PrefixConstrainedLogitsProcessor is zero

* use more reasonable process

* avoid early return

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-03-07 11:02:49 +00:00
51ed61e2f0 Mention UltraScale Playbook 🌌 in docs (#36589) 2025-03-06 14:48:11 -08:00
159445d044 fix: argument (#36558)
752ef3fd4e/utils/modular_model_converter.py (L1729)
2025-03-06 13:11:19 -08:00
5275ef6f3d [XGLM] tag tests as slow (#36592)
these tests should be slow
2025-03-06 17:54:41 +00:00
c1b24c0b73 [bark] fix loading of generation config (#36587) 2025-03-06 16:55:19 +00:00
0440dbc0e1 Integrate SwanLab for offline/online experiment tracking and local visualization (#36433)
* add swanlab integration

* feat(integrate): add SwanLab as an optional experiment tracking tool in transformers

- Integrated SwanLab into the transformers library as an alternative for experiment tracking.
- Users can now log training metrics, hyperparameters, and other experiment details to SwanLab by setting `report_to="swanlab"` in the `TrainingArguments`.
- Added necessary dependencies and documentation for SwanLab integration.
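
A minimal sketch of opting in (output_dir is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./out",   # placeholder path
    report_to="swanlab",  # enables the SwanLab callback described above
)
```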

* Fix the spelling error of SwanLabCallback in callback.md

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Fix typo in comment

* Fix typo in comment

* Fix typos and update comments

* fix annotation

* chore: opt some comments

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: AAssets <20010618@qq.com>
Co-authored-by: ZeYi Lin <944270057@qq.com>
Co-authored-by: KAAANG <79990647+SAKURA-CAT@users.noreply.github.com>
2025-03-06 17:35:30 +01:00
bc30dd1efb Modular Conversion --fix_and_overwrite on Windows (#36583)
* Modular Conversion --fix_and_overwrite on Windows

* -newline on read
2025-03-06 13:12:30 +00:00
9e385109cf Delete redundancy if case in model_utils (#36559)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-06 11:36:11 +00:00
acc49e390d Bump transformers from 4.38.0 to 4.48.0 in /examples/research_projects/pplm (#36540)
Bump transformers in /examples/research_projects/pplm

Bumps [transformers](https://github.com/huggingface/transformers) from 4.38.0 to 4.48.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.38.0...v4.48.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-06 11:35:47 +00:00
9e84b38135 chore: enhance message descriptions in parameters, comments, logs and docstrings (#36554)
* chore: enhance message descriptions in parameters, comments, logs and docstrings

* chore: enhance message descriptions in parameters, comments, logs and docstrings

* Update src/transformers/hf_argparser.py

* Update src/transformers/keras_callbacks.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-03-06 11:02:35 +00:00
6966fa1901 Fix typos. (#36551)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-03-05 16:31:43 -08:00
996f512d52 Fix typos in tests (#36547)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-03-05 15:04:06 -08:00
752ef3fd4e guard torch version for uint16 (#36520)
* u16

* style

* fix
2025-03-05 11:27:01 +01:00
66f29aaaf5 chore: enhance messages in docstrings (#36525)
chore: enhance the message in docstrings
2025-03-04 16:31:20 +00:00
89d27fa6ff Fix links in quantization doc (#36528)
fix quantization doc
2025-03-04 16:43:03 +01:00
c0c5acff07 Fix bamba tests amd (#36535) 2025-03-04 15:24:27 +01:00
37508816d6 chore: Fix typos in docs and examples (#36524)
Fix typos in docs and examples

Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-03-04 13:47:41 +00:00
84f0186e89 Add aya (#36521)
* initial commit

* small fix

* move stuff to image processing file

* remove stuff in validate turn and fix return tensor

* remove liquid stuff

* in the process of addressing comments

* changes to get the right tokenization

* new __init__ works

* fixing default std and mean

* works

* small testing scipt -- to be deleted before merge

* remove redundant code

* addressing comments

* fix inits, add docs templates

* refactor processor, switch to gotocr image processor

* remove image proc from init

* refactor to working llava-style architecture

* Change AyaVisionModel to AyaVisionForConditionalGeneration

* add tests

* fixups

* update doc

* Adding logits_to_keep explicitly in ayavision forward to enable compatibility with cohere model

* better variable names + remove code paths

* Updates to aya_vision.md

* address comments

* adding copied from

* make style and remove unused projector_hidden_act from config

* sort init

* include usage of fast image proc and proc on cuda in doc

* update checkpoint in test processor

* update checkpoint in test processor 2

* remove test_model and update docstring

* skip failing tests

---------

Co-authored-by: Saurabh Dash <saurabh@cohere.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-03-04 12:24:33 +01:00
c0f8d055ce [docs] Redesign (#31757)
* toctree

* not-doctested.txt

* collapse sections

* feedback

* update

* rewrite get started sections

* fixes

* fix

* loading models

* fix

* customize models

* share

* fix link

* contribute part 1

* contribute pt 2

* fix toctree

* tokenization pt 1

* Add new model (#32615)

* v1 - working version

* fix

* fix

* fix

* fix

* rename to correct name

* fix title

* fixup

* rename files

* fix

* add copied from on tests

* rename to `FalconMamba` everywhere and fix bugs

* fix quantization + accelerate

* fix copies

* add `torch.compile` support

* fix tests

* fix tests and add slow tests

* copies on config

* merge the latest changes

* fix tests

* add few lines about instruct

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* fix tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* "to be not" -> "not to be" (#32636)

* "to be not" -> "not to be"

* Update sam.md

* Update trainer.py

* Update modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* fix hfoption tag

* tokenization pt. 2

* image processor

* fix toctree

* backbones

* feature extractor

* fix file name

* processor

* update not-doctested

* update

* make style

* fix toctree

* revision

* make fixup

* fix toctree

* fix

* make style

* fix hfoption tag

* pipeline

* pipeline gradio

* pipeline web server

* add pipeline

* fix toctree

* not-doctested

* prompting

* llm optims

* fix toctree

* fixes

* cache

* text generation

* fix

* chat pipeline

* chat stuff

* xla

* torch.compile

* cpu inference

* toctree

* gpu inference

* agents and tools

* gguf/tiktoken

* finetune

* toctree

* trainer

* trainer pt 2

* optims

* optimizers

* accelerate

* parallelism

* fsdp

* update

* distributed cpu

* hardware training

* gpu training

* gpu training 2

* peft

* distrib debug

* deepspeed 1

* deepspeed 2

* chat toctree

* quant pt 1

* quant pt 2

* fix toctree

* fix

* fix

* quant pt 3

* quant pt 4

* serialization

* torchscript

* scripts

* tpu

* review

* model addition timeline

* modular

* more reviews

* reviews

* fix toctree

* reviews reviews

* continue reviews

* more reviews

* modular transformers

* more review

* zamba2

* fix

* all frameworks

* pytorch

* supported model frameworks

* flashattention

* rm check_table

* not-doctested.txt

* rm check_support_list.py

* feedback

* updates/feedback

* review

* feedback

* fix

* update

* feedback

* updates

* update

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-03-03 10:33:46 -08:00
6aa9888463 Remove unused code (#36459) 2025-03-03 18:31:10 +00:00
9fe82793ee [Style] fix E721 warnings (#36474)
* fix E721 warnings

* config.hidden_size is not a tuple

* fix copies

* fix-copies

* not a tuple

* undo

* undo
2025-03-03 18:03:42 +00:00
1975be4d97 Fix edge case for continue_final_message (#36404)
* Fix edge case for continue_final_message

* lstrip() correctly

* Add regression test

* Add a clearer error message when the final message is not present

* Add a clearer error message when the final message is not present

* Fix massive bug!
2025-03-03 18:03:03 +00:00
2aff938992 Fix pipeline+peft interaction (#36480)
* Fix pipeline-peft interaction

* once again you have committed a debug breakpoint

* Remove extra testing line

* Add a test to check adapter loading

* Correct adapter path

* make fixup

* Remove unnecessary check

* Make check a little more stringent
2025-03-03 18:01:43 +00:00
28159aee63 chore: fix message descriptions in arguments and comments (#36504)
chore: fix message descriptions in arguments and comments
2025-03-03 17:54:57 +00:00
acb8586dd9 Fix some typos in docs (#36502)
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-03-03 17:53:53 +00:00
0463901c92 fix torch_dtype, contiguous, and load_state_dict regression (#36512)
* fix regression

* fix param

* fix load_state_dict

* style

* better fix for module

* fix tests

* quick fix for now

* rm print
2025-03-03 18:35:37 +01:00
3e83ee75ec Fix kwargs UserWarning in SamImageProcessor (#36479)
transformers/image_processing_utils.py:41: UserWarning: The following named arguments are not valid for `SamImageProcessor.preprocess` and were ignored: 'point_pad_value'
2025-03-03 16:23:34 +00:00
9e3a1072c2 Check TRUST_REMOTE_CODE for RealmRetriever for security (#36511)
* fix

* repush

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-03-03 15:08:12 +01:00
4d8259d245 Fix loading zero3 weights (#36455)
* Check if fixes

* Fix zero3 loading

* Quality

* Fix marc nit

* Add fast tests

* Migrate to integrations.deepspeed rather than modeling_utils

* Style
2025-03-03 15:05:58 +01:00
dcbdf7e962 Fix _load_state_dict_into_meta_model with device_map=None (#36488)
* Fix _load_state_dict_into_meta_model with device_map=None

* Update src/transformers/modeling_utils.py
2025-03-02 08:33:36 +01:00
a40f1ac602 Fix couples of issues from #36335 (#36453)
* fix

* style

* better allocation

* fix

* fix

* style

* revert disk

* exit

* style

* return if nothing to cache

* dtensor guard

* fix regressiion

* fix regression

* fix

* fix
2025-03-01 07:12:17 +01:00
2c5d038f92 Add Got-OCR 2 Fast image processor and refactor slow one (#36185)
* refactor image processor slow got ocr

* add working image processor fast

* fix fast image processor, update doc

* use one big loop for processing patches
2025-03-01 00:56:00 -05:00
51083d1bac [docs] fix bug in deepspeed config (#36081)
bug fix
2025-02-28 07:09:54 -08:00
02776d2c6a Fix loading models with mismatched sizes (#36463)
* Fix loading model with mismatched sizes

* trigger tests
2025-02-28 11:48:59 +01:00
222505c7e4 [GroundingDino] Fix grounding dino loss 🚨 (#31828)
* Starting to fix GroundingDinoLoss and GroundingDinoHungarianMatcher

* More updates

* More updates

* fixed: GroundingDinoLoss

* fixed: failing tests

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/grounding_dino/test_modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Addressed comments

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* add: cardinality loss and make box loss as copy from

* change: default for reduction loss is sum

* fix: vectorized generate fake box

* fix copies

* Addressed comments

* addressed comments

* addressed one-hot

* Update tests/models/grounding_dino/test_modeling_grounding_dino.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* Addressed comments

* fixed test

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

* Update tests/models/grounding_dino/test_modeling_grounding_dino.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Starting to fix GroundingDinoLoss and GroundingDinoHungarianMatcher

* More updates

* More updates

* fixed: GroundingDinoLoss

* add: cardinality loss and make box loss as copy from

* fix copies

* Revert "Update tests/models/grounding_dino/test_modeling_grounding_dino.py"

This reverts commit aa74c4c57c430e54cc74c414d6269edb65c73e83.

* [run-slow] groundigdino

* remove nestedtensor

* [run-slow] groundig_dino

* [run-slow] grounding_dino

* [run-slow] grounding_dino

* [run-slow] grounding_dino

* check

* check

* add: enconder intermediate outputs to ImageLoss forward

* add: GroundingDinoForObjectDetectionLoss in the loss directory

* make style

* fix the loss function

* remove class_reduction since sum is the default

* remove class_reduction

* Update src/transformers/loss/loss_grounding_dino.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* simple fix

* Update src/transformers/loss/loss_grounding_dino.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* minor fix

* Update src/transformers/loss/loss_for_object_detection.py

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-27 19:15:58 +00:00
482d17be60 Fix hub_retry (#36449)
* cry

* trigger

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-27 14:38:25 +01:00
6a876462c3 Lazy import libraries in src/transformers/image_utils.py (#36435)
* Lazy import libraries in `src/transformers/image_utils.py`

* `make fixup`

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Protect imports

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
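
A sketch of the lazy-import pattern used here (illustrative helper; the real module guards its optional dependencies the same way):

```python
def open_image(path: str):
    # Deferred import: PIL is only required when the helper is actually
    # called, keeping `import transformers` itself cheap.
    from PIL import Image

    return Image.open(path)
```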
2025-02-27 12:53:42 +00:00
8aed019764 [generate] torch.distributed-compatible DynamicCache (#36373)
* test

* docstring

* prepare distributed cache data

* fix cat dim

* test mvp

* add test checks

* like this?

* working test and solution

* nit

* nit

* add shape info
2025-02-27 11:48:57 +00:00
17792556b2 [save_pretrained ] Skip collecting duplicated weight (#36409)
* Skip collecting duplicated weight

* format
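
A sketch of why duplicates appear in the first place: tied parameters share storage, so a naive state-dict walk collects the same tensor twice:

```python
import torch

emb = torch.nn.Embedding(10, 4)
lm_head = torch.nn.Linear(4, 10, bias=False)
lm_head.weight = emb.weight  # weight tying

# Same underlying storage -- saving both entries would duplicate the data.
print(lm_head.weight.data_ptr() == emb.weight.data_ptr())  # True
```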
2025-02-27 10:57:11 +01:00
2d6cc0dfde Add contents: write (#36445)
fix permission

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-27 10:55:37 +01:00
549db241e5 Fix another permission (#36444)
* fix permission

* fix permission

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-27 10:29:06 +01:00
a8e4fe45fd Fix permission (#36443)
fix permission

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-27 10:08:31 +01:00
d0727d92cd Change PR to draft when it is (re)opened (#36417)
* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-27 09:44:33 +01:00
8ede897c30 restrict cache allocator to non quantized model (#36428) 2025-02-26 22:16:15 +01:00
a7fbab33ae Fix Expected output for compressed-tensors tests (#36425)
fix
2025-02-26 21:17:24 +01:00
1603018e7a Update form pretrained to make TP a first class citizen (#36335)
* clean code

* oups

* fix merge

* yups

* fix if

* now you can play

* fix shape issue

* try non blocking

* fix

* updates

* up

* updates

* fix most of the tests

* update

* update

* small updates

* up

* fix the remaining bug?

* update

* rename when you read from the file

* buffer issues

* current status

* cleanup

* properly allocate dumb memory

* update a small bug

* fix colwise rep issue

* fix keep in float 32 that was keeping everything in float 32

* typo

* more fixes with keep_in_fp32_modules as we used to search on it

* fix ROPE dtype for TP

* remove what's breaking the tests

* updates

* update and fixes

* small cleanup after merging

* allocate 2x to be safe

* style, auto

* update

* yup nit

* fix

* remove slow as fuck torch api :(

* work

* fixup

* update

* bringing the fix back

* fix and update

* fixes

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* updates because some suggestions were wrong 👀

* update?

* fuck this bloated function

* typo

* fix the dumb prefix thing once and for all

* fixes here and there

* updates

* remove prints

* fix strict cases

* style

* properly fix keys on load!

* update

* fix base model prefix issue

* style

* update

* fix all?

* remove 1 print

* fix the final tests

* fixup

* last nits

* fix the detach issue which cause a 2x slowdown

* fixup

* small fixes

* ultra nit

* fix

* fix

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-26 20:12:38 +01:00
981c276a02 Fix compressed tensors config (#36421)
* fix config

* update

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-26 17:56:15 +01:00
d18d9c3205 Universal Speculative Decoding CandidateGenerator (#35029)
* move `TestAssistedCandidateGeneratorDifferentTokenizers` into a new testing file

* refactor

* NOTHING. add space to rerun github actions tests

* remove it...

* `UniversalSpeculativeDecodingGenerator`

* Use `UniversalSpeculativeDecodingGenerator` when `generation_config.do_sample=True`

* assistant tokenizes only the target's new suffix

* formatting

* fix code

* fix code

* formatting

* add `TestGenerateWithDifferentModels`

* `TestGenerateWithDifferentModels` parameterize on `do_sample`

* `AssistantVocabMapping` & `AssistantVocabMappingCache`

* formatting

* `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits`

* improve `_get_assistant_to_target_input_ids` & formatting

* renaming

* WIP: debugging `min_new_tokens`

* fix get_target_ids

* `UniversalSpeculativeDecodingGenerator`

* assistant tokenizes only the target's new suffix

* formatting

* fix code

* fix code

* formatting

* `TestGenerateWithDifferentModels` parameterize on `do_sample`

* `AssistantVocabMapping` & `AssistantVocabMappingCache`

* formatting

* `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits`

* improve `_get_assistant_to_target_input_ids` & formatting

* renaming

* WIP: debugging `min_new_tokens`

* fix get_target_ids

* fix device issue

* fix get_assistant_input_ids

* add `TestAssistedCandidateGeneratorDifferentTokenizers`

* formatting

* `AssistantVocabTranslatorCache` refactor & tests

* revert changes in `src/transformers/generation/logits_process.py`

* refactor `AssistedCandidateGenerator`

* refactor `AssistedCandidateGeneratorDifferentTokenizers`

* formatting

* refactor `UniversalSpeculativeDecodingGenerator`

* fix negative value for max_new_tokens

* fix generation length target + attention_mask vs. assistant + attent

* fix device

* fix negative max_new_tokens bug

* fix UAG

* minor

* formatting

* `AssistedCandidateGeneratorDifferentTokenizers` `lookbehind`s init

* resolve conflict & formatting

* rerun CI tests

* remove space...

* remove old code

* fix candidate_input_ids device

* minor

* formatting

* Fix prepare + apply (#7)

* fix prepare + apply

* move to cpu

* simplify suppress_tokens

* fix bugs and refactoring

* device move

* handle self.config.vocab_size > len(target_tokenizer.get_vocab())

* no need to normalize in candidate_generator

* address Nadav's comments + minor

* optimize device move + SuppressTokensLogitsProcessor

* AssistantToTargetTranslator, SuppressTokensLogitsProcessor and tokenizers mapping improvements

* padding size

* padding improvement

* fix and simplify get_target_logits

* renaming in get_target_logits

* minor

* add filter_value and suppress_tokens_id

* style + rename

* remove TODO

* restore original SelectTokensLogitsProcessor with modification

* fix style

* fix _update_past_and_masks and optimize code

* remove assistant_vocab_size arg

* fix attention_mask

* call _prepare_attention_mask also if not has_past_key_values

* handling attention mask for first generation

* comment

* restore test

* remove SelectTokensLogitsProcessor

* _update_past_and_masks implementation for USD

* Add unittests for Universal Assisted generation

* fix style

* update tests

* Remove unused import and fix `test_speculation_depth` test

* exclude special and reserved tokens from tokenizer for UAG

* mv `test_universal_assisted_generation.py` to `generation/test_candidate_generator.py`

* Remove unused imports and fix style using `make style` (#9)

* formatting

* Swap gated `meta-llama/llama-3.2` with `allenai/llama` (#10)

* Fix space sign disagreement (#12)

* default values for AssistantToTargetTranslator fields

* fix space sign

* minor

* fix test + style

* Default values for some fields of assistant to target translator (#11)

* default values for AssistantToTargetTranslator fields

* fix

* add support to empty logit_processors

* Update candidate_generator.py (#15)

fix typo

* BUG fix in _prepare_assistant_input_ids (#14)

* fix _prepare_assistant_input_ids

* target_to_assistant_input_ids

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Nadav Timor <nadav.timor@weizmann.ac.il>

---------

Co-authored-by: Nadav Timor <nadav.timor@weizmann.ac.il>

* typo (`target_to_assistant_input_ids`)

* formatting

* merge upstream/main

* Fix minor review comments (#16)

* Fix: `token_ids.to(torch.int64)` (#18)

* tok ids to `torch.int64` (reference: https://huggingface.co/docs/transformers.js/en/api/tokenizers)

* `LongTensor`

* fix dtype

* `assistant_input_ids.to(dtype=torch.long)`

* Remove unused import from test_candidate_generator.py

* Remove unused import from test_candidate_generator.py

* Remove `numpy` import

* resolve pr comments (#19)

* `AssistantToTargetTranslator` docstring

* (per gante's comment) `filter_value` and `suppress_tokens_id` to class constants

* update `AssistantToTargetTranslator` docstring

* (gante's comment) replace `match-case`

* formatting

* Fix Joao's comments (#21)

* remove threading

* fix logits_processor

* fix test device

* fix style (#23)

* Move atm (#24)

* move AssistantToTargetTranslator

* fixup

* fix logit_processor

* add atm_translator test

* refactor test

* remove threading from test

* add require_torch in tests

* move AssistantVocabTranslatorCache + add tests

* ruff fix

---------

Co-authored-by: jmamou <jonathan.mamou@intel.com>
Co-authored-by: Gaurav <gauravj@d-matrix.ai>
Co-authored-by: Gaurav Jain <gaurjain14@gmail.com>
Co-authored-by: gauravjain14 <41287729+gauravjain14@users.noreply.github.com>
2025-02-26 16:14:02 +00:00
082834dd79 fix: prevent model access error during Optuna hyperparameter tuning (#36395)
* fix: prevent model access error during Optuna hyperparameter tuning

The `transformers.integrations.integration_utils.run_hp_search_optuna` function releases model memory and sets trainer.model to None after each trial. This causes an AttributeError when subsequent Trainer.train calls attempt to access the model before reinitialization. This is only an issue when `fp16_full_eval` or `bf16_full_eval` flags are enabled.
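
A hedged sketch of the affected code path (`build_model` is a hypothetical model factory):

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(output_dir="./out", fp16_full_eval=True)  # flag that triggered the bug
trainer = Trainer(model_init=lambda trial: build_model(), args=args)  # build_model is hypothetical

# Each trial re-enters Trainer.train; pre-fix, trainer.model was None on re-entry.
best = trainer.hyperparameter_search(backend="optuna", n_trials=5)
```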

* Update src/transformers/trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-26 17:06:48 +01:00
6513e5e402 add recommendations for NPU using flash_attn (#36383)
* add recommendations for Ascend NPU using flash_attn

* update recommend_message_npu

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-26 14:51:08 +01:00
b4965cecc5 Fixing the docs corresponding to the breaking change in torch 2.6. (#36420) 2025-02-26 14:11:52 +01:00
9a217fc327 Deprecate transformers.agents (#36415) 2025-02-26 11:38:47 +01:00
41925e4213 Add retry hf hub decorator (#35213)
* Add retry torch decorator

* New approach

* Empty commit

* Empty commit

* Style

* Use logger.error

* Add a test

* Update src/transformers/testing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Fix err

* Update tests/utils/test_modeling_utils.py

---------

Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-25 20:53:11 +01:00
9ebfda3263 Fixed VitDet for non-square Images (#35969)
* size tuple

* delete original input_size

* use zip

* process the other case

* Update src/transformers/models/vitdet/modeling_vitdet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* [VITDET] Test non-square image

* [Fix] Make Quality

* make fix style

* Update src/transformers/models/vitdet/modeling_vitdet.py

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-02-25 19:31:24 +00:00
cbe0ea59f3 Security fix for benchmark.yml (#36402)
security

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-25 17:22:09 +01:00
88d10517b4 Fix convert_to_rgb for SAM ImageProcessor (#36369) 2025-02-25 15:10:21 +00:00
e1ce948908 [CLI] add import guards (#36376)
* add import guards

* nit
2025-02-25 15:06:50 +00:00
fb83befb14 Fix pytorch integration tests for SAM (#36397)
Fix device in tests
2025-02-25 14:53:34 +00:00
ca6ebcb9bc chore: fix function argument descriptions (#36392) 2025-02-25 14:28:34 +00:00
7c8916ddb5 fix audio classification pipeline fp16 test on cuda (#36359)
* fix audio classification pipeline fp16 test on cuda

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add comments

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update tests/pipelines/test_pipelines_audio_classification.py

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-25 15:01:25 +01:00
c3700b0eee [tests] enable autoawq tests on XPU (#36327)
add autoawq

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-25 13:38:09 +01:00
b4b9da6d9b tests: revert change of torch_require_multi_gpu to be device agnostic (#35721)
* tests: revert change of torch_require_multi_gpu to be device agnostic

Commit 11c27dd33 modified `torch_require_multi_gpu()` to be device-agnostic
instead of CUDA-specific. This broke some tests which are rightfully
CUDA-specific, such as:

* `tests/trainer/test_trainer_distributed.py::TestTrainerDistributed`

In the current Transformers tests architecture `require_torch_multi_accelerator()`
should be used to mark multi-GPU tests agnostic to device.

This change addresses the issue introduced by 11c27dd33 and reverts
modification of `torch_require_multi_gpu()`.

Fixes: 11c27dd33 ("Enable BNB multi-backend support (#31098)")
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
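
A short sketch of the intended split (both decorator names appear above; the test bodies are placeholders):

```python
from transformers.testing_utils import (
    require_torch_multi_accelerator,
    require_torch_multi_gpu,
)

@require_torch_multi_gpu  # CUDA-specific multi-GPU test
def test_cuda_only_distributed():
    ...

@require_torch_multi_accelerator  # device-agnostic multi-accelerator test
def test_any_accelerator_distributed():
    ...
```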

* fix bug: modification of frozen set

---------

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-25 13:36:10 +01:00
d80d52b007 addressing the issue #34611 to make FlaxDinov2 compatible with any batch size (#35138)
fixed the batch_size error, all tests are passing

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-02-25 10:44:44 +00:00
3a02fe56c2 Added handling for length <2 of suppress_tokens for whisper (#36336)
* Update generation_whisper.py

Added handling for suppress_tokens of length < 2 for Whisper

* Updated None check for suppress_tokens to avoid ambiguity

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-02-25 10:32:49 +00:00
da4ab2a1b6 Fix doc formatting in forward passes & modular (#36243)
* fix indentation issues + modular without magic keyword

* style

* Update doc.py

* style

* Fix all decorators indentation

* all models

* style

* style

* Update doc.py

* fix

* general fix

* style
2025-02-25 11:09:01 +01:00
92abc0dae8 Update _get_eval_sampler to reflect Trainer.tokenizer is deprecation self.tokenizer -> self.processing_class (#36315)
* fix warning self.tokenizer -> self.processing_class

* formating change
2025-02-25 11:07:50 +01:00
9d6abf9778 enable torchao quantization on CPU (#36146)
* enable torchao quantization on CPU

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix int4

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable CPU torchao tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cuda tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cpu tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix style

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix cuda tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix torchao available

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix torchao available

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix torchao config cannot convert to json

* fix docs

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm to_dict to rebase

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* limited torchao version for CPU

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix skip

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update src/transformers/testing_utils.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix cpu test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-25 11:06:52 +01:00
401543a825 Fix is_causal fail with compile (#36374)
fix
2025-02-25 10:44:56 +01:00
bc65f3fc1c [modular] Do not track imports in functions (#36279)
* Add check

* just check for function

* Update examples
2025-02-25 10:29:47 +01:00
4b5cf5496d Load models much faster on accelerator devices!! (#36380)
* caching allocator warmup

* Update modeling_utils.py

* reuse expanded map

* style
2025-02-25 09:41:22 +01:00
931e5f4ac3 Update modeling_llava_onevision.py (#36391)
Fixed a potential bug in modeling_llava_onevision.py
2025-02-25 09:34:50 +01:00
2ab7bdc403 notify new model merged to main (#36375)
notify new model

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-24 17:53:18 +01:00
05dfed06d7 [Modeling] Reduce runtime when loading missing keys (#36312)
* hoist keys

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove hoist

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-02-24 16:10:28 +00:00
18276b03f7 fix(type): padding_side type should be Optional[str] (#36326) 2025-02-24 16:09:42 +00:00
f4684a6eb2 Update amd pytorch index to match base image (#36347)
pip pytorch index should match docker base image
2025-02-24 16:17:20 +01:00
2af272c101 Add autoquant support for torchao quantizer (#35503)
* Add autoquant support for torchao quantizer

Summary:
att, also verified that autoquantized model can be saved and loaded:

save: https://gist.github.com/jerryzh168/01d367aaf44dbbbfd4068a4a10a00061
load: https://gist.github.com/jerryzh168/d5c6c401b2abdf18e0b6771341f1525c

Test Plan:
tested locally with above script
model uploaded to https://huggingface.co/jerryzh168/llama3-8b-autoquant
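
A hedged sketch of using the new mode, assuming `TorchAoConfig` accepts an `"autoquant"` quantization type as added here (checkpoint name and exact argument spelling are placeholders):

```python
from transformers import AutoModelForCausalLM, TorchAoConfig

quant_config = TorchAoConfig("autoquant")  # assumed spelling of the new mode
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # placeholder checkpoint
    quantization_config=quant_config,
    device_map="auto",
)
```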

Reviewers:

Subscribers:

Tasks:

Tags:

* add test

* ruff fix

* ruff reformat

* add docs and min_sqnr support

* format

* format

* fix test

* update doc

* format

* remove disable_compile

* format
2025-02-24 15:54:16 +01:00
977a61f743 Change slack channel for mi250 CI to amd-hf-ci (#36346) 2025-02-24 15:50:06 +01:00
884a8ea1f0 Improve model loading for compressed tensor models (#36152)
* Disable warnings for stacked compressors
* Introduce two new hooks in HfQuantizer lifecycle
to allow updates to missing and unexpected keys
* Update missing and unexpected keys
for stacked compressors
* Add tests
* Fix: run_compressed cases
* Fix: uncompressed cases

* Rename compressed_tensor folder to compressed_tensors
Move RunCompressedTest to the same file
Update tests to unittest
2025-02-24 13:47:21 +01:00
4dbf17c17f [tests] enable bnb tests on xpu (#36233)
* fix failed test

* fix device

* fix more device cases

* add more cases

* fix empty cache

* Update test_4bit.py

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-02-24 11:30:15 +01:00
92c5ca9dd7 Fix exploitable regexes in Nougat and GPTSan/GPTJNeoXJapanese (#36121)
* Fix potential regex catastrophic backtracking in NougatTokenizerFast

The original regex pattern in tokenization_nougat_fast.py was vulnerable to
catastrophic backtracking due to greedy quantifiers and nested alternations.
This commit replaces it with a more efficient pattern that:

1. Uses explicit character classes instead of dot (.)
2. Handles whitespace more precisely
3. Avoids unnecessary backtracking
4. Supports both lowercase and uppercase roman numerals
5. Maintains the same functionality while being more robust
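
An illustrative pattern (not the actual Nougat one) showing the failure mode being avoided:

```python
import re

# Nested quantifiers over overlapping alternatives can backtrack exponentially
# on non-matching input -- the classic (a+)+ shape.
evil = re.compile(r"^(a+)+$")
# evil.match("a" * 30 + "b")  # do not run: exponential backtracking

# Same language without the nesting: fails fast on the same input.
safe = re.compile(r"^a+$")
print(safe.match("a" * 30 + "b"))  # None, returned immediately
```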

* Try another regex

* Trying deepseek's answer

* Start with a simplification

* Another simplification

* Just rewrite the whole function myself

* Fix gptneox and gptsan

* Simplify the regex even further

* Tighten up the price regex a little

* Add possessive version of the regex

* Fix regex

* Much cleaner regexes

---------

Co-authored-by: openhands <openhands@all-hands.dev>
2025-02-21 19:49:51 +00:00
547911e727 Uses Collection in transformers.image_transforms.normalize (#36301)
* Uses Collection instead of Sequence in transformers.image_transforms.normalize

* Uses collections.abc.Collection in lieu of deprecated typing one
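
A small sketch of why `Collection` is the looser, structural bound (a numpy array is used as the example input):

```python
from collections.abc import Collection, Sequence

import numpy as np

arr = np.array([0.5, 0.5, 0.5])
# Collection structurally matches anything with __len__/__iter__/__contains__...
print(isinstance(arr, Collection))  # True
# ...while Sequence requires explicit registration, which ndarray lacks.
print(isinstance(arr, Sequence))    # False
```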
2025-02-21 18:38:41 +01:00
7c5bd24ffa [tests] make quanto tests device-agnostic (#36328)
* make device-agnostic

* name change
2025-02-21 14:20:40 +01:00
678885bbbd [CI] Check test if the GenerationTesterMixin inheritance is correct 🐛 🔫 (#36180) 2025-02-21 10:18:20 +00:00
a957b7911a Add SigLIP 2 (#36323)
* Docs

* Inits

* Auto classes

* Add siglip base

* Add base tests

* Fix Siglip V1 for fix res version

* Add image processor

* Update conversion

* Experimenting with vectorized embeddings

* Fixup

* Add modular Siglip2Processor

* Add modular configuration

* Rename num patches

* Correct image and text features merging

* Working conversion script

* Refactoring conversion script

* Remove unused code in conversion script

* Shorten dict a bit

* Refactoring conversion

* Done conversion refactoring

* Fixup

* Modular siglip2

* Make model exportable and compilable without graph breaks

* Remove position_ids from image_processor

* Remove position ids from modeling file

* Update modular

* Type hint

* Fixup

* Set defaults to processor

* Add integration test

* Revert spatial shapes back to tensor

* Change order

* Fix most of the tests

* Fix docstring

* Remove interpolate_pos_encoding arg (not needed)

* Update docs

* Standardize processing

* Fix attention_mask in vision head

* Siglip v1: remove double transpose in FA2

* Update modular file

* Update FA2 test

* Update expected logits

* Fix interpolation for siglip2 image processor

* Skip init test

* Skip dispatch on flash test

* Fix modeling tests

* Fixup

* Add dummy objects

* Fix some docstrings

* Add siglip2 in index.md

* Fix consistency

* Add docs

* Remove size and data format

* Add image processor tests

* Fix

* Add fast image processor

* Fix style

* Fix

* Docs

* Set lowercase for tokenizer

* Adjust head size for Siglip v1

* Update siglip2 for consistency with siglip1

* Update siglip2 conversion

* Update pipeline

* Update checkpoints in tests

* Update checkpoint name

* Fix pooling for image classification model

* Fix FA2 test

* Update processor

* Fix check repo

* Update docs

* Fix typos

* Fix docstring for fast image processor

* Add siglip2 to FA2 docs

* Fix fast ip tests

* Fix consistency

* Fix tokenizer class for siglip v1

* Fix missing header

* Refactor scaling for clip, siglip, siglip2

* Remove unused imports

* Make fast IP default for siglip2

* Update docs

* Update checkpoints

* Update modular

* Update paper link

* Fixup

* Fix name in toctree

* Fix test
2025-02-21 09:04:19 +00:00
14552cbd7c VLMs: even more clean-up (#36249)
* squash

* style
2025-02-21 09:46:31 +01:00
e18f233f6c Fix default attention mask of generate in MoshiForConditionalGeneration (#36171) 2025-02-20 19:53:27 +00:00
27d1707586 [smolvlm] make CI green (#36306)
* add smolvlm to toctree

* add requirements

* dev-ci

* no docker changes

* dev-ci

* update torch-light.dockerfile

* derp

* dev-ci
2025-02-20 18:56:11 +01:00
effaef334b fix: prevent second save in the end of training if last step was saved already (#36219)
* fix: prevent second save in the end of training

* fix: prevent second save in the end of training

* test: added test for no duplicate save on epoch save strategy

* fix: removed TrainerControl

* chore: style formatting

---------

Co-authored-by: JaktensTid <jaktenstid1@gmail.com>
2025-02-20 17:38:52 +01:00
5412ff1a13 Fix typo in Pixtral example (#36302)
Fix typo
2025-02-20 14:13:48 +00:00
4397dfcb71 SmolVLM2 (#36126)
* smolvlm init

* updates

* fixing bugs

* minimal run, no checks

* minimal run, no checks

* passing first check + adding url support

* updating video dataloading logic

* fixing image logic

* trying modular, but fails

* modular is working, changing processor to match PR comments and general transformers logic

* fixing kwargs

* offloading video loading logic to image_util

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* fixing circleci code formatting errors

* update

* add idefics3-based tests

* add keyword to all

* add PreTrainedModel

* updating video loading logic

* working inference

* updates for PR comments

* updates for PR comments

* moving SmolVLMPretrainedModel higher to fix import error

* CI test pass

* CI test pass

* removing lambda

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* CI test pass

* processor tests

* add example in docs

* typo

* fix copies

* skip compile tests - sdpa for VisionTransformer

* fix init

* raise import error for num2words

* update doc for FA2

* more doc fix

* CI

* updates for PR comments

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Joshua Lochner <admin@xenova.com>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* fixing processor -- the tokenizer was not defined properly (a gpt2 tokenizer) and was missing attributes such as the fake image token

* adding smolvlm to VQA models

* removing vqa auto class

* Update src/transformers/models/smolvlm/processing_smolvlm.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* removing smolvlmvisiontransformer from index.md

* my bad, video processing had typos

* fixing docs

* renaming params in SmolVLMModel.inputs_merger

* removing un-needed dtype/device in model forward

* ruff for CI

* update docs

* Update docs/source/en/model_doc/smolvlm.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* return cache position

* return cache position

* return cache also in modular

* needed to run modular again

* fix training tests

* push vectorized inputs merger

* format

* format

* reduce number of mappings

* addressing PR comments

* happy CI, happy me :)

* skip non-nested images

* adjust integration test for smaller GPUs

* format

* fix kwargs in chat template apply

* skip this for now

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Pablo <pablo.montalvo.leroux@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Joshua Lochner <admin@xenova.com>
2025-02-20 15:00:26 +01:00
f2ab182dca Ignore conversion files in test fetcher (#36251)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-20 13:32:02 +01:00
e8531a0e33 Fix broken CI on release branch due to missing conversion files (#36275)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-20 13:22:10 +01:00
5e2183f344 Make cache traceable (#35873)
simply make cache traceable
2025-02-20 09:59:25 +01:00
31bb662db1 Fix callback handler reference (#36250)
* fix reference

* style
2025-02-19 18:17:33 +01:00
78d6484675 docs: Update README_zh-hans.md (#36269)
Update README_zh-hans.md

docs: Fix awkward sentence in README
2025-02-19 09:04:46 -08:00
e5cea20743 Add Example for Custom quantization (#36286)
* add example

* rename
2025-02-19 17:09:23 +01:00
e3d99ec2f5 [tests] make test_from_pretrained_low_cpu_mem_usage_equal less flaky (#36255)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-19 15:14:02 +00:00
99adc74462 [tests] remove flax-pt equivalence and cross tests (#36283) 2025-02-19 15:13:27 +00:00
fa8cdccd91 [tests] deflake dither test (#36284) 2025-02-19 15:13:10 +00:00
60226c6ff3 TP initialization module-by-module (#35996)
* module-by-module loading!

* Update modeling_utils.py

* style and comments

* Update modeling_utils.py

* Update modeling_utils.py

* Update test

* Update modeling_utils.py

* Update modeling_utils.py

* Update test_tp.py

* Update test_tp.py

* Update modeling_utils.py

* re-trigger CIs

* re-trigger CIs
2025-02-19 14:04:57 +01:00
0863eef248 [tests] remove pt_tf equivalence tests (#36253) 2025-02-19 11:55:11 +00:00
1a81d774b1 Add dithering to the Speech2TextFeatureExtractor API. (#34638)
* Add dithering to the `Speech2TextFeatureExtractor` API.

- in kaldi : 4a8b7f6732/src/feat/feature-window.cc (L145)
- with dithering enabled and no seed fixed, the features become non-deterministic
  due to the small Gaussian noise added to the audio (i.e. two runs lead to
  slightly different outputs); a small sketch follows this entry

* update the PR

- add dithering also for WhisperFeatureExtractor
- not adding to Wav2Vec2FeatureExtractor (no FBANK computation)

* add unit-tests for dithering, fix docstrings

* ruff

* utils/check_copies.py --fix_and_overwrite

* update code, add seed to unit-test

* adding explanation of dithering
2025-02-19 11:50:02 +01:00
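A sketch of what Kaldi-style dithering amounts to, with illustrative names rather than the exact feature-extractor API:

```python
from typing import Optional

import numpy as np

def apply_dither(waveform: np.ndarray, dither: float = 0.0,
                 rng: Optional[np.random.Generator] = None) -> np.ndarray:
    # Add tiny Gaussian noise before FBANK extraction so exact-zero frames
    # don't blow up in the log, at the cost of determinism: without a fixed
    # seed, two runs produce slightly different features (hence the seeded
    # unit tests mentioned above).
    if dither == 0.0:
        return waveform
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(waveform.shape).astype(waveform.dtype)
    return waveform + dither * noise

audio = np.zeros(16000, dtype=np.float32)  # 1 second of silence at 16 kHz
dithered = apply_dither(audio, dither=1e-5, rng=np.random.default_rng(0))
print(dithered.std())                      # ~1e-5 rather than exactly 0
```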
9f51dc2535 Add support for post-processing kwargs in image-text-to-text pipeline (#35374)
* fix error and improve pipeline

* add processing_kwargs to apply_chat_template

* change default post_process kwarg to args

* Fix slow tests

* fix copies
2025-02-18 17:43:36 -05:00
9b479a245b Uniformize LlavaNextVideoProcessor kwargs (#35613)
* Uniformize processor kwargs and add tests

* add videos_kwargs tests

* fix copies

* fix llava_next_video chat template tests

* remove unnecessary default kwargs
2025-02-18 14:13:51 -05:00
8ee50537fe Qwen2VL fix cos,sin dtypes to float when used with deepspeed (#36188)
* fix dtype of cos,sin when used with deepspeed

* move sin,cos casting within flash attention functions

* fix cos,sin float casting in modular

---------

Co-authored-by: ardalan.mehrani <ardalan.mehrani@ardalanmehranis-MacBook-Pro.local>
Co-authored-by: ardalan.mehrani <ardalan.mehrani@bytedance.com>
2025-02-18 19:18:29 +01:00
8eaae6bee9 Added Support for Custom Quantization (#35915)
* Added Support for Custom Quantization

* Update code

* code reformatted

* Updated Changes

* Updated Changes

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-02-18 16:14:19 +01:00
07182b2e10 GitModelIntegrationTest - flatten the expected slice tensor (#36260)
Flatten the expected slice tensor
2025-02-18 16:04:19 +01:00
4d2de5f63c Fix XGLM loss computation (PyTorch and TensorFlow) (#35878)
* Fix XGLM loss computation (PyTorch and TensorFlow)

* Update expected output string in XGLM sample test

This updates the expected output string of test_xglm_sample for torch
2.0 to the correct one and removes the one for torch 1.13.1 + cu116
(transformers moved to torch 2.0 with PR #35358).

* Update expected output IDs in XGLM generation test
2025-02-18 15:37:48 +01:00
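For reference, the standard next-token loss that causal LMs in the library compute; a generic sketch, not the XGLM-specific patch:

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Position t predicts token t+1: drop the last logit, drop the first
    # label, then take cross-entropy over the flattened pairs.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,  # masked/padding positions contribute nothing
    )

logits = torch.randn(2, 8, 32)           # (batch, seq_len, vocab_size)
labels = torch.randint(0, 32, (2, 8))
print(causal_lm_loss(logits, labels))
```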
c3ba53303b feat: add support for tensor parallel training workflow with accelerate (#34194)
* feat: add support for tensor parallel flow using accelerate

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: add tp degree to env variable

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: add version check for accelerate to allow TP

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* docs: tensor parallelism

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* nit: rename plugin name

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: guard accelerate version before allow tp

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* docs: add more docs and updates related to TP

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-18 14:05:46 +01:00
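The version-guard pattern mentioned in the bullets, sketched with an illustrative threshold (not the real minimum accelerate version):

```python
import accelerate
from packaging import version

MIN_ACCELERATE_FOR_TP = version.parse("1.3.0")  # illustrative bound only

if version.parse(accelerate.__version__) < MIN_ACCELERATE_FOR_TP:
    raise ImportError(
        "Tensor-parallel training needs accelerate >= "
        f"{MIN_ACCELERATE_FOR_TP}, found {accelerate.__version__}"
    )
```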
e6cc410d5b Remove flakiness in VLMs (#36242)
* fix

* nit

* no logits processor needed

* two more tests on assisted decoding
2025-02-18 11:41:07 +01:00
fdcfdbfd22 Fix TorchAoConfig not JSON serializable (#36206)
**Summary:** TorchAoConfig optionally contains a
`torchao.dtypes.Layout` object which is a dataclass and not
JSON serializable, and so the following fails:

```
import json
from torchao.dtypes import TensorCoreTiledLayout
from transformers import TorchAoConfig

config = TorchAoConfig("int4_weight_only", layout=TensorCoreTiledLayout())

config.to_json_string()

json.dumps(config.to_dict())
```

This also causes `quantized_model.save_pretrained(...)` to
fail because the first step of this call is to JSON serialize
the config. Fixes https://github.com/pytorch/ao/issues/1704.

**Test Plan:**
python tests/quantization/torchao_integration/test_torchao.py -k test_json_serializable

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-18 11:05:42 +01:00
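A generic way to clear this class of error, shown with a stand-in dataclass; one plausible shape of the fix, not necessarily the exact code the PR landed:

```python
import dataclasses
import json

@dataclasses.dataclass
class TensorCoreTiledLayout:  # stand-in for the torchao layout dataclass
    inner_k_tiles: int = 8

def _json_default(obj):
    # json.dumps calls this hook for any object it cannot encode natively;
    # dataclasses become plain dicts, everything else still fails loudly.
    if dataclasses.is_dataclass(obj):
        return dataclasses.asdict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

config_dict = {"quant_type": "int4_weight_only", "layout": TensorCoreTiledLayout()}
print(json.dumps(config_dict, default=_json_default))
# {"quant_type": "int4_weight_only", "layout": {"inner_k_tiles": 8}}
```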
626666c444 Au revoir flaky test_fast_is_faster_than_slow (#36240)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-17 18:30:07 +01:00
429f1a682d [tests] remove test_export_to_onnx (#36241) 2025-02-17 16:52:44 +00:00
dae8708c36 Add compressed tensor in quant dockerfile (#36239)
add compressed_tensors in the dockerfile
2025-02-17 17:48:57 +01:00
3e970dbbf1 Bump transformers from 4.38.0 to 4.48.0 in /examples/research_projects/codeparrot/examples (#36237)
Bump transformers in /examples/research_projects/codeparrot/examples

Bumps [transformers](https://github.com/huggingface/transformers) from 4.38.0 to 4.48.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.38.0...v4.48.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-17 16:28:43 +00:00
77aa9fc076 [generate] Fix encoder decoder models attention mask (#36018) 2025-02-17 15:42:28 +00:00
55493f1390 [tests] remove tf/flax tests in /generation (#36235) 2025-02-17 14:59:22 +00:00
c877c9fa5b v4.45.0-dev0 2025-02-17 15:21:20 +01:00
7ec35bc3bd Add missing atol to torch.testing.assert_close where rtol is specified (#36234) 2025-02-17 14:57:50 +01:00
dad513e0c2 [generate] remove cache v4.47 deprecations (#36212) 2025-02-17 13:55:03 +00:00
936aeb70ab AMD DeepSpeed image additional HIP dependencies (#36195)
* Add hipsolver and hipblastlt as dependencies

* Upgrade torch libs with rocm6.2.4 index
2025-02-17 11:50:49 +01:00
23d6095e8f Fix LlavaForConditionalGenerationModelTest::test_config after #36077 (#36230)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-17 11:49:07 +01:00
fae0f3dde8 [tests] fix EsmModelIntegrationTest::test_inference_bitsandbytes (#36225)
fix failed test
2025-02-17 11:10:33 +01:00
dd16acb8a3 set test_torchscript = False for Blip2 testing (#35972)
* just skip

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-14 17:43:32 +01:00
0a9923a609 Use args.num_workers in check_modular_conversion.py (#36200)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-14 17:31:03 +01:00
a570e2ba87 add shared experts for upcoming Granite 4.0 language models (#35894)
* Modular GraniteMoE with shared Experts.

Signed-off-by: Shawn Tan <shawntan@ibm.com>

* Modified

* Import order.

* Modified for style

* Fix space.

* Test

* Remove extra granitemoe file.

* New converted file and tests

* Modified __init__ files.

* Formatting.

* Dummy PT objects

* register granitemoe shared model

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix linting of a file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix import in modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update generated modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* add documentation

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update docstrings

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* update generated modeling file

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix docstrings in config class

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* merge main

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

---------

Signed-off-by: Shawn Tan <shawntan@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Shawn Tan <shawntan@ibm.com>
Co-authored-by: Shawn Tan <shawn@wtf.sg>
Co-authored-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Sukriti Sharma <Ssukriti@users.noreply.github.com>
2025-02-14 16:55:28 +01:00
7ae7e87a09 Add @require_bitsandbytes to Aria test_batched_generation (#36192) 2025-02-14 15:48:47 +01:00
bcfc9d795e [Bugfix] Fix reloading of pixtral/llava configs (#36077)
* add is_composition flag to LlavaConfig

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* WIP: pixtral text config

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* fix style

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add test

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* use is_composition for pixtral

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Revert "use is_composition for pixtral"

This reverts commit a53d5f9fc5149c84419b0e9e03db6d99362add53.

* Revert "Revert "use is_composition for pixtral""

This reverts commit 3ab1c99404e2c2963fba0bcf94b9786d6365db0f.

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-02-14 15:27:05 +01:00
0c78ef6cd3 🔴 VLM: compile compatibility (#35724)
* llavas

* add mroe models

* fix `compile_forward` test for all models

* fix copies

* make style

* also doesn't support cache class

* fix some tests

* not copied from

* ci green?

* fix tests

* fix copies

* fix tests

* check with `numel` and remove `item`

* fix copies

* fix copies

* Update src/transformers/models/cohere2/modeling_cohere2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* opt remove cross attn

* gemma2

* fixup

* fixup

* fix newly added test

* maybe fixed?

* green please?

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-02-14 15:23:49 +01:00
b45cf0e90a Guard against unset resolved_archive_file (#35628)
* archive_file may not be specified
When loading a pre-trained model from a gguf file, resolved_archive_file may not be set. Guard against that case in the safetensors availability check.

* Remap partial disk offload to cpu for GGUF files
GGUF files don't support disk offload so attempt to remap them to the CPU when device_map is auto. If device_map is anything else but None, raise a NotImplementedError.

* Don't remap auto device_map and raise RuntimeError
If device_map=auto and modules are selected for disk offload, don't attempt to map them to any other device. Raise a runtime error when a GGUF model is configured to map any modules to disk.

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-14 14:44:31 +01:00
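A minimal sketch of the final behavior the bullets converge on, with a hypothetical helper name:

```python
def check_gguf_device_map(device_map: dict) -> None:
    # GGUF checkpoints don't support disk offload, so abort loading if
    # accelerate's device_map (auto-inferred or user-provided) places any
    # module on "disk", rather than silently remapping it elsewhere.
    disk_modules = [name for name, device in device_map.items() if device == "disk"]
    if disk_modules:
        raise RuntimeError(
            f"GGUF models cannot offload modules to disk: {disk_modules}"
        )

check_gguf_device_map({"model.embed_tokens": 0, "lm_head": "cpu"})  # passes
check_gguf_device_map({"lm_head": "disk"})                          # raises
```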
96f01a36ac Revert qwen2 breaking changes related to attention refactor (#36162)
* ditto

* add a test

* update

* test needs fa2

* update test and configuration

* test requires fa2

* style
2025-02-14 13:44:14 +01:00
cb586a3999 Add require_read_token to fp8 tests (#36189)
fix
2025-02-14 12:27:35 +01:00
5f726f8b8e New HIGGS quantization interfaces, JIT kernel compilation support. (#36148)
* new flute

* new higgs working

* small adjustments

* progress and quallity

* small updates

* style

---------

Co-authored-by: Andrey Panferov <panferov.andrey3@wb.ru>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-02-14 12:26:45 +01:00
15ec971b8e Prepare processors for VideoLLMs (#36149)
* allow processor to preprocess conversation + video metadata

* allow callable

* add test

* fix test

* nit: fix

* add metadata frames_indices

* Update src/transformers/processing_utils.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Update src/transformers/processing_utils.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* port updates from Orr and add one more test

* Update src/transformers/processing_utils.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* typo

* as dataclass

* style

* docstring + make sure tests green

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2025-02-14 11:34:08 +01:00
33d1d715b0 Add ImageProcessorFast to Qwen2.5-VL processor (#36164)
* add qwen2 fast image processor to modular file

Signed-off-by: isotr0py <2037008807@qq.com>

* fix modular

Signed-off-by: isotr0py <2037008807@qq.com>

* fix circle import

Signed-off-by: isotr0py <2037008807@qq.com>

* add docs

Signed-off-by: isotr0py <2037008807@qq.com>

* fix typo

Signed-off-by: isotr0py <2037008807@qq.com>

* add modular generated files

Signed-off-by: isotr0py <2037008807@qq.com>

* revert qwen2vl fast image processor

Signed-off-by: isotr0py <2037008807@qq.com>

* remove qwen2.5-vl image processor from modular

Signed-off-by: isotr0py <2037008807@qq.com>

* re-generate qwen2.5-vl files

Signed-off-by: isotr0py <2037008807@qq.com>

* remove unnecessary test

Signed-off-by: isotr0py <2037008807@qq.com>

* fix auto map

Signed-off-by: isotr0py <2037008807@qq.com>

* cleanup

Signed-off-by: isotr0py <2037008807@qq.com>

* fix model_input_names

Signed-off-by: isotr0py <2037008807@qq.com>

* remove import

Signed-off-by: isotr0py <2037008807@qq.com>

* make fix-copies

Signed-off-by: isotr0py <2037008807@qq.com>

---------

Signed-off-by: isotr0py <2037008807@qq.com>
2025-02-14 17:34:55 +08:00
1931a35140 Chat template docs (#36163)
* decompose chat template docs

* add docs

* update model docs

* qwen2-5

* pixtral

* remove old chat template

* also support video given as a list of frames

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_template_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* remove audio for now

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-02-14 10:32:14 +01:00
3bf02cf440 CI: fix test-save-trainer (#36191)
* fix

* also the docstring
2025-02-14 10:20:56 +01:00
0ae93d31ce Add support for partial rotary embeddings in Phi3 model (#35947)
* Added support for partial_rotary_factor

* addressed comments

* refactored
2025-02-14 09:37:38 +01:00
336dc69d63 Uniformize OwlViT and Owlv2 processors (#35700)
* uniformize owlvit processor

* uniformize owlv2

* nit

* add positional arg test owlvit

* run-slow: owlvit, owlv2

* run-slow: owlvit, owlv2

* remove one letter variable
2025-02-13 17:30:26 -05:00
e6a7981711 Fix make_batched_videos and add tests (#36143)
* add support for initial shift in video processing and other fixes

* revert modifications to video loading functions
2025-02-13 17:14:30 -05:00
8fd4bc7d1d Fix a mistake in #36175 (#36179)
fix my bad

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-13 18:33:02 +01:00
b1a2de075d Follow up to SpQR integration (#36176)
fix
2025-02-13 17:40:59 +01:00
12962fe84b Fix the key name for _load_rng_state under torch.cuda (#36138)
fix load key name for _load_rng_state under torch.cuda

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-13 11:35:08 -05:00
bfe46c98b5 Make check_repository_consistency run faster by MP (#36175)
* speeddddd

* speeddddd

* speeddddd

* speeddddd

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-13 17:25:17 +01:00
5f0fd1185b Optimize Qwen2VL vision model by precomputing cos/sin embeds before ViT blocks (#35837)
* Optimize Qwen2VL vision model by precomputing cos/sin embeds before ViT blocks

* Make rotary_pos_emb optional & fix type

* Adapt pre-computed cos/sin to Qwen2.5VL

* More concise
2025-02-13 17:10:58 +01:00
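The shape of the optimization, sketched with illustrative names: the rotary cos/sin tables depend only on the patch positions, so they can be built once and threaded through every ViT block instead of being recomputed per block:

```python
import torch

def precompute_rope(positions: torch.Tensor, dim: int, theta: float = 10000.0):
    # cos/sin tables for rotary embeddings, built once from the (flattened)
    # patch positions before the vision blocks run.
    inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    freqs = torch.outer(positions.float(), inv_freq)
    return freqs.cos(), freqs.sin()

positions = torch.arange(256)    # e.g. 16x16 patches, flattened
cos, sin = precompute_rope(positions, dim=64)
# for block in vision_blocks:                  # reused, not recomputed
#     hidden_states = block(hidden_states, cos=cos, sin=sin)
```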
d72642bccc Use tqdm auto (#35726)
* Remove traces of the progressbar

* Use tqdm auto
2025-02-13 15:41:30 +00:00
62c7ea0201 CI: avoid human error, automatically infer generative models (#33212)
* tmp commit

* move tests to the right class

* remove ALL all_generative_model_classes = ...

* skip tf roberta

* skip InstructBlipForConditionalGenerationDecoderOnlyTest

* videollava

* reduce diff

* reduce diff

* remove  on vlms

* fix a few more

* manual rebase bits

* more manual rebase

* remove all manual generative model class test entries

* fix up to ernie

* a few more removals

* handle remaining cases

* recurrent gemma

* it's better here

* make fixup

* tf idefics is broken

* tf bert + generate is broken

* don't touch tf :()

* don't touch tf :(

* make fixup

* better comments for test skips

* revert tf changes

* remove empty line removal

* one more

* missing one
2025-02-13 16:27:11 +01:00
06231fdfc7 add disable compile option (#36161)
* add disable compile code

* fix
2025-02-13 16:24:46 +01:00
0ca7259217 fix training issues (#36158)
* fix training issues

* Update

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-13 16:24:28 +01:00
845b0a2616 Efficient Inference Kernel for SpQR (#34976)
* Resolve vptq conflict

* Rename spqr package to spqr_quant

* Get rid of aqlm mention

* Start working on tests

* Resolve ruff code checks

* Ruff format

* Isort

* Test updates

* Add gpu tag

* Rename to modules_to_not_convert

* Config update

* Docs and config update

* Docs and config update

* Update to update_torch_dtype

* spqr config parameter validation

* Ruff update

* Apply ruff fixes

* Test fixes

* Ruff update

* Mark tests as @slow again; Ruff; Docstring update

* Ruff

* Remove absolute path

* Resolve typo

* Remove redundant log

* Check accelerate/spqr availability

* Ruff fix

* Check if the config contains proper shapes

* Ruff test

* Documentation update

* overview update

* Ruff checks

* Ruff code quality

* Make style

* Update docs/source/en/quantization/spqr.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update spqr.md

* Enable gptqmodel (#35012)

* gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update readme

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* gptqmodel need use checkpoint_format (#1)

* gptqmodel need use checkpoint_format

* fix quantize

* Update quantization_config.py

* Update quantization_config.py

* Update quantization_config.py

---------

Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* Revert quantizer_gptq.py (#2)

* revert quantizer_gptq.py change

* pass **kwargs

* limit gptqmodel and optimum version

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix warning

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix version check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert unrelated changes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable gptqmodel tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix requires gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix Transformer compat (#3)

* revert quantizer_gptq.py change

* pass **kwargs

* add meta info

* cleanup

* cleanup

* Update quantization_config.py

* hf_select_quant_linear pass checkpoint_format and meta

* fix GPTQTestCUDA

* Update test_gptq.py

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* cleanup

* add backend

* cleanup

* cleanup

* no need check exllama version

* Update quantization_config.py

* lower checkpoint_format and backend

* check none

* cleanup

* Update quantization_config.py

* fix self.use_exllama == False

* spell

* fix unittest

* fix unittest

---------

Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format again

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update gptqmodel version (#6)

* update gptqmodel version

* update gptqmodel version

* fix unit test (#5)

* update gptqmodel version

* update gptqmodel version

* "not self.use_exllama" is not equivalent to "self.use_exllama==False"

* fix unittest

* update gptqmodel version

* backend is loading_attributes (#7)

* fix format and tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix memory check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device mismatch

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix result check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* update tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* review: update docs (#10)

* review: update docs (#12)

* review: update docs

* fix typo

* update tests for gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update document (#9)

* update overview.md

* cleanup

* Update overview.md

* Update overview.md

* Update overview.md

* update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* typo

* doc note for asymmetric quant

* typo with apple silicon(e)

* typo for marlin

* column name revert: review

* doc rocm support

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix : Nemotron Processor in GGUF conversion (#35708)

* fixing nemotron processor

* make style

* Update docs/source/en/quantization/spqr.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add missing TOC to doc

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-02-13 16:22:58 +01:00
c5506f4f00 Bump transformers from 4.38.0 to 4.48.0 in /examples/research_projects/adversarial (#36168)
Bump transformers in /examples/research_projects/adversarial

Bumps [transformers](https://github.com/huggingface/transformers) from 4.38.0 to 4.48.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.38.0...v4.48.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-13 15:06:16 +00:00
d7c5d1b539 Bump transformers from 4.38.0 to 4.48.0 in /examples/tensorflow/language-modeling-tpu (#36167)
Bump transformers in /examples/tensorflow/language-modeling-tpu

Bumps [transformers](https://github.com/huggingface/transformers) from 4.38.0 to 4.48.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.38.0...v4.48.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-13 14:46:38 +00:00
636ee57489 [generate] revert change in Aria: the maximum cache length must match max_length (#36120)
* revert inputs_embeds len

* Update test_utils.py

* make fixup
2025-02-13 14:36:33 +00:00
b41591d847 Fix : fix doc fp8 (#36173)
* fix

* fix
2025-02-13 15:29:59 +01:00
b079dd1fa2 Fix red CI (#36174)
test was weird
2025-02-13 14:27:55 +01:00
d114a6f78e [Modular] skip modular checks based on diff (#36130)
skip modular checks based on diff
2025-02-13 12:53:21 +00:00
6397916dd2 Remove loading custom kernel for RT-DETRv2 (#36098)
* Remove loading custom kernels

* Remove config param

* Fixup
2025-02-13 12:01:53 +00:00
efe72fe21f Adding FP8 Quantization to transformers (#36026)
* first commit

* adding kernels

* fix create_quantized_param

* fix quantization logic

* end2end

* fix style

* fix imports

* fix consistency

* update

* fix style

* update

* udpate after review

* make style

* update

* update

* fix

* update

* fix docstring

* update

* update after review

* update

* fix scheme

* update

* update

* fix

* update

* fix docstring

* add source

* fix test

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-13 13:01:19 +01:00
c82319b493 Helium documentation fixes (#36170)
* Helium documentation fixes

* Update helium.md

* Update helium.md

* Update helium.md
2025-02-13 12:20:53 +01:00
8f137b2427 Move DataCollatorForMultipleChoice from the docs to the package (#34763)
* Add implementation for DataCollatorForMultipleChoice based on docs.

* Add DataCollatorForMultipleChoice to import structure.

* Remove custom DataCollatorForMultipleChoice implementations from example scripts.

* Remove custom implementations of DataCollatorForMultipleChoice from docs in English, Spanish, Japanese and Korean.

* Refactor torch version of DataCollatorForMultipleChoice to be more easily understandable.

* Apply suggested changes and run make fixup.

* fix copies, style and fixup

* add missing documentation

* nits

* fix docstring

* style

* nits

* isort

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-02-13 12:01:28 +01:00
35c155052d Fix PretrainedTokenizerFast check => Fix PretrainedTokenizerFast Save (#35835)
* Fix the bug in tokenizer.save_pretrained when saving tokenizer_class to tokenizer_config.json

* Update tokenization_utils_base.py

* Update tokenization_utils_base.py

* Update tokenization_utils_base.py

* add tokenizer class type test

* code review

* code opt

* fix bug

* Update test_tokenization_fast.py

* ruff check

* make style

* code opt

* Update test_tokenization_fast.py

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
2025-02-13 12:00:33 +01:00
3c912c9089 docs: fix return type annotation of get_default_model_revision (#35982) 2025-02-13 11:59:15 +01:00
6a1ab634b6 qwen2.5vl: fix bugs when using flash2+bf16 or num_return_sequences>1 (#36083)
* qwen2.5vl: fix bugs when using flash2+bf16 or num_return_sequences>1

* fix

* fix

* fix

* fix

* add tests

* fix test bugs

* fix

* fix failed tests

* fix
2025-02-13 11:35:28 +01:00
d419862889 Fix tests for vision models (#35654)
* Trigger tests

* [run-slow] beit, detr, dinov2, vit, textnet

* Fix BEiT interpolate_pos_encoding

* Fix DETR test

* Update DINOv2 test

* Fix textnet

* Fix vit

* Fix DPT

* fix data2vec test

* Fix textnet test

* Update interpolation check

* Fix ZoeDepth tests

* Update interpolate embeddings for BEiT

* Apply suggestions from code review
2025-02-13 10:28:37 +00:00
e60ae0d078 Replace deprecated update_repo_visibility (#35970) 2025-02-13 11:27:55 +01:00
9065cf0d92 Fix Gemma2 dtype issue when storing weights in float16 precision (#35398)
fix gemma2 dtype issue when storing weights in float16 precision
2025-02-13 11:17:37 +01:00
08ab1abff4 Add reminder config to issue template and print DS version in env (#35156)
* update env command to log deepspeed version

* suppress deepspeed import logging

* Add reminder to include configs to repro description in bug report.

* make fixup

* [WIP] update import utils for deepspeed

* Change to using is_deepspeed_available() from integrations.

* make fixup
2025-02-13 10:55:49 +01:00
950cfb0b4f Fix PaliGemma Pad Token Masking During Training #35855 (#35859)
* change order of unmasking of tokens

* library import

* class setup

* test function

* refactor

* add commit message

* test modified

* explicit initialisation of weights + made model smaller

* removed separate testing file

* fixup

* fixup core

* test attention mask with token types

* tests fixup

* removed PaliGemmaAttentionMaskTest class

---------

Co-authored-by: sambhavnoobcoder <indosambahv@gmail.com>
2025-02-13 10:11:44 +01:00
1614d196e8 Mllama fsdp (#36000)
* pixel input assignment revoked

* double send

* Update src/transformers/models/mllama/modeling_mllama.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-02-13 09:49:39 +01:00
847854b023 Add git LFS to AMD docker image (#36016)
Add git lfs to AMD docker image
2025-02-12 22:27:21 +01:00
9985d06add skip test_initialization for VitPoseBackboneModelTest for now (#36154)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-12 18:24:24 +01:00
4a5a7b991a Fix test fetcher (#36129)
* fix

* fix

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-12 17:35:41 +01:00
1fae54c721 Add more rigorous non-slow grad accum tests (#35668)
* Add more rigorous non-slow grad accum tests

* Further nits

* Re-add space

* Readability

* Use tinystories instead

* Revert transformer diff

* tweak threshs
2025-02-12 10:26:21 -05:00
f869d486d3 Update doc re list of models supporting TP (#35864)
Update doc about models' TP support
2025-02-12 15:53:27 +01:00
281c0c8b5b adding option to save/reload scaler (#34932)
* Adding option to save/reload scaler

* Removing duplicate variable

* Adding save/reload test

* Small fixes on deterministic algorithm call

* Moving LLM test to another file to isolate its environment

* Moving back to old file and using subprocess to run test isolated

* Reverting back accidental change

* Reverting back accidental change
2025-02-12 15:48:16 +01:00
a33ac830af Fix multi gpu loss sync condition, add doc and test (#35743)
* Fix multi gpu loss sync condition, add doc and test

* rename function and class

* loss should not scale during inference

* fix typo
2025-02-12 15:41:31 +01:00
08c4959a23 Optim: APOLLO optimizer integration (#36062)
* Added APOLLO optimizer integration

* fix comment

* Remove redundancy: Modularize low-rank optimizer construction

* Remove redundancy: Remove useless comment

* Fix comment: Add typing

* Fix comment: Rewrite apollo desc
2025-02-12 15:33:43 +01:00
2440512723 multi-gpu: fix tensor device placements for various models (#35763)
* multi-gpu: fix inputs_embeds + position_embeds

Fixing the following errors in few models:
```
>       hidden_states = inputs_embeds + pos_embeds
E       RuntimeError: Expected all tensors to be on the same device, but found at least two devices, xpu:2 and xpu:3!
```

Fixes: #35762
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

* multi-gpu: fix tensor device placements for various models

Fixes: #35762
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

* Apply make fix-copies

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

---------

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-02-12 15:28:18 +01:00
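The class of fix applied across these models, reduced to its essence:

```python
import torch

inputs_embeds = torch.randn(1, 4, 8)   # lives on one device (e.g. xpu:2)
pos_embeds = torch.randn(1, 4, 8)      # may live on another (e.g. xpu:3)

# Before: `inputs_embeds + pos_embeds` raises the RuntimeError quoted above
# when the two tensors end up on different devices under multi-GPU sharding.
# After: align devices explicitly at the point of use.
hidden_states = inputs_embeds + pos_embeds.to(inputs_embeds.device)
```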
befea8c4f0 🚨 Remove cache migration script (#35810)
* Remove cache migration script

* remove dummy move_cache
2025-02-12 15:12:38 +01:00
d52a9d08ce Bump cryptography from 43.0.1 to 44.0.1 in /examples/research_projects/decision_transformer (#36142)
Bump cryptography in /examples/research_projects/decision_transformer

Bumps [cryptography](https://github.com/pyca/cryptography) from 43.0.1 to 44.0.1.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/43.0.1...44.0.1)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-12 13:34:52 +00:00
31e4831b98 Bump transformers from 4.38.0 to 4.48.0 in /examples/research_projects/vqgan-clip (#36136)
Bump transformers in /examples/research_projects/vqgan-clip

Bumps [transformers](https://github.com/huggingface/transformers) from 4.38.0 to 4.48.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.38.0...v4.48.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-12 13:21:09 +00:00
243aeb7c4a Fix Gradient Checkpointing for Deberta & Deberta-V2 using PEFT / Adapters (#35898)
Replace In-Place Operations for Deberta and Deberta-V2
2025-02-12 14:21:01 +01:00
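Why in-place ops clash with gradient checkpointing and adapters, in miniature; a generic sketch, not the Deberta diff:

```python
import torch

x = torch.ones(3, requires_grad=True)
h = torch.sigmoid(x)   # sigmoid's backward needs its own output h

# An in-place update rewrites h and bumps its version counter, so backward
# (or a checkpointed re-run) fails with "one of the variables needed for
# gradient computation has been modified by an inplace operation":
#   h += 1
# The out-of-place equivalent leaves the saved activation intact:
h = h + 1
h.sum().backward()
print(x.grad)
```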
8a2f062eac [commands] remove deprecated/inoperational commands (#35718)
rm deprecated/inoperational commands
2025-02-12 12:23:58 +00:00
8fc6ecba4f VLM: enable skipped tests (#35746)
* fix cached tests

* fix some tests

* fix pix2struct

* fix
2025-02-12 12:55:46 +01:00
d6897b46bd Add utility for Reload Transformers imports cache for development workflow #35508 (#35858)
* Reload transformers fix from cache

* add imports

* add test fn for clearing import cache

* ruff fix to core import logic

* ruff fix to test file

* fixup for imports

* fixup for test

* lru restore

* test check

* fix style changes

* added documentation for usecase

* fixing

---------

Co-authored-by: sambhavnoobcoder <indosambahv@gmail.com>
2025-02-12 12:45:11 +01:00
1cc7ca3295 Whisper: remove redundant assisted generation tests (#34814)
* remove redundant test

* delete another test

* revert default max_length

* (wrong place, moving)
2025-02-12 11:37:19 +00:00
0cd5e2dfd0 added warning to Trainer when label_names is not specified for PeftModel (#32085)
* feat: added warning to Trainer when label_names is not specified for PeftModel

* Update trainer.py

* feat: peft detect with `_is_peft_model`

* Update src/transformers/trainer.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Applied formatting in trainer.py

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-02-12 12:34:47 +01:00
377d8e2b9c add RAdamScheduleFree optimizer (#35313)
* add RAdamScheduleFree optimizer

* revert schedulefree version to the minimum requirement

* refine is_schedulefree_available so that it can take min_version

* refine documents

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-02-12 11:31:51 +01:00
f5fff672db Add pipeline parallel plan to PretrainedConfig and PreTrainedModel (#36091)
* Add `base_model_pp_plan` to `PretrainedConfig`

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add `_pp_plan` to `PreTrainedModel`

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add both to Llama for testing

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Fix type error

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Update to suggested schema

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* `_pp_plan` keys are not patterns

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Simplify schema

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Fix typing error

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Update input name for Llama

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add pp plan to Aria

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add pp plan to Bamba

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add pp plan to Cohere 1 & 2

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add pp plan to diffllama and emu3

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add pp plan to Gemma 1 & 2

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add pp plan to GLM and GPT NeoX

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add pp plan to Granite and Helium

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add pp plan to Mistral and Mixtral

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add pp plan to OLMo 1 & 2

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add pp plan to Phi and Phi 3

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add pp plan for Qwen 2, 2 MoE, 2 VL and 2.5 VL

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add pp plan for Starcoder 2

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Add enum for accessing inputs and outputs

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Update type hints to use tuples

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Change outer list to tuple

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-12 10:51:48 +01:00
11afab19c0 [docs] update awq doc (#36079)
* update awq doc

* Update docs/source/en/quantization/awq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/awq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/awq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/awq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add note for inference

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-02-11 10:35:28 -08:00
9b69986e8a [docs] minor doc fix (#36127)
fix
2025-02-11 10:31:12 -08:00
1b57de8dcf Make output_dir Optional in TrainingArguments #27866 (#35735)
* make output_dir optional

* initiated a basic testing module to validate and verify the changes

* Test output_dir defaults to 'tmp_trainer' when unspecified.

* test existing functionality of output_dir.

* test that output dir only created when needed

* final check

* added doc string and changed the tmp_trainer to trainer_output

* make style fixes to test file.

* another round of fixup

---------

Co-authored-by: sambhavnoobcoder <indosambahv@gmail.com>
2025-02-11 18:54:36 +01:00
03534a92f8 update tiktoken integ to use converted (#36135) 2025-02-11 18:27:22 +01:00
3a5c328fd8 Fix CI issues (#35662)
* make explicit gpu dep

* [run-slow] bamba
2025-02-11 18:17:01 +01:00
775252abd4 Fix max size deprecated warning (#34998)
* Remove unused `max_size` variable in processor which was always `None` and triggered an unnecessary deprecation warning

* Remove unused `max_size` variable in processor which was always `None` and triggered an unnecessary deprecation warning

* Remove deprecated warnings and eliminate `max_size` usage

* Test use `int` as argument for `size`
Add a test to ensure the change passes and preserves backward compatibility

* The test pipelines still use `max_size`
Remove `max_size` from test pipelines and replace it with a `size` `Dict` using `'shortest_edge'` and `'longest_edge'` as keys

* Reformatting

* Reformatting

* Revert "Reformatting"

This reverts commit c3040acee75440357cffd1f60c9d29ff5b2744b8.

* Revert "Reformatting"

This reverts commit ac4522e5c9a02d2d0c298295026db68ea26453df.

* Revert "The test pipelines still use `max_size`"

This reverts commit eaed96f041ffc32459536e1524d87f7a12ddee29.

* Revert "Test use `int` as argument for `size`"

This reverts commit 1925ee38c7c5eabb11832316712df1d4ba8043d0.

* Revert "Remove deprecated warnings and eliminate `max_size` usage"

This reverts commit d8e7e6ff9025931468fc1f3827cda1fa391003d5.

* Change version `4.26` to "a future version"

* Reformatting

* Revert "Change version `4.26` to "a future version""

This reverts commit 2b53f9e4
2025-02-11 18:14:31 +01:00
5489fea557 update awesome-transformers.md. (#36115)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-02-11 15:55:49 +00:00
76048be419 fix: typos in documentation files (#36122)
* Update tools.py

* Update text_generation.py

* Update question_answering.py
2025-02-11 13:47:20 +00:00
f42d46ccb4 Add common test for torch.export and fix some vision models (#35124)
* Add is_torch_greater_or_equal test decorator

* Add common test for torch.export

* Fix bit

* Fix focalnet

* Fix imagegpt

* Fix seggpt

* Fix swin2sr

* Enable torch.export test for vision models

* Enable test for video models

* Remove json

* Enable for hiera

* Enable for ijepa

* Fix detr

* Fic conditional_detr

* Fix maskformer

* Enable test maskformer

* Fix test for deformable detr

* Fix custom kernels for export in rt-detr and deformable-detr

* Enable test for all DPT

* Remove custom test for deformable detr

* Simplify test to use only kwargs for export

* Add comment

* Move compile_compatible_method_lru_cache to utils

* Fix beit export

* Fix deformable detr

* Fix copies data2vec<->beit

* Fix typos, update test to work with dict

* Add seed to the test

* Enable test for vit_mae

* Fix beit tests

* [run-slow] beit, bit, conditional_detr, data2vec, deformable_detr, detr, focalnet, imagegpt, maskformer, rt_detr, seggpt, swin2sr

* Add vitpose test

* Add textnet test

* Add dinov2 with registers

* Update tests/test_modeling_common.py

* Switch to torch.testing.assert_close

* Fix maskformer

* Remove save-load from test

* Add dab_detr

* Add depth_pro

* Fix and test RT-DETRv2

* Fix dab_detr
2025-02-11 11:37:31 +00:00
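What the common test boils down to, shown with a toy module (illustrative, not the actual test code): the model must trace through `torch.export` without graph breaks, called with kwargs only, as one bullet above notes:

```python
import torch

class TinyVision(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Conv2d(3, 8, kernel_size=4, stride=4)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # (B, 3, H, W) -> (B, num_patches, hidden): patchify then flatten
        return self.proj(pixel_values).flatten(2).transpose(1, 2)

model = TinyVision().eval()
exported = torch.export.export(
    model, args=(), kwargs={"pixel_values": torch.randn(1, 3, 32, 32)}
)
out = exported.module()(pixel_values=torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 64, 8])
```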
1779f5180e Fix nighlty CIs: missing atols (#35903)
fix some missing atols
2025-02-11 10:49:21 +01:00
1feebb5b41 AutoformerForPrediction test add atol (#36017) 2025-02-10 19:22:24 +01:00
be2ac0916a [generate] shape checks in tests compatible with fixed-length caches (+ some minor fixes) (#35993)
* shape checks compatible with static cache

* add test

* tmp

* manually turn on eager attn when we want to output attn

* typo

* generalize to encoder-decoder models

* force compilation on cpu

* tmp commit

* fix static cache shape checks

* models with odd caches

* fix copies

* shorter cache search loop

* use decoder_past_key_values everywhere

* better test variable names and comments

* signature

* rename _check_outputs into _check_generate_outputs

* add comments

* HybridCache future test note
2025-02-10 17:50:54 +00:00
9510ae39d9 fix bnb warning (#36116)
fix
2025-02-10 17:34:50 +01:00
09261ccf12 [Bugfix] fix file name of docstring in utils/check_table.py (#36108)
fix file name

Co-authored-by: kkscilife <qa-caif-cicd@pjlab.org.cn>
2025-02-10 15:48:02 +00:00
d4a6b4099b Revert checkpoint tmp dir (#36112)
* Revert "Fix OS err (#36094)"

This reverts commit ba29a439adbe6f371710d0514659127264ae24b3.

* Revert "Save checkpoint to temporary directory to handle partial saves during failures (#35580)"

This reverts commit 20d17358c468b7aefca9e54c3461eb88d1ee34f9.
2025-02-10 16:22:03 +01:00
0baf003915 Refactor OPT model (#36101)
* remove cross attention

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* remove is_decoder

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix pkv

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-02-10 14:27:16 +01:00
924f1c717a Remove Multi-threaded image conversion for fast image processors (#36105)
remove multithreaded image conversion

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-10 07:59:34 -05:00
3897f2caf8 Enable pytest live log and show warning logs on GitHub Actions CI runs (#35912)
* fix

* remove

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-10 13:36:20 +01:00
48a309d0d2 Support constant lr with cooldown (#35453)
* Add support for constant learning rate with cooldown

* Add support for constant learning rate with cooldown

* Add support for constant learning rate with cooldown

* Add support for constant learning rate with cooldown

* Add support for constant learning rate with cooldown

* Add support for constant learning rate with cooldown

* Add support for constant learning rate with cooldown

* Add more warmup and cooldown methods to 'get_wsc_schedule'

* Add more warmup and cooldown methods to 'get_wsc_schedule'

* Add more warmup and cooldown methods to 'get_wsc_schedule'

* Add more warmup and cooldown methods to 'get_wsc_schedule'

* Add more warmup and decay methods to 'get_wsd_schedule'

* support num_training_steps and num_stable_steps for get_wsd_schedule

* support num_training_steps and num_stable_steps for get_wsd_schedule

* get wsd scheduler before the `num_training_steps` decision

* fix code_quality

* Update stable branch logic

* fix code_quality

* Move stable stage decide to `get_wsd_schedule`

* Update docstring of `get_wsd_schedule`

* Update `num_train_steps` to optional

* Update `num_train_steps` to optional

* Update docstring of `get_wsd_schedule`

* Update src/transformers/optimization.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-02-10 13:21:55 +01:00
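The warmup-stable-decay shape in its simplest form; a from-scratch sketch, whereas the real `get_wsd_schedule` supports several warmup and decay curves:

```python
from torch.optim.lr_scheduler import LambdaLR

def wsd_lambda(num_warmup_steps: int, num_stable_steps: int, num_decay_steps: int):
    # Linear warmup to the base LR, hold it constant, then linear cooldown.
    def lr_lambda(step: int) -> float:
        if step < num_warmup_steps:
            return step / max(1, num_warmup_steps)
        if step < num_warmup_steps + num_stable_steps:
            return 1.0
        decay_step = step - num_warmup_steps - num_stable_steps
        return max(0.0, 1.0 - decay_step / max(1, num_decay_steps))
    return lr_lambda

# usage: scheduler = LambdaLR(optimizer, wsd_lambda(100, 1000, 200))
```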
9a6be63fdb Add Apple's Depth-Pro for depth estimation (#34583)
* implement config and model building blocks

* refactor model architechture

* update model outputs

* update init param to include use_fov_model

* update param name in config

* fix hidden_states and attentions outputs for fov

* sort config

* complete minor todos

* update patching

* update config for encoder

* fix config

* use correct defaults in config

* update merge for compatibility with different image size

* restructure encoder for custom configuration

* make fov model compatible with custom config

* replace word "decoder" with "fusion"

* weight conversion script

* fix fov squeeze

* update conversion script (without test)

* upload ruff image processing

* create fast image processing

* use torch interpolation for image processing

* complete post_process_depth_estimation

* config: fix imports and sort args

* apply inference in weight conversion

* use mllama script instead for weight conversion

* clean weight conversion script

* add depth-pro status in other files

* fill docstring in config

* formatting

* more formatting

* formatting with ruff

* formatting with style

* fix copied classes

* add examples; update weight convert script

* fix using check_table.py and isort

* fix config docstring

* add depth pro to sdpa docs

* undo unintentional changes in configuration_gemma.py

* minor fixes

* test image processing

* fixes and tests

* more fixes

* use output states from image_encoder instead

* Revert "use output states from image_encoder instead"

This reverts commit 2408ec54e4f27d2abbecdb8374e58f34d91d8e96.

* make embeddings dynamic

* reshape output hidden states and attentions as part of computation graph

* fix ruff formatting

* fix docstring failure

* use num_fov_head_layers in tests

* update doc

* check consistency with config

* ruff formatting

* update test case

* fix ruff formatting

* add tests for fov

* use interpolation in postprocess

* run and fix slow tests locally

* use scaled_images_features for image and fov encoder

* return fused_hidden_states in fusion stage

* fix example

* fix ruff

* fix copyright license for all files

* add __all__ for each file

* minor fixes
- fix "download" spelling
- add push_to_hub option
- fix Optional type hinting
- apply single loop for DepthProImageProcessor.preprocess

* return list in post_process_depth_estimation

* minor fixes
- capitalize start of docstring
- use ignore copy
- fix examples
- move docstring templates and custom output classes to top
- remove "-> None" typehinting from __init__
- type hinting for forward passes
- fix docstrings for custom output classes

* fix "ruff check"

* update upsample and projection

* major changes: (image size and merge optimization)
- add support for images of any size
- optimize merge operation
- remove image_size from config
- use full names instead of B, C, H, W
- remove interpolation from fusion stage
- add interpolation after merge
- move validations to config
- update integration test
- add type hints for functions

* fix push_to_hub option in weights conversion

* remove image_size in weights conversion

* major changes in the architecture
- remove all DepthProViT modules and support different backbones using the AutoModel API
- set default use_fov_model to False
- validate parameters in configuration
- update interpolate function: use "nearest" for faster computation
- update reshape_feature function: remove all special tokens, possible from different backbones
- update merge function: use padding from config instead of merge_out_size
- remove patch_to_batch and batch_to_patch conversions for now
- calculate out_size dynamically in the encoder
- leave head_mask calculation to the backbone
- fix bugs with merge
- add more comments
- update tests

* placeholder for unused config attributes

* improve docs amid review

* minor change in docs

* further optimize merge

* fix formatting

* remove unused patch/batch conversion functions

* use original F.interpolate

* improve function naming

* minor changes
- use torch_int instead of int
- use proper for newly initialized tensors
- use user provided return_dict for patch_encoder
- use if-else block instead in self.use_fov_model

* rearchitect upsample block for improved modularity

* update upsample keys in weight conversion

* improve padding in merge_patches

* use double-loop for merge

* update comments

* create feature_extractor, reduce some forward code

* introduce config.use_mask_token in dinov2

* minor fixes

* minor fixes for onnx

* update __init__ to latest format

* remove DepthProConfig.to_dict()

* major changes in backbone

* update config in weight conversion

* formatting

* converted model is fp32

* improve naming and docs for feature_extractor->reconstruct_feature_maps

* minor fixes; amid review

* create intermediate vars in func call

* use torch.testing.assert_close

* use ModuleList instead of Sequential and ModuleDict

* update docs

* include fov in integration tests

* update docs

* improve initialization of convolution layers

* fix unused fov keys

* update tests

* ruff format

* fix test amid Kaiming initialization

* add depthpro to toctree

* add residual layer to _no_split_modules

* architecture rework

* Update src/transformers/models/depth_pro/image_processing_depth_pro.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/depth_pro/image_processing_depth_pro_fast.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* update docs

* improve merge_patches

* use flatten with fov_output

* ruff formatting

* update resources section in docs

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix typo "final_kernal_size"

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix output typehint for DepthProDepthEstimator

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* residual operation in 2 steps

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* use image_size instead of global patch_size in interpolation

* replace all Sequential with ModuleList

* update fov

* update heads

* fix and update conversion script for heads

* ruff formatting

* remove float32 conversion

* use "Fov" instead of "FOV" in class names

* use "Fov" instead of "FOV" in config docs

* remove prune_heads

* update fusion stage

* use device in examples

* update processor

* ruff fixes

* add do_rescale in image_processor_dict

* skip test: test_fast_is_faster_than_slow

* ruff formatting

* DepthProImageProcessorFast in other files

* revert antialias removal

* add antialias in BaseImageProcessorFast

* Revert "revert antialias removal"

This reverts commit 5caa0bd8f9f7463b98410c04e6cfe8fef3adee18.

* Revert "add antialias in BaseImageProcessorFast"

This reverts commit 3ae1134780ae236872985523d9c0a444eabcc179.

* update processor for grouping and antialias

* try test_fast_is_faster_than_slow without "skip" or "flaky"

* update checkpoint

* update checkpoint

* use @is_flaky for processor test

* update checkpoint to "apple/DepthPro-hf"

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-02-10 11:32:45 +00:00
c399921965 Paligemma: revert #36084 (#36113)
* revert

* type check
2025-02-10 12:04:24 +01:00
eebd2c972c Chat template: update for processor (#35953)
* update

* we need batched nested input to always process correctly

* update a bit

* fix copies
2025-02-10 09:52:19 +01:00
5bd7694781 Processors: allow tuples of images when checking (#36084)
allow tuples of images
2025-02-10 09:35:13 +01:00
3a3b06ace4 fix MllamaVisionAttention typehint (#35975)
* fix MllamaVisionAttention typehint

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* Update src/transformers/models/mllama/modeling_mllama.py

Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>

* fix suggestion

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
2025-02-10 09:17:10 +01:00
6b55046213 [docs] fix not-working example code in perf_infer_gpu_one.md (#36087)
* bug fix

* update memory limit
2025-02-07 12:42:22 -08:00
14ca7f1452 [docs] fix typo (#36080)
typo fix
2025-02-07 12:42:09 -08:00
c361b1e3d9 [docs] fix model checkpoint name (#36075)
update model name
2025-02-07 12:41:52 -08:00
ba29a439ad Fix OS err (#36094)
* Try via local_main_process first

* try 2
2025-02-07 09:57:43 -05:00
a18b7fdd9e Move audio top_k tests to the right file and add slow decorator (#36072)
* Move audio top_k tests to the right file and add slow decorator because we load a real model

* empty commit to trigger tests
2025-02-07 14:32:30 +00:00
014047e1c8 Fix bug in apply_rotary_pos_emb_flashatt: in Qwen2-5-VL (#36065) 2025-02-07 10:43:45 +01:00
006d9249ec Adding RT-DETRv2 for object detection (#34773)
* cookiecutter add rtdetrv2

* make modular working

* working model

* working model

* finalize modular inheritance

* finalize modular inheritance

* Update src/transformers/models/rtdetrv2/modular_rtdetrv2.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* update modular and add rename

* remove output ckpt

* define loss_kwargs

* fix CamelCase naming

* fix naming + files

* fix modular and convert file

* additional changes

* fix modular

* fix import error (switch to lazy)

* fix autobackbone

* make style

* add

* update testing

* fix loss

* remove old folder

* fix testing for v2

* update docstring

* fix docstring

* add resnetv2 (with modular bug to fix)

* remove resnetv2 backbone

* fix changes

* small fixes

* remove rtdetrv2resnetconfig

* add rtdetrv2 name to convert

* make style

* Update docs/source/en/model_doc/rt_detr_v2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/rt_detr_v2/modular_rt_detr_v2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/rt_detr_v2/modular_rt_detr_v2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix modular typo after review

* add reviewed changes

* add final review changes

* Update docs/source/en/model_doc/rt_detr_v2.md

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/rt_detr_v2/__init__.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/rt_detr_v2/convert_rt_detr_v2_weights_to_hf.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* add review changes

* remove rtdetrv2 resnet

* removing this weird project change

* change ckpt name from jadechoghari to author

* implement review and update testing

* update naming and remove wrong ckpt

* name

* make fix-copies

* Fix RT-DETR loss

* Add resources, fix name

* Fix repo in docs

* Fix table name

---------

Co-authored-by: jadechoghari <jadechoghari@users.noreply.huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: qubvel <qubvel@gmail.com>
2025-02-06 19:28:45 +00:00
6246c03260 [docs] fix outdated example code in trainer.md (#36066)
fix bugs
2025-02-06 10:54:22 -08:00
4563ba2c6f Fix StopStringCriteria to handle tokens above len(tokenizer) (#35797)
* Fix StopStringCriteria to handle tokens above len(tokenizer)

This fixes #35244 by clipping token IDs to be within the tokenizer's vocabulary size before performing the embedding lookup. This prevents index errors when model.config.vocab_size > len(tokenizer).

The fix:
1. Adds a clamp operation to ensure token IDs are within bounds
2. Adds a test case to verify the behavior

* Use self.stop_strings instead of stop_strings

* Handle clipping correctly

* make fixup

* Update test to the new embedding vecs

* Use much bigger values in the mismatch test

* Typo fix

* Slight simplification

---------

Co-authored-by: openhands <openhands@all-hands.dev>
2025-02-06 16:53:28 +00:00
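The clamp described above is a one-liner; a sketch with illustrative sizes and IDs, showing how token IDs from a model whose `config.vocab_size` exceeds `len(tokenizer)` are clipped into the embedding's valid range before the lookup:

```py
import torch

vocab_size = 32000                        # stand-in for len(tokenizer)
embedding = torch.nn.Embedding(vocab_size, 8)

input_ids = torch.tensor([[5, 31999, 32005]])          # 32005 is out of range
clipped = torch.clamp(input_ids, max=vocab_size - 1)   # clip before the lookup
vectors = embedding(clipped)                           # no IndexError
```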
28f73bc307 Fix model kwargs (#35875)
* Save state

* Make a failing test

* Better test

* mpt -> done, many more to go

* Rm extraneous

* Bamba

* Bert

* big_bird

* biogpt

* bloom

* codegen

* ctrl

* data2vec

* dbrx

* Through up to Dbrx

* electra

* ernie

* falcon

* Fuyu/persimmon

* Include noop kwargs to base models

* Rebase

* Skip musicgen

* Refactor/skip mllama

* Revert makefile

* Rm file

* Fix PT failing, need to modify rest of loss funcs to not resize

* Propagate some

* Continue

* More

* More options

* Mostly fixed

* Proved that it's the same

* Bloom is good

* Make ability to override loss func possible

* Fixup

* Clean

* Fix xglm

* Quality tests

* Skip OCR2

* Make specific loss for xglm

* Make order the same/line up 1:1

* xglm

* Skip fx output loss bloom model

* Didn't pass in pad_token_id

* Fix quality
2025-02-06 11:35:25 -05:00
1590c66430 Fix words typos in ggml test. (#36060)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-02-06 15:32:40 +00:00
1ce0e2992e Nail in edge case of torch dtype being overridden permanently in the case of an error (#35845)
* Nail in edge case of torch dtype

* Rm unused func

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Refactor tests to only mock what we need, don't introduce injection functions

* SetUp/TearDown

* Do super

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-02-06 09:05:23 -05:00
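The edge case here: loading temporarily changes the process-global torch default dtype, and an exception mid-load used to leave that change in place. The usual cure is a try/finally restore; a sketch of the pattern, not the exact modeling_utils code:

```py
import torch


def build_in_dtype(dtype):
    old_dtype = torch.get_default_dtype()
    torch.set_default_dtype(dtype)
    try:
        # stand-in for model construction, which may raise partway through
        return torch.nn.Linear(4, 4)
    finally:
        torch.set_default_dtype(old_dtype)  # restored even on error


model = build_in_dtype(torch.float16)
assert torch.get_default_dtype() == torch.float32  # float32 in a fresh process
```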
e3458af726 Save checkpoint to temporary directory to handle partial saves during failures (#35580)
Save checkpoint to temporary folder first

Since partial/missing files due to failures throw error during load
2025-02-06 08:48:05 -05:00
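The idea behind this fix, sketched under the assumption of a same-filesystem rename: write the checkpoint into a staging directory and only move it into place once the write finished, so an interrupted save never leaves partial files at the load path (names illustrative, not the Trainer's exact code):

```py
import os
import shutil
import tempfile

import torch


def save_checkpoint(state_dict, output_dir):
    # write into a staging directory on the same filesystem first
    staging = tempfile.mkdtemp(dir=os.path.dirname(output_dir) or ".")
    torch.save(state_dict, os.path.join(staging, "pytorch_model.bin"))
    if os.path.exists(output_dir):  # replace any previous checkpoint
        shutil.rmtree(output_dir)
    os.rename(staging, output_dir)  # a crash before this line leaves no partial dir


save_checkpoint({"weight": torch.zeros(2)}, "checkpoint-100")
```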
3dd1de39bb Paligemma: fix generation with Gemma2 (#36044)
* fix paligemma

* nit

* use `kwargs` in models that can load any LM
2025-02-06 14:31:32 +01:00
dce9970884 Update test_flash_attn_2_can_dispatch_composite_models (#36050)
* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-06 12:09:49 +01:00
37faa97d9b Fix repo consistency (#36063)
* fix 1

* fix 2

* fix modular

* simplify at the same time

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-02-06 11:53:15 +01:00
ed98ad35e6 Fix usage of unpad_input function (#35925)
Fix usage of the unpad_input function

See https://github.com/huggingface/transformers/issues/35899

In the [commit](cdbbe844b1), the return type of `unpad_input` was changed.
Now the code supports both older and newer versions

Co-authored-by: Pavel Gein <pavel.gein@gmail.com>
2025-02-06 11:33:42 +01:00
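A sketch of the compatibility pattern this implies: newer flash-attn releases return one extra value from `unpad_input`, so unpacking by arity supports both. Which value was added and in which release are assumptions here; the linked issue and commit have the specifics:

```py
from flash_attn.bert_padding import unpad_input  # requires the flash-attn package


def unpad_input_compat(hidden_states, attention_mask):
    result = unpad_input(hidden_states, attention_mask)
    if len(result) == 5:  # assumed: newer flash-attn adds a seqlens value
        hidden, indices, cu_seqlens, max_seqlen, _ = result
    else:                 # older releases return four values
        hidden, indices, cu_seqlens, max_seqlen = result
    return hidden, indices, cu_seqlens, max_seqlen
```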
7aee036e54 Iterative generation using Input embeds and past_key_values (#35890)
* Iterative generation using input embeds

* ruff fix

* Added Testcase

* Updated comment

* ♻️ Refactored testcase

* Skip test for these models

* Continue generation using input embeds and cache

* Skip generate_continue_from_embeds test

* Refactor `prepare_input_for_generation` func

* Continue generation using input embeds and cache

* Modular changes fix

* Overwrite 'prepare_inputs_for_generation' function
2025-02-06 11:06:05 +01:00
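A hedged usage sketch of the feature (checkpoint name illustrative): generate from embeddings rather than token IDs, ask for a dict output, and keep the returned cache for the next call:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # checkpoint illustrative
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The quick brown", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs.input_ids)

out = model.generate(
    inputs_embeds=embeds,
    attention_mask=inputs.attention_mask,
    max_new_tokens=5,
    return_dict_in_generate=True,
)
# out.past_key_values can now be fed back via `past_key_values=` together
# with the embeddings of the continuation, which is what this PR enables.
```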
b5f327f350 Add Qwen2VLImageProcessorFast into Qwen2VLProcessor (#35987)
* Add `Qwen2VLImageProcessorFast` into `Qwen2VLProcessor`

* Use `AutoImageProcessor` instead

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-02-06 10:03:09 +01:00
0de15c988b Fix Audio Classification Pipeline top_k Documentation Mismatch and Bug #35736 (#35771)
* added condition for top_k Doc mismatch fix

* initialization of test file for top_k changes

* added test for returning all labels

* added test for few labels

* tests/test_audio_classification_top_k.py

* final fix

* ruff fix

---------

Co-authored-by: sambhavnoobcoder <indosambahv@gmail.com>
2025-02-05 16:25:08 +00:00
694aaa7fbc Fix how we compute the final non-padding token for ForSequenceClassification models (#35911)
* Fix how we compute the final non-padding token for Gemma (and probably other models)

* .size() -> .shape[]

* Propagating changes to other models

* Propagating changes to other models

* Change it for all ForSequenceClassification models

* Fix batch dim

* More TF fixes

* Copy the TF fix around as well

* Correct layer name for TFCTRL

* Cleaner .to()

* Clean up the nested if-else

* Use argmax() instead of .max().values
2025-02-05 16:23:33 +00:00
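The argmax trick from the last bullet, sketched under the assumption of right padding: multiplying position indices by the non-padding mask makes the argmax land on each row's last real token:

```py
import torch

pad_token_id = 0
input_ids = torch.tensor([[5, 6, 7, 0, 0],
                          [8, 9, 0, 0, 0]])
logits = torch.randn(2, 5, 3)  # (batch, seq_len, num_labels)

non_pad_mask = (input_ids != pad_token_id).int()
token_indices = torch.arange(input_ids.shape[-1])
last_non_pad = (token_indices * non_pad_mask).argmax(dim=-1)   # tensor([2, 1])
pooled_logits = logits[torch.arange(input_ids.shape[0]), last_non_pad]
```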
531d1511f5 [docs] no hard-coding cuda (#36043)
make device-agnostic
2025-02-05 08:22:33 -08:00
7399f8021e [docs] fix bugs in the bitsandbytes documentation (#35868)
* fix doc

* update model
2025-02-05 08:21:20 -08:00
0a1a8e3c7e [docs] no hard coding cuda as bnb has multi-backend support (#35867)
* change cuda to DEVICE

* Update docs/source/en/llm_tutorial.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-02-05 08:20:02 -08:00
9dc1efa5d4 DeepSpeed github repo move sync (#36021)
deepspeed github repo move
2025-02-05 08:19:31 -08:00
c772bff31a add support for empty list as input to create_model_card (#36042)
handle cases where it is a list
2025-02-05 13:29:17 +01:00
315a9f494e Add XPU type for work-around -inf mask causing sdpa NaN issue in modeling files (#35647)
* add xpu for unmask

* change modular for generated matching

* add lastest modeling for helium
2025-02-05 13:28:31 +01:00
d8080d55c7 Fix synced multi-GPU generation with LLMs and VLMs (#35893)
* Fix synced multi-GPU generation

* fix copies

---------

Co-authored-by: Davit Manukyan <ManukyanD>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-02-05 11:15:11 +01:00
4831a94ee7 Fix Gemma2 synced multi-GPU generation (#35232)
* Fix Gemma2 synced multi-GPU generation

* Fix import ordering in modular_gemma2.py
2025-02-05 10:07:50 +01:00
fa56dcc2ab Refactoring of ImageProcessorFast (#35069)
* add init and base image processing functions

* add add_fast_image_processor to transformers-cli

* add working fast image processor clip

* add fast image processor to doc, working tests

* remove "to be implemented" SigLip

* fix unprotected import

* fix unprotected vision import

* update ViTImageProcessorFast

* increase threshold for slow/fast equivalence

* add fast img blip

* add fast class in tests with cli

* improve cli

* add fast image processor convnext

* add LlavaPatchingMixin and fast image processor for llava_next and llava_onevision

* add device kwarg to ImagesKwargs for fast processing on cuda

* cleanup

* fix unprotected import

* group images by sizes and add batch processing

* Add batch equivalence tests, skip when center_crop is used

* cleanup

* update init and cli

* fix-copies

* refactor convnext, cleanup base

* fix

* remove patching mixins, add piped torchvision transforms for ViT

* fix unbatched processing

* fix f strings

* protect imports

* change llava onevision to class transforms (test)

* fix convnext

* improve formatting (following Pavel review)

* fix handling device arg

* improve cli

* fix

* fix inits

* Add distinction between preprocess and _preprocess, and support for arbitrary kwargs through valid_extra_kwargs

* uniformize qwen2_vl fast

* fix docstrings

* add add fast image processor llava

* remove min_pixels max_pixels from accepted size

* nit

* nit

* refactor fast image processors docstrings

* cleanup and remove fast class transforms

* update add fast image processor transformers cli

* cleanup docstring

* uniformize pixtral fast and make _process_image explicit

* fix prepare image structure llava next/onevision

* Use typed kwargs instead of explicit args

* nit fix import Unpack

* clearly separate pops and gets in base preprocess. Use explicit typed kwargs

* make qwen2_vl preprocess arguments hashable
2025-02-04 17:52:31 -05:00
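The "group images by sizes" step is the key batching trick here: bucket images by shape so each bucket stacks into one tensor for a single vectorized transform, then scatter the results back to the original order. A hedged sketch, with an illustrative helper rather than the library's exact one:

```py
from collections import defaultdict

import torch


def group_images_by_shape(images):
    groups, order = defaultdict(list), []
    for image in images:
        key = tuple(image.shape[-2:])            # bucket by (H, W)
        order.append((key, len(groups[key])))    # remember where each image went
        groups[key].append(image)
    stacked = {key: torch.stack(imgs) for key, imgs in groups.items()}
    return stacked, order


images = [torch.rand(3, 224, 224), torch.rand(3, 336, 336), torch.rand(3, 224, 224)]
stacked, order = group_images_by_shape(images)
# run one vectorized transform per stacked[key], then restore the input order:
restored = [stacked[key][i] for key, i in order]
```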
8d73a38606 Add DAB-DETR for object detection (#30803)
* initial commit

* encoder+decoder layer changes WIP

* architecture checks

* working version of detection + segmentation

* fix modeling outputs

* fix return dict + output att/hs

* found the position embedding masking bug

* pre-training version

* added image processors

* typo in init.py

* iterupdate set to false

* fixed num_labels in class_output linear layer bias init

* multihead attention shape fixes

* test improvements

* test update

* dab-detr model_doc update

* dab-detr model_doc update2

* test fix:test_retain_grad_hidden_states_attentions

* config file clean and renaming variables

* config file clean and renaming variables fix

* updated convert_to_hf file

* small fixes

* style and quality checks

* return_dict fix

* Merge branch main into add_dab_detr

* small comment fix

* skip test_inputs_embeds test

* image processor updates + image processor test updates

* check copies test fix update

* updates for check_copies.py test

* updates for check_copies.py test2

* tied weights fix

* fixed image processing tests and fixed shared weights issues

* added numpy nd array option to get_Expected_values method in test_image_processing_dab_detr.py

* delete prints from test file

* SafeTensor modification to solve HF Trainer issue

* removing the safetensor modifications

* make fix-copies and HF upload have been added.

* fixed index.md

* fixed repo consistency

* style fix and DabDetrImageProcessor docstring update

* requested modifications after the first review

* Update src/transformers/models/dab_detr/image_processing_dab_detr.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* repo consistency has been fixed

* update copied NestedTensor function after main merge

* Update src/transformers/models/dab_detr/modeling_dab_detr.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* temp commit

* temp commit2

* temp commit 3

* unit tests are fixed

* fixed repo consistency

* updated expected_boxes variable values based on related notebook results in DABDETRIntegrationTests file.

* temporary config modifications and repo consistency fixes

* Put dilation parameter back to config

* pattern embeddings have been added to the rename_keys method

* add dilation comment to config + add as an exception in check_config_attributes SPECIAL CASES

* delete FeatureExtractor part from docs.md

* requested modifications in modeling_dab_detr.py

* [run_slow] dab_detr

* deleted last segmentation code part, updated conversion script and changed the hf path in test files

* temp commit of requested modifications

* temp commit of requested modifications 2

* updated config file, resolved codepaths and refactored conversion script

* updated decodelayer block types and refactored conversion script

* style and quality update

* small modifications based on the request

* attentions are refactored

* removed loss functions from modeling file, added loss function to lossutils, tried to move the MLP layer generation to config but it failed

* deleted imageprocessor

* fixed conversion script + quality and style

* fixed config_att

* [run_slow] dab_detr

* changing model path in conversion file and in test file

* fix Decoder variable naming

* testing the old loss function

* switched back to the new loss function and testing with the old attention functions

* switched back to the new last good result modeling file

* moved back to the version from when I asked for review

* missing new line at the end of the file

* old version test

* turn back to newest model version but change image processor

* style fix

* style fix after merge main

* [run_slow] dab_detr

* [run_slow] dab_detr

* added device and type for head bias data part

* [run_slow] dab_detr

* fixed model head bias data fill

* changed test_inference_object_detection_head assertTrues to torch test assert_close

* fixes part 1

* quality update

* self.bbox_embed in decoder has been restored

* changed Assert true torch closeall methods to torch testing assertclose

* modelcard markdown file has been updated

* deleted intermediate list from decoder module

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-02-04 17:28:27 +00:00
fe52679e74 Update tests regarding attention types after #35235 (#36024)
* update

* update

* update

* dev-ci

* more changes

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-04 18:04:47 +01:00
014a1fa2c8 CircleCI with python 3.9 (#36027)
update docker files

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-04 17:40:20 +01:00
c98b467905 feat(ci): ignore trufflehog unverified results (#36031) 2025-02-04 16:39:36 +01:00
9855acb9c5 Hotfix for self-comment-ci.yml (#36030)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-04 16:28:05 +01:00
9f486badd5 Display warning for unknown quants config instead of an error (#35963)
* add supports_quant_method check

* fix

* add test and fix suggestions

* change logic slightly

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-02-04 15:17:01 +01:00
f19bfa50e7 Comment bot CI for other jobs (generation / quantization) (#35341)
* quantization CI on PRs

* fix

* fix

* add 2 members

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-02-04 14:42:51 +01:00
a93b80588b Fix RMSNormGated in Zamba2 (#35943)
* First commit

* Finish model implementation

* First commit

* Finish model implementation

* Register zamba2

* generated modeling and configuration

* generated modeling and configuration

* added hybrid cache

* fix attention_mask in mamba

* dropped unused loras

* fix flash2

* config docstrings

* fix config and fwd pass

* make fixup fixes

* text_modeling_zamba2

* small fixes

* make fixup fixes

* Fix modular model converter

* added inheritances in modular, renamed zamba cache

* modular rebase

* new modular conversion

* fix generated modeling file

* fixed import for Zamba2RMSNormGated

* modular file cleanup

* make fixup and model tests

* dropped inheritance for Zamba2PreTrainedModel

* make fixup and unit tests

* Add inheritance of rope from GemmaRotaryEmbedding

* moved rope to model init

* drop del self.self_attn and del self.feed_forward

* fix tests

* renamed lora -> adapter

* rewrote adapter implementation

* fixed tests

* Fix torch_forward in mamba2 layer

* Fix torch_forward in mamba2 layer

* Fix torch_forward in mamba2 layer

* Dropped adapter in-place sum

* removed rope from attention init

* updated rope

* created get_layers method

* make fixup fix

* make fixup fixes

* make fixup fixes

* update to new attention standard

* update to new attention standard

* make fixup fixes

* minor fixes

* cache_position

* removed cache_position postion_ids use_cache

* remove config from modular

* removed config from modular (2)

* import apply_rotary_pos_emb from llama

* fixed rope_kwargs

* Instantiate cache in Zamba2Model

* fix cache

* fix @slow decorator

* small fix in modular file

* Update docs/source/en/model_doc/zamba2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* several minor fixes

* inherit mamba2decoder fwd and drop position_ids in mamba

* removed docstrings from modular

* reinstate zamba2 attention decoder fwd

* use regex for tied keys

* Revert "use regex for tied keys"

This reverts commit 9007a522b1f831df6d516a281c0d3fdd20a118f5.

* use regex for tied keys

* add cpu to slow forward tests

* dropped config.use_shared_mlp_adapter

* Update docs/source/en/model_doc/zamba2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* re-convert from modular

* extended Zamba2RMSNormGated to n_groups>1

* removed einops import

* set _supports_sdpa = True

* add use_mem_eff_path flag for fused mamba2 fwd

* added docstring for use_mem_eff_path flag

---------

Co-authored-by: root <root@node-2.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-02-04 14:28:04 +01:00
bc9a6d8302 Fix device mismatch error in Whisper model during feature extraction (#35866)
* Fix device mismatch error in whisper feature extraction

* Set default device

* Address code review feedback

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-02-04 12:23:08 +01:00
9afb904b15 Refactor (and fix) gpt_neox (#35610)
* start a nice modular

* Update modular_gpt_neox.py

* Update modular_gpt_neox.py

* Update modular_gpt_neox.py

* Update modular_gpt_neox.py

* update

* Update modular_gpt_neox.py

* convert

* fix attribute

* fix attrs

* oups

* fix

* fix

* fix

* fix

* fix

* fix order to pass test (see with accelerate team)

* trigger CIs

* modular

* update

* up

* Update test_modeling_gpt_neox.py

* Update test_modeling_gpt_neox.py

* trigger CIs

* correctly pass arg

* simplify

* remove key warning

* update tp -> it's compatible since the view is before

* trigger CIs
2025-02-04 11:18:43 +01:00
ad30598923 Update Mistral converter (#35967)
* Update convert_mistral_weights_to_hf.py

* Update convert_mistral_weights_to_hf.py

* update

* style

* move it to integrations

* style

* trigger CIs

* trigger CIs
2025-02-04 11:13:12 +01:00
b1954fd64a layernorm_decay_fix (#35927)
* layernorm_decay_fix

* W293 fix

* ruff format fix

* black format

* ruff format

* erase last layer

* add test_get_parameter_names_rmsnorm

* rmsnorm fix
2025-02-04 11:01:49 +01:00
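The convention being fixed, in a sketch: biases and norm-layer weights (LayerNorm, and with this PR RMSNorm variants too) are excluded from weight decay when building optimizer parameter groups. Matching norm modules by class name is an illustrative shortcut, not the exact `get_parameter_names` logic:

```py
import torch

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.LayerNorm(8))

decay, no_decay = [], []
for module in model.modules():
    is_norm = isinstance(module, torch.nn.LayerNorm) or "rmsnorm" in type(module).__name__.lower()
    for name, param in module.named_parameters(recurse=False):
        if is_norm or name == "bias":
            no_decay.append(param)   # norm weights and biases: no weight decay
        else:
            decay.append(param)

optimizer = torch.optim.AdamW(
    [{"params": decay, "weight_decay": 0.01},
     {"params": no_decay, "weight_decay": 0.0}]
)
```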
2ba040a71f apply_chat_template: consistent behaviour for return_assistant_tokens_mask=True return_tensors=True (#35582)
* apply_chat_template: consistent return_tensors behaviour with return_assistant_tokens_mask flag

* test_chat_template_return_assistant_tokens_mask: support tokenizers with no attention mask

* test_chat_template_return_assistant_tokens_mask: skip tokenizers with no padding token

* test_chat_template_return_assistant_tokens_mask: force tokenizer padding_side=right

---------

Co-authored-by: Eduard Allakhverdov <goncharova@airi.net>
Co-authored-by: d.tarasov <d.tarasov@airi.net>
2025-02-04 10:27:52 +01:00
9c02cb6233 Fix custom kernel for DeformableDetr, RT-Detr, GroundingDINO, OmDet-Turbo in PyTorch 2.6.0 (#35979)
Updates type().is_cuda() -> .is_cuda(); .data<> -> .data_ptr<>
2025-02-04 09:07:25 +00:00
5d75a25b03 Qwen2-VL: fix rope delta calculation (#36013)
* fix rope deltas calculation

* add test

* style
2025-02-04 09:48:29 +01:00
e284c7e954 Update Granite Vision Model Path / Tests (#35998)
* Update granite vision model path

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* Enable granite vision test

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

---------

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
2025-02-03 20:06:03 +01:00
9d2056f12b Add mean_resizing for every VLM's resize_token_embeddings() (#35717)
* refine all resize_token_embedding()

* ruff format

* hotfix
2025-02-03 15:03:49 +01:00
7eecdf2a86 Update-tp test (#35844)
* update test for now

* up

* cleanup

* update todo
2025-02-03 09:37:02 +01:00
62db3e6ed6 use torch 2.6 for daily CI (#35985)
use torch 2.6 for CI

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-31 18:58:23 +01:00
2b46943195 Add GOT-OCR 2.0 to Transformers (#34721)
* init modular got_ocr2

* Get correct got_ocr architecture

* add processing

* run modular with processing

* add working inference

* apply modular

* Refactor and fix style

* Refactor, cleanup, fix style

* fix init order

* Fix docs

* add base modeling tests

* fix style and consistency

* rename doc file

* fix repo consistency

* fix inference with box

* add image processing and support for crop_to_multi_page

* Fix batch inference

* add tests

* fixup

* fix slow test

* fix docstrings

* Add model doc

* update to new init

* fix input autocast pixel_values dtype

* update doc

* move doc to multimodal

* Reformat crop_image_to_patches and add docstrings

* Fix example in forward docstring

* Address Pablo review

* [run slow] got_ocr2

* remove defaults defined twice

* apply modular

* add torch_device to integration tests

* update modular

* follow-up Pavel review

* add device variable in doc

* fix doc multi-page

* Force eager attention for vision encoder to avoid attn implementation conflict

* revert qwen2vl doc changes

* use Qwen2ForCausalLM instead of Qwen2Model

* make fixup

* refactor gotocr2 to llava style

* uniformize function names and reduce checks

* final nits

* fix pixel_values dtype error

* change checkpoint names

* fix modular
2025-01-31 11:28:13 -05:00
5bbee12ac9 [Moshi] disable automatic compilation if the model can't compile (#35992)
moshi can't compile
2025-01-31 15:53:06 +00:00
e6f4a4ebbf [Moonshine] compute head_dim_padding at init (#35984)
compute head_dim_padding at init
2025-01-31 14:26:52 +01:00
d7188ba600 Add support for nested images to LLava and VipLLava (#35558)
* move make_flat_list_of_images and make_batched_videos to image_utils

* remove unnecessary is_vision_available

* move make_nested_list_of_images to image_utils

* fix fast pixtral image processor

* fix import mllama

* fix make_nested_list_of_images

* add tests

* convert 4d arrays/tensors to list

* add test_make_batched_videos

* add support nested batch of videos

* fix image processing qwen2vl
2025-01-30 16:49:20 -05:00
e4227eb4d4 Handle empty change indices in SAM's mask to rle conversion (#35665)
* Handle empty change indices in RLE conversion for masks

* [test] Add unit tests for RLE encoding of masks in SamProcessor

* [test] Update RLE conversion tests to use TensorFlow implementation

* [test] Fix formatting in SamProcessorTest according to check_code_quality action

* [test] Fix formatting in SamProcessorTest according to check_code_quality

* [test] Refactored rle test cases into one test and used tf tensors in tf test cases

* [test] Fix: removed self parameter from refactored methods

* [test] Removed nested methods in run-length encoding tests for PyTorch and TensorFlow

* [test] Added descriptions to individual run-length encoding tests for PyTorch and TensorFlow.
2025-01-30 19:08:38 +00:00
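Run-length encoding a binary mask records the distances between consecutive value changes, so a uniform (all-zero or all-one) mask has no change indices at all, which was the unhandled edge case. A minimal sketch of COCO-style column-major RLE:

```py
import numpy as np


def mask_to_rle(mask):
    flat = mask.flatten(order="F")                          # column-major, COCO-style
    changes = np.flatnonzero(flat[1:] != flat[:-1]) + 1     # empty for uniform masks
    points = np.concatenate([[0], changes, [flat.size]])
    counts = np.diff(points).tolist()
    if flat[0] == 1:                                        # RLE counts zeros first
        counts = [0] + counts
    return {"size": list(mask.shape), "counts": counts}


print(mask_to_rle(np.zeros((2, 3), dtype=np.uint8)))  # {'size': [2, 3], 'counts': [6]}
```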
47bd4296d6 not to use A100 for benchmark.yml (#35974)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-30 18:55:36 +01:00
693328f2bc Support batching for UsefulSensors Moonshine (#35922)
* Add support for attention masking in moonshine.

Tested against Open ASR Leaderboard with batch size 256.

* Update comments and ensure attention masks are passed everywhere.

Perform attention mask downsampling inside of moonshine forward call.

* Hide padding behind conditional. Fix encoder/decoder masking.

- Correctly pipe encoder attention mask into decoder
- Add correct scaling factor if one is not already provided.
- Fix formatting with ruff

* Add auto generated modeling_moonshine file.

* Update formatting in generated model file.

* Address review comments.

* Fix typo.

* Add `pad_head_dim_to_multiple_of` to moonshine config.

* Correct args order for MoonshineConfig.

* Update configuration moonshine too.

* Update src/transformers/models/moonshine/modular_moonshine.py

* Update src/transformers/models/moonshine/configuration_moonshine.py

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-01-30 17:08:07 +01:00
5757681837 Less flaky for TimmBackboneModelTest::test_batching_equivalence (#35971)
* fix

* remove is_flaky

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-30 16:56:26 +01:00
e320d5542e Revert p_mask to a list in DQA pipeline (#35964)
* p_mask back to being a list

* Remove breakpoint
2025-01-30 15:37:59 +00:00
365fecb4d0 Whisper: fix static cache CI (#35852)
* fix

* remove overriden method

* small change
2025-01-30 12:43:00 +01:00
9725e5be2f Pixtral: vectorize patch embeddings and enable tests (#35122)
* initial POC

* - batch mix feature

* fix tests

* fix tests

* make style

* do not skip and instead fix tests

* update

* return back the test

* correct text with the correct ckpt
2025-01-30 12:40:18 +01:00
8bc4c89ee9 [bart] minor test fixes (#35965)
fix tests
2025-01-30 10:00:11 +00:00
19f2ec80cf Fix is_causal being a tensor (#35791)
* fix is_causal being a tensor

* convert in sdpa attention only when jit tracing
2025-01-30 09:22:33 +01:00
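The second bullet, sketched: only under torch.jit tracing does the computed `is_causal` flag end up as a 0-dim tensor, and `F.scaled_dot_product_attention` wants a plain bool, so the conversion is gated on tracing state. A sketch of the gate, not the full attention wrapper:

```py
import torch


def normalize_is_causal(is_causal):
    # Under torch.jit tracing the flag can be a 0-dim tensor; materialize it
    # as a Python bool only in that case.
    if torch.jit.is_tracing() and isinstance(is_causal, torch.Tensor):
        is_causal = is_causal.item()
    return is_causal
```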
7547f55e5d fix iterator overflow when gradient accumulation is 1 (#35960) 2025-01-29 14:45:09 -05:00
4d3b1076a1 [generate] move max time tests (#35962)
* move max time tests to their right place

* move test to the right place
2025-01-29 17:56:46 +00:00
4d1d489617 Update README.md (#35958)
There should be a dot after pip install .
2025-01-29 15:46:26 +00:00
f0ae65c198 [tests] further fix Tester object has no attribute '_testMethodName' (#35781)
* bug fix

* update with more cases

* more entries

* Fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-29 16:05:33 +01:00
ec7790f0d3 update docker file transformers-pytorch-deepspeed-latest-gpu (#35940)
update docker file for deepspeed

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-29 16:01:27 +01:00
5d257111c1 Trainer Refactor: Part 1 (#35567)
* start

* So far: 30%

* Small fix

* Continuing update

* Continuing

* Forgot to check if not None

* Continuing refactor

* Fix if else

* Fix ref

* Should make tests pass

* Keep grad norm same

* Document

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Err instead of info for logging RNG state error

* Separate out to func

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-01-29 09:50:54 -05:00
23d782ead2 Output dicts support in text generation pipeline (#35092)
* Support for generate argument return_dict_in_generate=True, instead of returning an error

* fix: call test with return_dict_in_generate=True

* fix: Only import torch if it is present

* update: Encapsulate output_dict changes

* fix: added back original comments

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-01-29 14:44:46 +00:00
cf90404807 Fix flaky test_assisted_decoding_matches_greedy_search (#35951)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-29 14:50:07 +01:00
692afa102d Update squad_convert_example_to_features to work with numpy v2 (#35955)
* Fix

* Fix

* Fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-29 14:33:06 +01:00
c600e89f5c Update unwrap_and_save_reload_schedule to use weights_only=False (#35952)
* fix

* Fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-29 14:30:57 +01:00
42c8ccfd4c fix test_generated_length_assisted_generation (#34935)
fix test_generated_length_assisted_generation
2025-01-29 12:03:45 +00:00
ec7afad609 use torch constraints to check if covariance is positive definite during mean resizing. (#35693)
* use torch constraints to check for psd

* small nit

* Small change

* Small change for the ci

* nit
2025-01-28 17:33:42 +01:00
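What the constraint check enables, sketched with illustrative shapes and fallback: mean resizing samples newly added embedding rows from a multivariate normal fitted to the old embeddings, which is only valid when the empirical covariance is positive definite:

```py
import torch
from torch.distributions import constraints

old_embeddings = torch.randn(1000, 64)
mu = old_embeddings.mean(dim=0)
sigma = torch.cov(old_embeddings.T)          # (64, 64) empirical covariance

if constraints.positive_definite.check(sigma).all():
    dist = torch.distributions.MultivariateNormal(mu, covariance_matrix=sigma)
    new_rows = dist.sample((8,))             # 8 newly added tokens
else:
    new_rows = mu.repeat(8, 1)               # fallback: plain mean init
```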
61cbb723fc Remove INC notebook reference in documentation (#35936)
remove INC notebook in documentation
2025-01-28 17:10:02 +01:00
478c4f2d0d fix(FA): QKV not being casted to target_dtype for FA with dpo lora (#35834)
fix(FA): QKV not being casted to target_dtype due to dtype check
2025-01-28 17:06:56 +01:00
ece8c42488 Test: generate with torch.compile(model.forward) as a fast test (#34544) 2025-01-28 14:10:38 +00:00
f48ecd7608 Fix TP initialization (#35860)
* fix tp

* Update modeling_utils.py

* style

* style

* Update test_tp.py

* Update test_tp.py

* style

* Update test_tp.py

* Update test_tp.py

* Update test_tp.py

* Update test_tp.py
2025-01-28 15:07:37 +01:00
f85ba20449 Qwen-2-5-VL: fix CI (#35935)
fix
2025-01-28 14:51:57 +01:00
3f860dba55 Fix mask slicing for models with HybridCache (#35681)
* correctly slice

* check mask

* Update modular_gemma2.py

* fix

* add tests

* fix typo

* finally fix mask slicing

* Finally correctly slice in all cases!!

* add test for all attention functions

* small fix in tests

* trick around dynamo tracing issue

* last update

* more robust

* kwargs propagation

* make it explicit for checkpointing

* apply modular
2025-01-28 14:35:00 +01:00
b764c20b09 Fix: loading DBRX back from saved path (#35728)
* fix dtype as dict for some models + add test

* add comment in tests
2025-01-28 11:38:45 +01:00
3613f568cd Add default TP plan for all models with backend support (#35870)
* Add some tp plans!

* More tp plans!

* Add it in the comment

* style

* Update configuration_mixtral.py

* Update configuration_phi.py

* update the layout according to special archs

* fix mixtral

* style

* trigger CIs

* trigger CIs

* CIs

* olmo2

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-28 11:20:58 +01:00
96625d85fd Use rocm6.2 for AMD images (#35930)
* Use rocm6.2 as rocm6.3 only has nightly pytorch wheels atm

* Use stable wheel index for torch libs
2025-01-28 11:10:28 +01:00
bf16a182ba Remove _supports_static_cache = True for some model classes (#34975)
* use mask_fill

* remove comment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-28 10:42:10 +01:00
86d7564611 [docs] Fix Zamba2 (#35916)
fix code block
2025-01-27 11:44:10 -08:00
414658f94f Close Zamba2Config code block (#35914)
* close zamba2 code block

* Add Zamba2 to toctree
2025-01-27 19:09:42 +00:00
63e9c941eb Fix the config class comparison for remote code models (#35592)
* Fix the config class comparison when repeatedly saving and loading remote code models

* once again you have committed your debug breakpoint
2025-01-27 18:37:30 +00:00
c550a1c640 [docs] uv install (#35821)
uv install
2025-01-27 08:49:28 -08:00
cd6591bfb2 Fix typing in audio_utils.chroma_filter_bank (#35888)
* Fix typing in audio_utils.chroma_filter_bank

* Apply make style

---------

Co-authored-by: Louis Groux <louis.cal.groux@gmail.com>
2025-01-27 16:06:03 +00:00
e57b459997 Split and clean up GGUF quantization tests (#35502)
* clean up ggml test

Signed-off-by: Isotr0py <2037008807@qq.com>

* port remaining tests

Signed-off-by: Isotr0py <2037008807@qq.com>

* further cleanup

Signed-off-by: Isotr0py <2037008807@qq.com>

* format

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix broken tests

Signed-off-by: Isotr0py <2037008807@qq.com>

* update comment

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

* reorganize tests

Signed-off-by: Isotr0py <2037008807@qq.com>

* k-quants use qwen2.5-0.5B

Signed-off-by: Isotr0py <2037008807@qq.com>

* move ggml tokenization test

Signed-off-by: Isotr0py <2037008807@qq.com>

* remove dead code

Signed-off-by: Isotr0py <2037008807@qq.com>

* add assert for serialization test

Signed-off-by: Isotr0py <2037008807@qq.com>

* use str for parameterize

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-27 15:46:57 +01:00
5c576f5a66 🚨🚨🚨 image-classification pipeline single-label and multi-label prob type squashing fns (sigmoid vs softmax) are backwards (#35848)
single-label and multi-label prob type squashing fns (sigmoid vs softmax) were backwards for image-classification pipeline
2025-01-27 15:34:57 +01:00
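The distinction the fix restores: single-label logits are squashed jointly with softmax, so the probabilities sum to 1, while multi-label logits are squashed independently with sigmoid:

```py
import torch

logits = torch.tensor([2.0, 0.5, -1.0])

single_label = logits.softmax(dim=-1)  # ~tensor([0.7856, 0.1753, 0.0391]), sums to 1
multi_label = logits.sigmoid()         # ~tensor([0.8808, 0.6225, 0.2689]), independent
```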
5450e7c84a 🔴 🔴 🔴 Added segmentation maps support for DPT image processor (#34345)
* Added `segmentation_maps` support for DPT image processor

* Added tests for dpt image processor

* Moved preprocessing into separate functions

* Added # Copied from statements

* Fixed # Copied from statements

* Added `segmentation_maps` support for DPT image processor

* Added tests for dpt image processor

* Moved preprocessing into separate functions

* Added # Copied from statements

* Fixed # Copied from statements
2025-01-27 15:14:00 +01:00
a50befa9b9 Update deepspeed amd image (#35906) 2025-01-27 14:32:36 +01:00
33cb1f7b61 Add Zamba2 (#34517)
* First commit

* Finish model implementation

* First commit

* Finish model implementation

* Register zamba2

* generated modeling and configuration

* generated modeling and configuration

* added hybrid cache

* fix attention_mask in mamba

* dropped unused loras

* fix flash2

* config docstrings

* fix config and fwd pass

* make fixup fixes

* text_modeling_zamba2

* small fixes

* make fixup fixes

* Fix modular model converter

* added inheritances in modular, renamed zamba cache

* modular rebase

* new modular conversion

* fix generated modeling file

* fixed import for Zamba2RMSNormGated

* modular file cleanup

* make fixup and model tests

* dropped inheritance for Zamba2PreTrainedModel

* make fixup and unit tests

* Add inheritance of rope from GemmaRotaryEmbedding

* moved rope to model init

* drop del self.self_attn and del self.feed_forward

* fix tests

* renamed lora -> adapter

* rewrote adapter implementation

* fixed tests

* Fix torch_forward in mamba2 layer

* Fix torch_forward in mamba2 layer

* Fix torch_forward in mamba2 layer

* Dropped adapter in-place sum

* removed rope from attention init

* updated rope

* created get_layers method

* make fixup fix

* make fixup fixes

* make fixup fixes

* update to new attention standard

* update to new attention standard

* make fixup fixes

* minor fixes

* cache_position

* removed cache_position postion_ids use_cache

* remove config from modular

* removed config from modular (2)

* import apply_rotary_pos_emb from llama

* fixed rope_kwargs

* Instantiate cache in Zamba2Model

* fix cache

* fix @slow decorator

* small fix in modular file

* Update docs/source/en/model_doc/zamba2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* several minor fixes

* inherit mamba2decoder fwd and drop position_ids in mamba

* removed docstrings from modular

* reinstate zamba2 attention decoder fwd

* use regex for tied keys

* Revert "use regex for tied keys"

This reverts commit 9007a522b1f831df6d516a281c0d3fdd20a118f5.

* use regex for tied keys

* add cpu to slow forward tests

* dropped config.use_shared_mlp_adapter

* Update docs/source/en/model_doc/zamba2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* re-convert from modular

---------

Co-authored-by: root <root@node-2.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-27 10:51:23 +01:00
14a9bb520e Fix fast image processor warnings in object detection examples (#35892)
Have the DETR examples default to using the fast image processor
2025-01-27 08:32:44 +00:00
f11f57c925 [doctest] Fixes (#35863)
doctest fixes
2025-01-26 15:26:38 -08:00
fc269f77da Add Rocketknight1 to self-comment-ci.yml (#35881)
my bad

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-24 19:07:07 +00:00
bcb841f007 add xpu device check in device_placement (#35865)
add xpu device
2025-01-24 19:13:07 +01:00
b912f5ee43 use torch.testing.assertclose instead to get more details about error in cis (#35659)
* use torch.testing.assertclose instead to get more details about error in cis

* fix

* style

* test_all

* revert for I bert

* fixes and updates

* more image processing fixes

* more image processors

* fix mamba and co

* style

* less strict

* ok I won't be strict

* skip and be done

* up
2025-01-24 16:55:28 +01:00
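Why the switch pays off in CI: on failure, `torch.testing.assert_close` reports how many elements mismatched and the greatest absolute and relative differences, where `assertTrue(torch.allclose(...))` only reports that the assertion failed:

```py
import torch

expected = torch.tensor([1.0, 2.0, 3.0])
actual = torch.tensor([1.0, 2.0, 3.1])

try:
    torch.testing.assert_close(actual, expected, rtol=1e-3, atol=1e-3)
except AssertionError as err:
    # The message names the mismatched element count and the greatest
    # absolute/relative differences, unlike a bare False from allclose.
    print(err)
```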
72d1a4cd53 Fix Llava-NeXT / Llava-NeXT Video / Llava-OneVision's token unpadding mismatch (#35779)
* Fix Llava OneVision's token padding

* Fix Llava next and Llava next video's token unpadding for consistency
2025-01-24 09:10:27 +01:00
b5aaf87509 Fix test_pipelines_video_classification that was always failing (#35842)
* Fix test_pipelines_video_classification that was always failing

* Update video pipeline docstring to reflect actual return type

---------

Co-authored-by: Louis Groux <louis.cal.groux@gmail.com>
2025-01-23 19:22:32 +01:00
328e2ae4c0 fix apply_chat_template() padding choice (#35828)
fix apply_chat_template() padding choice to bool, str, PaddingStrategy and the docstring of pad()
2025-01-23 17:32:32 +00:00
d2a424b550 Fix typo (#35854) 2025-01-23 17:32:18 +00:00
045c02f209 [DOC] Fix contamination and missing paragraph in translation (#35851)
Fix contamination and missing paragraph in translation
2025-01-23 08:33:44 -08:00
71cc8161b2 Granite Vision Support (#35579)
* Add multimodal granite support

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

Support multiple image feature layers

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Remove failing validation for visual encoders with no cls

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Update llava based models / configs to support list of feature layers

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Add tests for multiple feature layers

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Use conditional instead of except for misaligned feature shapes

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* crop cls from each hidden state

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* Fix formatting

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Support single vision feature int in vipllava

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

* Fix typo in vision feature selection strategy validation

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* Add tentative integration test for granite vision models

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* Add granite vision docs

Replace multimodal granite refs with granite vision

Add granite vision / llava next alias

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

* Use image url in granitevision example

Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>

---------

Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
2025-01-23 17:15:52 +01:00
8f1509a96c Fix more CI tests (#35661)
add tooslow for the fat ones
2025-01-23 14:45:42 +01:00
0a950e0bbe Fix uploading processors/tokenizers to WandB on train end (#35701)
* rename tokenizer to processing_class in WandbCallback.on_train_end

* rename tokenizer to processing_class in ClearMLCallback and DVCLiveCallback
2025-01-23 13:32:15 +01:00
4ec425ffad Fix GA loss for Deepspeed (#35808)
* Fix GA loss for Deepspeed

* Turn off loss scaling in DeepSpeed engine by scale_wrt_gas

* Add comment linking to PR
2025-01-23 11:45:02 +01:00
f3f6c86582 add qwen2.5vl (#35569)
* add qwen2.5vl

* fix

* pass check table

* add modular file

* fix style

* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>

* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>

* Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>

* pass copy check

* use modular

* fix

* fix

* fix

* update flashatt2&sdpa support_list

* Update docs/source/en/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/qwen2_5_vl.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update config

* update

* fix hf path

* rename Qwen2_5_VLVideosKwargs

* fix

* fix

* update

* executed modular

* rollback init

* fix

* formatted

* simpler init

* fix

* fix

* fix

* fix

* fix

* update docs

* fix

* fix

* update Qwen2VLRotaryEmbedding for yarn

* fix

---------

Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: gewenbin0992 <gewenbin292@163.com>
Co-authored-by: gewenbin0992 <67409248+gewenbin0992@users.noreply.github.com>
2025-01-23 11:23:00 +01:00
d3af76df58 [Backend support] Allow num_logits_to_keep as Tensor + add flag (#35757)
* support

* Update modeling_utils.py

* style

* most models

* Other models

* fix-copies

* tests + generation utils
2025-01-23 09:47:54 +01:00
8736e91ad6 [ tests] remove some flash attention class tests (#35817)
remove class from tests
2025-01-23 09:44:21 +01:00
2c3a44f9a7 Fix NoneType type as it requires py>=3.10 (#35843)
fix type
2025-01-22 15:56:53 +00:00
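The incompatibility behind this fix: PEP 604's `X | None` union syntax is evaluated when the function is defined and only exists on Python 3.10+, so code that must also run on 3.9 needs `typing.Optional`:

```py
from typing import Optional


def resize(size: Optional[int] = None) -> Optional[int]:  # fine on 3.8/3.9
    return size

# def resize(size: int | None = None) -> int | None:      # TypeError before 3.10,
#     ...                                                  # since the union in the
#                                                          # annotation is evaluated
#                                                          # at definition time
```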
fdcc62c855 Add PyTorch version check for FA backend on AMD GPUs (#35813)
Disable FA backend for SDPA on AMD GPUs (PyTorch < 2.4.1)
2025-01-22 16:09:23 +01:00
3b9770581e Fix compatibility issues when using auto_gptq with these older versions (#35830)
The convert_model method of optimum accepts only a single nn.Module model parameter for versions below 1.23.99.
2025-01-22 15:46:47 +01:00
62bd83947a [chat] docs fix (#35840)
docs fix
2025-01-22 14:32:27 +00:00
487e2f63bd Fix head_dim in config extracted from Gemma2 GGUF model (#35818)
fix gemma2 head dim

Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-01-22 15:22:04 +01:00
b3d6722469 [Chat] Add Chat from TRL 🐈 (#35714)
* tmp commit

* add working chat

* add docs

* docs 2

* use auto dtype by default
2025-01-22 13:30:12 +00:00
a7738f5a89 Fix : Nemotron tokenizer for GGUF format (#35836)
fix nemotron gguf
2025-01-22 12:28:40 +01:00
ec28957f94 [pipeline] missing import regarding assisted generation (#35752)
missing import
2025-01-22 10:34:28 +00:00
36c9181f5c [gpt2] fix generation tests (#35822)
fix gpt2 generation tests
2025-01-22 09:41:04 +00:00
f439e28d32 Hotfix: missing working-directory in self-comment-ci.yml (#35833)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-22 10:25:50 +01:00
373e50e970 Init cache on meta device (#35164)
* init cache on meta device

* offloaded static + enable tests

* tests weren't running before  :(

* update

* fix mamba

* fix copies

* update

* address comments and fix tests

* fix copies

* Update src/transformers/cache_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update

* mamba fix

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-22 09:49:17 +01:00
870e2c8ea0 Another security patch for self-comment-ci.yml (#35816)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-22 09:29:54 +01:00
f4f33a20a2 Remove pyav pin to allow python 3.11 to be used (#35823)
* Remove pyav pin to allow python 3.11 to be used

* Run make fixup

---------

Co-authored-by: Louis Groux <louis.cal.groux@gmail.com>
2025-01-21 20:16:18 +00:00
90b46e983f Remove old benchmark code (#35730)
* remove traces of the old deprecated benchmarks

* also remove old tf benchmark example, which uses deleted code

* run doc builder
2025-01-21 17:56:43 +00:00
870eb7b41b [Mimi] update test expected values for t4 runners (#35696)
update values for t4
2025-01-21 18:23:36 +01:00
8ac851b0b3 Improve modular documentation (#35737)
* start a nice doc

* keep improving the doc

* Finalize doc

* Update modular_transformers.md

* apply suggestion
2025-01-21 17:53:30 +01:00
107f9f5127 add Qwen2-VL image processor fast (#35733)
* add qwen2_vl image processor fast

* add device to ImagesKwargs

* remove automatic fix copies

* fix fast_is_faster_than_slow

* remove unnecessary import
2025-01-21 11:49:05 -05:00
3df90103b8 move fastspeech to audio models (#35788) 2025-01-21 08:32:09 -08:00
741d55237a [i18n-ar] Translated file: docs/source/ar/tasks/masked_language_modeling.md into Arabic (#35198)
* Add the Arabic translation: masked_language_modeling.md

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/masked_language_modeling.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update _toctree.yml

* Update _toctree.yml

* Add language_modeling.md

* Add sequence_classification.md

* Update _toctree.yml

---------

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
2025-01-21 08:29:58 -08:00
568941bf11 Optimized set_initialized_submodules. (#35493) 2025-01-21 17:01:28 +01:00
7051c5fcc8 Remove deprecated get_cached_models (#35809)
* Remove deprecated get_cached_models

* imports
2025-01-21 16:08:31 +01:00
97fbaf0861 Fixed typo in autoawq version number in an error message for IPEX backend requirements. (#35815)
Fixed typo in version number for IPEX backend required minimal autoawq version
2025-01-21 14:42:44 +00:00
dbd8474125 Fix : BLOOM tie_word_embeddings in GGUF (#35812)
* fix bloom ggml

* fix falcon output

* make style
2025-01-21 15:35:54 +01:00
678bd7f1ce Auto-add timm tag to timm-wrapper models. (#35794)
Works for fine-tuned or exported models:

```py
from transformers import AutoModelForImageClassification

checkpoint = "timm/vit_base_patch16_224.augreg2_in21k_ft_in1k"
model = AutoModelForImageClassification.from_pretrained(checkpoint)

model.push_to_hub("pcuenq/tw1")
```

The uploaded model will now show snippets for both the timm and the
transformers libraries.
2025-01-21 14:34:45 +01:00
dc10f7906a Support adamw_torch_8bit (#34993)
* var

* more

* test
2025-01-21 14:17:49 +01:00
f82b19cb6f add a new flax example for Bert model inference (#34794)
* add a new example for flax inference cases

* Update examples/flax/language-modeling/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update examples/flax/language-modeling/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update examples/flax/language-modeling/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update examples/flax/language-modeling/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update examples/flax/language-modeling/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update examples/flax/language-modeling/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix for "make fixup"

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-21 14:09:29 +01:00
edbabf6b82 [Doc] Adding blog post to model doc for TimmWrapper (#35744)
* adding blog post to model doc

* Update docs/source/en/model_doc/timm_wrapper.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* review suggestions

* review suggestions

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-21 12:32:39 +00:00
fd8d61fdb2 Byebye test_batching_equivalence's flakiness (#35729)
* fix

* fix

* skip

* better error message

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-21 13:11:33 +01:00
78f5ee0217 Add LlavaImageProcessor (#33191)
* First draft

* Add equivalence test

* Update docstrings

* Add tests

* Use numpy

* Fix tests

* Improve variable names

* Improve docstring

* Add link

* Remove script

* Add copied from

* Address comment

* Add note in docs

* Add docstring, data format

* Improve test

* Add test

* update

* Update src/transformers/models/llava/image_processing_llava.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/llava/image_processing_llava.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* loop once only

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-01-21 12:47:04 +01:00
8e4cedd9ca Update AMD Docker image (#35804) 2025-01-21 12:11:23 +01:00
705aeaaa12 Fix "test_chat_template_dict" in video LLMs (#35660)
* fix "test_chat_template_dict" in llava_onevision

* Update src/transformers/models/llava_next_video/processing_llava_next_video.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* get one video called once

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-01-21 10:23:40 +01:00
e867b97443 Deterministic sorting in modular converter when adding new functions (#35795)
deterministic sort
2025-01-21 09:38:48 +01:00
920f34a772 modular_model_converter bugfix on assignments (#35642)
* added bugfix in modular converter to keep modular assignments for docstrings, expected outputs etc.

* revert starcoder2 docstring copying, add forward in EMU3 to enable docstring assignment, remove verbatim assignments in modular converter

* added _FOR_DOC in assignments to keep, corrected wrong checkpoint name in ijepa's configuration
2025-01-21 08:06:44 +01:00
234168c4dc Fixes, improvements to timm import behaviour (#35800)
* Fix timm dummy import logic

* Add requires to TimmWrapperConfig.from_dict so users see a helpful import error message if timm not installed
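
A minimal sketch of the pattern this commit describes, assuming a hypothetical availability helper (transformers has its own internal checks; none of the names below are the actual implementation):

```py
import importlib.util

def is_timm_available() -> bool:
    # Hypothetical helper: report whether the optional `timm` backend is installed.
    return importlib.util.find_spec("timm") is not None

class TimmWrapperConfigSketch:
    @classmethod
    def from_dict(cls, config_dict: dict):
        # Fail early with an actionable message instead of an opaque error later.
        if not is_timm_available():
            raise ImportError(
                "TimmWrapperConfig requires the `timm` library: `pip install timm`."
            )
        return cls()
```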
2025-01-20 13:17:01 -08:00
44393df089 Tool calling: support more types (#35776)
* Tool calling: support NoneType for function return type
2025-01-20 19:15:34 +01:00
f19135afc7 fix low-precision audio classification pipeline (#35435)
* fix low-precision audio classification pipeline

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix torch import

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix torch import

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-01-20 16:20:51 +00:00
641238eb76 Fix vits low-precision dtype (#35418)
* fix vits dtype

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* use weight dtype

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-01-20 16:19:31 +00:00
729b569531 fix document qa bf16 pipeline (#35456)
* fix document qa bf16 pipeline

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix test

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-01-20 16:18:07 +00:00
ec97417827 Don't import torch.distributed when it's not available (#35777)
This is a continuation of 217c47e31bc0cd442443e5b4a62c8bc2785d53ee but
for another module. This issue was spotted in nixpkgs (again) when
building the lm-eval package, which used a different path in the
transformers library to reach the same failure (see the sketch below).

Related: #35133
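
A sketch of the guarded-import pattern applied here (illustrative, not the exact diff):

```py
import torch

# Some torch builds (as in nixpkgs) ship without distributed support, so gate
# the import on availability instead of importing unconditionally.
if torch.distributed.is_available():
    import torch.distributed as dist
else:
    dist = None  # callers must handle the no-distributed case
```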
2025-01-20 17:10:35 +01:00
5f0f4b1b93 Patch moonshine (#35731)
* update expected logits for T4 runners

* update doc

* correct order of the args for better readability

* remove generate wrap

* convert modular
2025-01-20 16:19:29 +01:00
a142f16131 transformers.image_transforms.normalize wrong types (#35773)
transformers.image_transforms.normalize documented and checked the wrong type for the std and mean arguments

Co-authored-by: Louis Groux <louis.cal.groux@gmail.com>
2025-01-20 15:00:46 +00:00
3998fa8aab [fix] cannot import name 'Pop2PianoFeatureExtractor' from 'transformers' (#35604)
* update pop2piano __init__

* add lib check

* update fix

* revert
2025-01-20 15:21:45 +01:00
b80e334e71 Skip Falcon 7B GGML Test (#35783)
skip test
2025-01-20 15:00:34 +01:00
68947282fc remove code owners as it was generating too much noise BUT (#35784)
remove code owners
2025-01-20 14:18:03 +01:00
135e86aa54 Remove read_video and run 2025-01-20 13:40:57 +01:00
88b95e6179 [generate] update docstring of SequenceBiasLogitsProcessor (#35699)
* fix docstring

* space
2025-01-20 11:00:15 +00:00
56afd2f488 fix register_buffer in MimiEuclideanCodebook (#35759)
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-01-20 11:54:58 +01:00
abe57b6f17 Add SuperGlue model (#29886)
* Initial commit with template code generated by transformers-cli

* Multiple additions to SuperGlue implementation :

- Added the SuperGlueConfig
- Added the SuperGlueModel and its implementation
- Added basic weight conversion script
- Added new ImageMatchingOutput dataclass

* Few changes for SuperGlue

* Multiple changes :
- Added keypoint detection config to SuperGlueConfig
- Completed convert_superglue_to_pytorch and successfully ran inference

* Reverted unintentional change

* Multiple changes :
 - Added SuperGlue to a bunch of places
 - Divided SuperGlue into SuperGlueForImageMatching and SuperGlueModel
 - Added testing images

* Moved things in init files

* Added docs (to be finished depending on the final implementation)

* Added necessary imports and some doc

* Removed unnecessary import

* Fixed make fix-copies bug and ran it

* Deleted SuperGlueModel
Fixed convert script

* Added SuperGlueImageProcessor

* Changed SuperGlue to support batching pairs of images and modified ImageMatchingOutput accordingly

* Changed convert_superglue_to_hf.py script to experiment with different ways of reading an image and see their impact on performance

* Added initial tests for SuperGlueImageProcessor

* Added AutoModelForImageMatching in missing places and tests

* Fixed keypoint_detector_output instructions

* Fix style

* Adapted to latest main changes

* Added integration test

* Fixed bugs to pass tests

* Added keypoints returned by keypoint detector in the output of SuperGlue

* Added doc to SuperGlue

* SuperGlue returning all attention and hidden states for a fixed number of keypoints

* Make style

* Changed SuperGlueImageProcessor tests

* Revert "SuperGlue returning all attention and hidden states for a fixed number of keypoints"
Changed tests accordingly

This reverts commit 5b3b669c

* Added back hidden_states and attentions masked outputs with tests

* Renamed ImageMatching occurrences to KeypointMatching

* Changed SuperGlueImageProcessor to raise error when batch_size is not even

* Added docs and clarity to hidden state and attention grouping function

* Fixed some code and done refactoring

* Fixed typo in SuperPoint output doc

* Fixed some of the formatting and variable naming problems

* Removed useless function call

* Removed AutoModelForKeypointMatching

* Fixed SuperGlueImageProcessor to only accept pairs of images

* Added more fixes to SuperGlueImageProcessor

* Simplified the batching of attention and hidden states

* Simplified stack functions

* Moved attention instructions into class

* Removed unused do_batch_norm argument

* Moved weight initialization to the proper place

* Replaced deepcopy for instantiation

* Fixed small bug

* Changed from stevenbucaille to magic-leap repo

* Renamed London Bridge images to Tower Bridge

* Fixed formatting

* Renamed remaining "london" to "tower"

* Apply suggestions from code review

Small changes in the docs

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Added AutoModelForKeypointMatching

* Changed images used in example

* Several changes to image_processing_superglue and style

* Fixed resample type hint

* Changed SuperGlueImageProcessor and added test case for list of 2 images

* Changed list_of_tuples implementation

* Fix in dummy objects

* Added normalize_keypoint, log_sinkhorn_iterations and log_optimal_transport docstring

* Added missing docstring

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Moved forward block at bottom

* Added docstring to forward method

* Added docstring to match_image_pair method

* Changed test_model_common_attributes to test_model_get_set_embeddings test method signature

* Removed AutoModelForKeypointMatching

* Removed image fixtures and added load_dataset

* Added padding of images in SuperGlueImageProcessor

* Cleaned up convert_superglue_to_hf script

* Added missing docs and fixed unused argument

* Fixed SuperGlueImageProcessor tests

* Transposed all hidden states from SuperGlue to reflect the standard (..., seq_len, feature_dim) shape

* Added SuperGlueForKeypointMatching back to modeling_auto

* Fixed image processor padding test

* Changed SuperGlue docs

* changes:
 - Abstraction to batch, concat and stack inconsistent tensors
 - Changed conv1d's to linears to match standard attention implementations
 - Renamed all tensors to be tensor0 and not tensor_0 and be consistent
 - Changed match image pair to run keypoint detection on all images first, create batching tensors and then fill these tensors match after match
 - Various changes in docs, etc

* Changes to SuperGlueImageProcessor:
- Reworked the input image pairs checking function and added tests accordingly
- Added Copied from statements
- Added do_grayscale tag (also for SuperPointImageProcessor)
- Misc changes for better code

* Formatting changes

* Reverted conv1d to linear conversion because of numerical differences

* fix: changed some code to be more straightforward (e.g. filtering keypoints) and converted plot from opencv to matplotlib

* fix: removed unnecessary test

* chore: removed commented code and added back hidden states transpositions

* chore: changed from "inconsistent" to "ragged" function names as suggested

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* docs: applied suggestions

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* docs: updated to display matched output

* chore: applied suggestion for check_image_pairs_input function

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* chore: changed check_image_pairs_input function name to validate_and_format_image_pairs and used validate_preprocess_arguments function

* tests: simplified tests for image input format and shapes

* feat: converted SuperGlue's use of Conv1d with kernel_size of 1 with Linear layers. Changed tests and conversion script accordingly

* feat: several changes to address comments

Conversion script:
- Reverted fuse batchnorm to linear conversion
- Changed all 'nn.Module' to respective SuperGlue models
- Changed conversion script to use regex mapping and match other recent scripts

Modeling SuperGlue:
- Added batching with mask and padding to attention
- Removed unnecessary concat, stack and batch ragged pairs functions
- Reverted batchnorm layer
- Renamed query, key, value and merge layers into q, k, v, out proj
- Removed Union of different Module into nn.Module in _init_weights method typehint
- Changed several method's signature to combine image0 and image1 inputs with appropriate doc changes
- Updated SuperGlue's doc with torch.no_grad()

Updated test to reflect changes in SuperGlue model

* refactor: changed validate_and_format_image_pairs function with clarity

* refactor: changed from one SuperGlueMLP class to a list of SuperGlueMLP class

* fix: fixed forgotten init weight change from last commit

* fix: fixed rebase mistake

* fix: removed leftover commented code

* fix: added typehint and changed some of arguments default values

* fix: fixed attribute default values for SuperGlueConfig

* feat: added SuperGlueImageProcessor post process keypoint matching method with tests

* fix: fixed SuperGlue attention and hidden state tuples aggregation

* chore: fixed mask optionality and reordered tensor reshapes to be cleaner

* chore: fixed docs and error message returned in validate_and_format_image_pairs function

* fix: fixed returned keypoints to be the ones that SuperPoint returns

* fix: fixed check on number of image sizes for post process compared to the pairs in outputs of SuperGlue

* fix: fixed check on number of image sizes for post process compared to the pairs in outputs of SuperGlue (bis)

* fix: Changed SuperGlueMultiLayerPerceptron instantiation to avoid if statement

* fix: Changed convert_superglue_to_hf script to reflect latest SuperGlue changes and got rid of nn.Modules

* WIP: implement Attention from an existing class (like BERT)

* docs: Changed docs to include more appealing matching plot

* WIP: Implement Attention

* chore: minor typehint change

* chore: changed convert superglue script by removing all classes and apply conv to linear conversion in state dict + rearrange keys to comply with changes in model's layers organisation

* Revert "Fixed typo in SuperPoint output doc"

This reverts commit 2120390e827f94fcd631c8e5728d9a4980f4a503.

* chore: added comments in SuperGlueImageProcessor

* chore: changed SuperGlue organization HF repo to magic-leap-community

* [run-slow] refactor: small change in layer instantiation

* [run-slow] chore: replaced remaining stevenbucaille org to magic-leap-community

* [run-slow] chore: make style

* chore: update image matching fixture dataset HF repository

* [run-slow] superglue

* tests: overwriting test_batching_equivalence

* [run-slow] superglue

* tests: changed test to cope with value changing depending on cuda version

* [run-slow] superglue

* tests: changed matching_threshold value

* [run-slow] superglue

* [run-slow] superglue

* tests: changed tests for integration

* [run-slow] superglue

* fix: Changed tensor view and permutations to match original implementation results

* fix: updated convert script and integration test to include last change in model

* fix: increase tolerance for CUDA variances

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* [run-slow] superglue

* chore: removed blank whitespaces

* [run-slow] superglue

* Revert SuperPoint image processor accident changes

* [run-slow] superglue

* refactor: reverted copy from BERT class

* tests: lower the tolerance in integration tests for SuperGlue

* [run-slow] superglue

* chore: set do_grayscale to False in SuperPoint and SuperGlue image processors

* [run-slow] superglue

* fix: fixed imports in SuperGlue files

* chore: changed do_grayscale SuperGlueImageProcessing default value to True

* docs: added typehint to post_process_keypoint_matching method in SuperGlueImageProcessor

* fix: set matching_threshold default value to 0.0 instead of 0.2

* feat: added matching_threshold to post_process_keypoint_matching method

* docs: update superglue.md to include matching_threshold parameter

* docs: updated SuperGlueConfig docstring for matching_threshold default value

* refactor: removed unnecessary parameters in SuperGlueConfig

* fix: changed from matching_threshold to threshold

* fix: re-revert changes to make SuperGlue attention classes copies of BERT

* [run-slow] superglue

* fix: added missing device argument in post_processing method

* [run-slow] superglue

* fix: add matches different from -1 to compute valid matches in post_process_keypoint_matching (and docstring)

* fix: add device to image_sizes tensor instantiation

* tests: added checks on do_grayscale test

* chore: reordered and added Optional typehint to KeypointMatchingOutput

* LightGluePR suggestions:
- use `post_process_keypoint_matching` as default docs example
- add `post_process_keypoint_matching` in autodoc
- add `SuperPointConfig` import under TYPE_CHECKING condition
- format SuperGlueConfig docstring
- add device in convert_superglue_to_hf
- Fix typo
- Fix KeypointMatchingOutput docstring
- Removed unnecessary line
- Added missing SuperGlueConfig in __init__ methods

* LightGluePR suggestions:
- use batching to get keypoint detection

* refactor: processing images done in 1 for loop instead of 4

* fix: use @ instead of torch.einsum for scores computation

* style: added #fmt skip to long tensor values

* refactor: rollbacked validate_and_format_image_pairs valid and invalid case to more simple ones

* refactor: prepare_imgs

* refactor: simplified `validate_and_format_image_pairs`

* docs: fixed doc

---------

Co-authored-by: steven <steven.bucaillle@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-01-20 10:32:39 +00:00
872dfbdd46 [ViTPose] Convert more checkpoints (#35638)
* Convert more checkpoints

* Update docs, convert huge variant

* Update model name

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Remove print statements

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Link to collection

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-20 11:29:47 +01:00
332fa024d6 Security fix for self-comment-ci.yml (#35548)
* Revert "Disable  `.github/workflows/self-comment-ci.yml` for now (#35366)"

This reverts commit ccc4a5a59b2d4134a49971915db0710e7a8c7824.

* fix

* fix

* fix

* least permission

* add env

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-20 11:16:03 +01:00
8571bb145a Fix CI for VLMs (#35690)
* fix some easy test

* more tests

* remove logit check here also

* add require_torch_large_gpu in Emu3
2025-01-20 11:15:39 +01:00
5fa3534475 Use AMD CI workflow defined in hf-workflows (#35058)
* Use AMD CI workflow defined in hf-workflows
2025-01-17 20:52:57 +01:00
7d4b3ddde4 ci: fix xpu skip condition for test_model_parallel_beam_search (#35742)
`return unittest.skip()` used in `test_model_parallel_beam_search` in the
skip condition for xpu did not actually mark the test as skipped when
running under pytest:
* 148 passed, 1 skipped

Other tests use `self.skipTest()`. Reusing this approach and moving the
condition outside the loop (since it does not depend on it) skips
correctly for xpu (see the sketch below):
* 148 skipped

Secondly, `device_map="auto"` is now implemented for XPU for IPEX>=2.5 and
torch>=2.6, so we can now enable these tests for XPU for new IPEX/torch
versions.

Fixes: 1ea3ad1ae ("[tests] use `torch_device` instead of `auto` for model testing (#29531)")

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
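
A sketch of the difference (test and condition are illustrative):

```py
import unittest

class ExampleTest(unittest.TestCase):
    def test_model_parallel_beam_search(self):
        on_xpu = True  # stand-in for the real device check
        # Wrong: unittest.skip() returns a decorator; returning it from a test
        # body raises nothing, so pytest records the test as passed.
        #   if on_xpu:
        #       return unittest.skip("not supported on xpu")
        # Right: self.skipTest() raises SkipTest, so the runner marks it skipped.
        if on_xpu:
            self.skipTest("xpu needs IPEX >= 2.5 and torch >= 2.6 for device_map='auto'")
```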
2025-01-17 16:47:27 +01:00
8ad6bd0f1b Stop mutating input dicts in audio classification pipeline (#35754) 2025-01-17 15:41:56 +00:00
936a731534 Revert "Unable to use MimiModel with DeepSpeed ZeRO-3" (#35755)
Revert "Unable to use `MimiModel` with DeepSpeed ZeRO-3 (#34735)"

This reverts commit 54fd7e92604e8ecb2f4601aae2f75322af9184c5.
2025-01-17 16:29:26 +01:00
10e8cd0d63 Restore is_torch_greater_or_equal_than for backward compatibility (#35734)
* Restore is_torch_greater_or_equal_than for backward compatibility

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

* review comments

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>

---------

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-01-17 16:22:44 +01:00
099d93d2e9 Grounding DINO Processor standardization (#34853)
* Add input ids to model output

* Add text preprocessing for processor

* Fix snippet

* Add test for equivalence

* Add type checking guard

* Fixing typehint

* Fix test for added `input_ids` in output

* Add deprecations and "text_labels" to output

* Adjust tests

* Fix test

* Update code examples

* Minor docs and code improvement

* Remove one-liner functions and rename class to CamelCase

* Update docstring

* Fixup
2025-01-17 14:18:16 +00:00
42b2857b01 OmDet Turbo processor standardization (#34937)
* Fix docstring

* Fix docstring

* Add `classes_structure` to model output

* Update omdet postprocessing

* Adjust tests

* Update code example in docs

* Add deprecation to "classes" key in output

* Types, docs

* Fixing test

* Fix missed clip_boxes

* [run-slow] omdet_turbo

* Apply suggestions from code review

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Make CamelCase class

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-01-17 14:10:19 +00:00
94ae9a8da1 OwlViT/Owlv2 post processing standardization (#34929)
* Refactor owlvit post_process_object_detection + add text_labels

* Fix copies in grounding dino

* Sync with Owlv2 postprocessing

* Add post_process_grounded_object_detection method to processor, deprecate post_process_object_detection

* Add test cases

* Move text_labels to processors only

* [run-slow] owlvit owlv2

* [run-slow] owlvit, owlv2

* Update snippets

* Update docs structure

* Update deprecated objects for check_repo

* Update docstring for post processing of image guided object detection
2025-01-17 13:58:28 +00:00
add5f0566c Added liger_kernel compatibility with PeftModel (#35680)
* Added liger_kernel compatibility with `PeftModel`

* Amending based on review comments

* Amending based on review comments
2025-01-17 14:43:20 +01:00
df6d42a914 check is added for the report_to variable in TrainingArguments (#35403)
check for report_to variable is added
2025-01-17 14:39:32 +01:00
54fd7e9260 Unable to use MimiModel with DeepSpeed ZeRO-3 (#34735)
use torch.tensor(), not torch.Tensor()

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
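
The difference in one sketch (the ZeRO-3 interaction aside, torch.tensor() is the supported factory):

```py
import torch

bad = torch.Tensor([1, 2, 3])   # legacy constructor: always float32
good = torch.tensor([1, 2, 3])  # recommended factory: dtype inferred (int64 here)
print(bad.dtype, good.dtype)    # torch.float32 torch.int64
```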
2025-01-17 14:06:20 +01:00
ab1afd56f5 Fix some tests (#35682)
* cohere tests

* glm tests

* cohere2 model name

* create decorator

* update

* fix cohere2 completions

* style

* style

* style

* add cuda in comments
2025-01-17 12:10:43 +00:00
8c1b5d3782 🚨🚨🚨 An attempt to fix #29554. Include 'LayerNorm.' in gamma/beta rename scope, optimize string search. (#35615)
* An attempt to fix #29554. Include 'LayerNorm.' in gamma/beta rename scope and considerably reduce the number of characters searched on every load (see the sketch below).

* Fix fix on load issue

* Fix gamma/beta warning test

* A style complaint

* Improve efficiency of weight norm key rename. Add better comments about weight norm and layer norm renaming.

* Habitual elif redundant with the return
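
A minimal sketch of the scoped rename (not the actual transformers code):

```py
def rename_legacy_norm_keys(state_dict: dict) -> dict:
    # Only keys under "LayerNorm." are renamed, so unrelated parameters whose
    # names merely contain "gamma"/"beta" are left untouched.
    renamed = {}
    for key, value in state_dict.items():
        if "LayerNorm." in key:
            key = key.replace("LayerNorm.gamma", "LayerNorm.weight")
            key = key.replace("LayerNorm.beta", "LayerNorm.bias")
        renamed[key] = value
    return renamed
```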
2025-01-16 17:25:44 -08:00
02a492a838 Added resource class configuration option for check_circleci_user job (#32866)
Added resource class configuration option for check_circleci_user job.
2025-01-16 21:31:18 +01:00
94af1c0aa2 [generate] return Cache object even if passed in a legacy format (#35673)
* generate returns a Cache object by default

* fix tests

* fix test for encoder-decoder models
2025-01-16 17:06:24 +00:00
2818307e93 [generate] can instantiate GenerationConfig(cache_implementation="static") (#35679)
fix failing instantiation
2025-01-16 17:04:54 +00:00
aaa969e97d Remove pt_to_tf (#35672)
* rm command

* remove exception
2025-01-16 17:03:37 +00:00
80dbbd103c 🧹 remove generate-related objects and methods scheduled for removal in v4.48 (#35677)
* remove things scheduled for removal

* make fixup
2025-01-16 17:03:20 +00:00
aeeceb9916 [cache] add a test to confirm we can use cache at train time (#35709)
* add test

* augment test as suggested

* Update tests/utils/test_modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* rerun tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-16 17:02:34 +00:00
57bf1a12a0 Remove batch size argument warning when unjustified (#35519)
* use max batch size

* revert unneccessary change

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-01-16 17:48:11 +01:00
91be6a5eb2 Modular: support for importing functions from any file (#35692)
* fix function imports

* improve comment

* Update modeling_switch_function.py

* make checks more robust

* improvement

* rename

* final test update
2025-01-16 16:37:53 +00:00
8ebe9d7166 Optimize ForCausalLMLoss by removing unnecessary contiguous() call to reduce memory overhead (#35646)
Optimize ForCausalLMLoss by removing unnecessary contiguous() calls to reduce memory overhead
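
A rough sketch of the shifted causal-LM loss without the extra copies (not the library's exact function):

```py
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Shift so each position predicts the next token; reshape() copes with the
    # non-contiguous slices itself, so no explicit .contiguous() copy is needed.
    shift_logits = logits[..., :-1, :]
    shift_labels = labels[..., 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )

loss = causal_lm_loss(torch.randn(2, 5, 10), torch.randint(0, 10, (2, 5)))
```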
2025-01-16 15:47:43 +00:00
1302c32a84 Add proper jinja2 error (#35533)
* Cleanup jinja2 imports

* Raise a proper error if Jinja is missing

* make fixup
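
A minimal sketch of the improved failure mode (helper name and message are illustrative):

```py
try:
    import jinja2
except ImportError:
    jinja2 = None

def compile_chat_template(template: str):
    # Raise a clear ImportError at the point Jinja2 is actually needed,
    # instead of an opaque NameError later.
    if jinja2 is None:
        raise ImportError("Rendering chat templates requires jinja2: `pip install jinja2`.")
    return jinja2.Environment().from_string(template)
```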
2025-01-16 15:31:11 +00:00
3292e96a4f [generation] fix type hint (#35725)
fix type hint
2025-01-16 15:09:59 +00:00
8b78d9d6e7 Fix the bug that Trainer cannot correctly call torch_jit_model_eval (#35722)
Fix the bug where accelerator.autocast does not pass parameters correctly when calling torch_jit_model_eval (#35706)
2025-01-16 15:53:37 +01:00
2cbcc5877d Fix condition when GA loss bug fix is not performed (#35651)
* fix condition when GA loss bug fix is not performed

* max loss diff is 2.29

* fix typo

* add an extra validation that loss should not vary too much
2025-01-16 13:59:53 +01:00
fd4f14c968 Fix: Falcon tie_word_embeddings in GGUF (#35715)
* fix falcon tie_word_embeddings

* fix style
2025-01-16 13:18:22 +01:00
bef7dded22 Replace deprecated batch_size with max_batch_size when using HybridCache (#35498)
* Replace deprecated batch_size with max_batch_size

- Functionality remains the same, because the property getter batch_size(self) returned max_batch_size anyway (sketched below).
- This change just avoids an unnecessary warning about deprecation.

* Use max_batch_size instead of deprecated batch_size with HybridCache

* Use max_batch_size instead of deprecated batch_size with HybridCache

- Change generated code to match original source
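
A sketch of the deprecation pattern described above (class name is illustrative):

```py
import warnings

class HybridCacheSketch:
    def __init__(self, max_batch_size: int):
        self.max_batch_size = max_batch_size

    @property
    def batch_size(self) -> int:
        # Legacy alias: same value, but reading it warns.
        warnings.warn("`batch_size` is deprecated, use `max_batch_size`.", FutureWarning)
        return self.max_batch_size

cache = HybridCacheSketch(max_batch_size=4)  # preferred spelling, no warning
```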
2025-01-16 11:48:41 +00:00
99e0ab6ed8 Fix typo in /docs/source/ja/model_doc/decision_transformer.md URL (#35705)
doc: Update original code repository URL
2025-01-15 07:36:50 -08:00
12dfd99007 Fix : Nemotron Processor in GGUF conversion (#35708)
* fixing nemotron processor

* make style
2025-01-15 14:25:44 +01:00
387663e571 Enable gptqmodel (#35012)
* gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update readme

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* gptqmodel need use checkpoint_format (#1)

* gptqmodel need use checkpoint_format

* fix quantize

* Update quantization_config.py

* Update quantization_config.py

* Update quantization_config.py

---------

Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* Revert quantizer_gptq.py (#2)

* revert quantizer_gptq.py change

* pass **kwargs

* limit gptqmodel and optimum version

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix warning

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix version check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert unrelated changes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable gptqmodel tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix requires gptq

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Fix Transformer compat (#3)

* revert quantizer_gptq.py change

* pass **kwargs

* add meta info

* cleanup

* cleanup

* Update quantization_config.py

* hf_select_quant_linear pass checkpoint_format and meta

* fix GPTQTestCUDA

* Update test_gptq.py

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* cleanup

* add backend

* cleanup

* cleanup

* no need check exllama version

* Update quantization_config.py

* lower checkpoint_format and backend

* check none

* cleanup

* Update quantization_config.py

* fix self.use_exllama == False

* spell

* fix unittest

* fix unittest

---------

Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format again

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update gptqmodel version (#6)

* update gptqmodel version

* update gptqmodel version

* fix unit test (#5)

* update gptqmodel version

* update gptqmodel version

* "not self.use_exllama" is not equivalent to "self.use_exllama==False"

* fix unittest

* update gptqmodel version

* backend is loading_attributes (#7)

* fix format and tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix memory check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device mismatch

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix result check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* update tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* review: update docs (#10)

* review: update docs (#12)

* review: update docs

* fix typo

* update tests for gptqmodel

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update document (#9)

* update overview.md

* cleanup

* Update overview.md

* Update overview.md

* Update overview.md

* update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

* typo

* doc note for asymmetric quant

* typo with apple silicon(e)

* typo for marlin

* column name revert: review

* doc rocm support

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-15 14:22:49 +01:00
615bf9c5e4 Add future import for Py < 3.10 (#35666)
* Add future import for Py < 3.10

* make fixup

* Same issue in convert_olmo2_weights_to_hf.py
2025-01-15 12:45:43 +00:00
09d5f76274 Clean-up composite configs (#34603)
* remove manual assignment tie-word-embeddings

* remove another unused attribute

* fix tests

* fix tests

* remove unnecessary overwrites

* fix

* decoder=True

* clean pix2struct

* run-all

* forgot `_tied_weights_keys` when adding Emu3

* also Aria + fix-copies

* and clean aria
2025-01-15 10:04:07 +01:00
c61fcde910 Enhance DataCollatorForLanguageModeling with Configurable Token Replacement Probabilities (#35251)
* DataCollatorForLanguageModeling class was updated with new parameters that provide more control over token masking and replacing

* DataCollatorForLanguageModeling class was updated with new parameters that provide more control over token masking and replacing

* Addressed review comments, modified the docstring and made a test for the DataCollatorForLanguageModeling
2025-01-14 17:01:10 +00:00
b0cdbd9119 Enhanced Installation Section in README.md (#35094)
* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

Enhanced installation section with troubleshooting, GPU setup, and OS-specific details.

* Update README.md

Enhanced installation section with troubleshooting, GPU setup, and OS-specific details.

* Update installation.md

Updated installation.md to include virtual environment and GPU setup instructions.

* Update installation.md

Updated installation.md to include virtual environment and GPU setup instructions.

* Update installation.md

Updated installation.md to include virtual environment, troubleshooting and GPU setup instructions.

* Update installation.md

* Update installation.md

* Update installation.md

* Update installation.md

Updated installation.md to include virtual environment, troubleshooting functions and GPU setup instructions.

* Update installation.md

Updated installation.md to include virtual environment, troubleshooting functions and GPU setup instructions.

* Update installation.md

Updated installation.md to include virtual environment, troubleshooting functions and GPU setup instructions.

* Update README.md

Removed numbering from README.md.

* Update README.md

Removed unnecessary "a)" formatting as per maintainer feedback.

* Update README.md

Added blank lines around code snippets for better readability.

* Update README.md

Removed the line "b) Install a backend framework:" from README.md as per feedback.

* Update README.md

Simplified "For Windows:" to "Windows" in README.md as per feedback as well as "For macOS/Linux:" to "macOS/Linux"

* Update README.md

Removed unnecessary heading and retained valid code snippet.

* Update README.md

Removed unnecessary heading "d) Optional: Install from source for the latest updates" as per feedback.

* Update README.md

Removed "GPU Setup (Optional)" section to align with minimal design feedback.

* Update installation.md

Removed "Create and Activate a Virtual Environment" section from installation.md as per feedback.

* Update installation.md

Adjusted "Troubleshooting" to a second-level heading and added an introductory line as per feedback.

* Update installation.md

Updated troubleshooting section with simplified headings and formatted code blocks as per feedback.

* Update installation.md

Integrated GPU setup instructions into the "Install with pip" section for better content flow.

* Update README.md

Removed Troubleshooting section from README.md for minimalism as per maintainer feedback.
2025-01-14 08:05:08 -08:00
a11041ffad Fix : add require_read_token for gemma2 gated model (#35687)
fix gemma2 gated model test
2025-01-14 11:47:05 +01:00
df2a812e95 Fix expected output for ggml test (#35686)
fix expected output
2025-01-14 11:46:55 +01:00
050636518a Fix : HQQ config when hqq not available (#35655)
* fix

* make style

* adding require_hqq

* make style
2025-01-14 11:37:37 +01:00
715fdd6459 Update torchao.md: use auto-compilation (#35490)
* Update torchao.md: use auto-compilation

* Update torchao.md: indicate updating transformers to the latest

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-01-14 11:33:48 +01:00
4b8d1f7fca Fix : adding einops lib in the CI docker for some bitsandbytes tests (#35652)
* fix docker

* fix
2025-01-14 07:36:10 +01:00
34f76bb62b Fix zero_shot_image_classification documentation guide link in SigLIP (#35671) 2025-01-13 11:08:17 -08:00
c23a1c1932 Add-helium (#35669)
* Add the helium model.

* Add a missing helium.

* And add another missing helium.

* Use float for the rmsnorm mul.

* Add the Helium tokenizer converter.

* Add the pad token as suggested by Arthur.

* Update the RMSNorm + some other tweaks.

* Fix more rebase issues.

* fix copies and style

* fixes and add helium.md

* add missing tests

* update the backlink

* oups

* style

* update init, and expected results

* small fixes

* match test outputs

* style fixup, fix doc builder

* add dummies and we should be good to go!

* update sdpa and fa2 documentation

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2025-01-13 18:41:15 +01:00
a3f82328ed [i18n-ar] Translated file : docs/source/ar/tasks/token_classification.md into Arabic (#35193)
* Create token_classification.md

* Update token_classification.md

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/token_classification.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update _toctree.yml

---------

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
2025-01-13 09:32:15 -08:00
2fa876d2d8 [tests] make cuda-only tests device-agnostic (#35607)
* intial commit

* remove unrelated files

* further remove

* Update test_trainer.py

* fix style
2025-01-13 14:48:39 +01:00
e6f9b03464 [Compile] Only test compiling model forward pass (#35658)
* rename test to only compile forward!

* style emu
2025-01-13 13:43:29 +01:00
84a6789145 Enable different torch dtype in sub models (#34873)
* fix

* fix test

* add tests

* add more tests

* fix tests

* supposed to be a torch.dtype test

* handle BC and make fp32 default
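
A hedged sketch of the feature as described: torch_dtype may be a dict keyed by sub-config name, with "" covering the remaining top-level weights. The checkpoint and keys below are placeholders; valid keys depend on the model's composite config.

```py
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "some/vision-language-checkpoint",  # placeholder
    torch_dtype={"text_config": torch.bfloat16, "vision_config": torch.float16, "": torch.float32},
)
```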
2025-01-13 13:42:08 +01:00
87089176d9 [Phi] bias should be True (#35650)
bias should be True
2025-01-13 13:15:07 +01:00
91f14f1fc4 Removed some duplicated code (#35637)
* Removed duplicate class field definition.

* Removed duplicate code in try-except block.

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2025-01-13 12:34:21 +01:00
b8c34d97fc Fix whisper compile (#35413)
Fix compile error

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-01-13 11:31:51 +01:00
cd44bdb4b8 Fix device in rope module when using dynamic updates (#35608)
fix rope device
2025-01-13 10:11:17 +01:00
15bd3e61f8 Update codeowners with individual model owners (#35595)
* Update codeowners with individual model owners

* rip yoach

* add comment

* Replace - with _

* Add @qubvel for zero-shot object-detection

* Update CODEOWNERS

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update CODEOWNERS

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update CODEOWNERS

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update CODEOWNERS

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add yoni for omdet-turbo

* Update CODEOWNERS

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Refactor / comment the CODEOWNERS file

* Capture modular files as well

* Add dummies without owner

* More cleanup

* Set Niels on a few more models that he added

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-01-10 17:59:36 +00:00
1e3c6c1f7d Skip MobileNetV1ModelTest::test_batching_equivalence for now (#35614)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-10 18:32:36 +01:00
04eae987f3 Fix flaky test_beam_search_low_memory (#35611)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-10 17:31:03 +01:00
b02828e4af Let EarlyStoppingCallback not require load_best_model_at_end (#35101)
* Bookmark

* Add warning
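
Usage sketch (argument values are illustrative); early stopping no longer hard-requires load_best_model_at_end=True, it only warns:

```py
from transformers import EarlyStoppingCallback, TrainingArguments

callback = EarlyStoppingCallback(early_stopping_patience=3, early_stopping_threshold=0.0)
args = TrainingArguments(output_dir="out", eval_strategy="epoch", metric_for_best_model="eval_loss")
# trainer = Trainer(model=..., args=args, callbacks=[callback])  # wire into a Trainer as usual
```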
2025-01-10 10:25:32 -05:00
0aaf124fb9 Added error when sequence length is bigger than max_position_embeddings (#32156)
* Added error when sequence length is bigger than max_position_embeddings

* Fixed formatting

* Fixed bug

* Changed copies to match

* Fixed bug

* Applied suggestions

* Removed redundant code

* Fixed bugs

* Bug fix

* Bug fix

* Added requested Changes

* Fixed bug

* Fixed unwanted change

* Fixed unwanated changes

* Fixed formatting
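
A minimal sketch of the validation this adds (error text is illustrative):

```py
def check_sequence_length(input_ids, max_position_embeddings: int) -> None:
    seq_len = input_ids.shape[-1]
    if seq_len > max_position_embeddings:
        raise ValueError(
            f"Sequence length ({seq_len}) exceeds the model's "
            f"max_position_embeddings ({max_position_embeddings})."
        )
```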
2025-01-10 15:23:54 +00:00
1211e616a4 Use inherit tempdir makers for tests + fix failing DS tests (#35600)
* Use existing APIs to make tempdir folders

* Fixup deepspeed too

* output_dir -> tmp_dir
2025-01-10 10:01:58 -05:00
bbc00046b9 Fix flaky test_custom_4d_attention_mask (#35606)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-10 15:40:04 +01:00
f63829c87b v4.49.0-dev 2025-01-10 12:31:11 +01:00
52e1f87c7d [WIP] Emu3: add model (#33770)
* model can convert to HF and be loaded back

* nit

* works in single batch generation but hallucinates

* use the image tokens

* add image generation

* now it works

* add tests

* update

* add modular but it doesn't work for porting docstrings :(

* skip some tests

* add slow tests

* modular removed the import?

* guess this works

* update

* update

* fix copies

* fix test

* fix copies

* update

* docs

* fix tests

* last fix tests?

* pls

* repo consistency

* more style

* style

* remove file

* address comments

* tiny bits

* update after the new modular

* fix tests

* add one more cond in check attributes

* decompose down/up/mid blocks

* allow static cache generation in VLMs

* nit

* fix copies

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/emu3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix VAE upsampling

* Update src/transformers/models/emu3/modular_emu3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address comments

* state overwritten stuff explicitly

* fix copies

* add the flag for flex attn

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-10 12:23:00 +01:00
ccc0381d36 Fix flex_attention in training mode (#35605)
* fix flex

* add test

* style
2025-01-10 11:49:12 +01:00
a9bd1e6284 Remove benchmark.py after #34275 2025-01-10 11:09:06 +01:00
e0646f3dce Chat template: return vectorized output in processors (#34275)
* update chat template

* style

* fix tests

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* typehints + docs

* fix tests

* remove unnecessary warnings

* forgot code style :(

* allow users to pass backend and num frames

* Update docs/source/en/chat_templating.md

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/image_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/processing_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* typo fix

* style

* address comments

* align with "pipeline" template

* update docs

* update docs

* unpack for all kwargs?

* wrong conflict resolution while rebasing

* tmp

* update docs

* Update docs/source/en/chat_templating.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-10 11:05:29 +01:00
5f087d1335 Add Moonshine (#34784)
* config draft

* full encoder forward

* full decoder forward

* fix sdpa and FA2

* fix sdpa and FA2

* moonshine model

* moonshine model forward

* fix attention with past_key_values

* add MoonshineForConditionalGeneration

* fix cache handling and causality for cross attention

* no causal attention mask for the encoder

* model addition (imports etc)

* small nit

* nits

* Update src/transformers/models/moonshine/convert_usefulsensors_to_hf.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* add rope_theta

* nits

* model doc

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Joshua Lochner <admin@xenova.com>

* imports

* add MODEL_FOR_SPEECH_SEQ_2_SEQ_MAPPING_NAMES

* updates modular

* make

* make fix-copies

* ruff check examples fix

* fix check_modular_conversion

* nit

* nits

* nits

* copied from -> imports

* imports fix

* integrate attention refacto

* modular edge case

* remove encoder

* convolutions params in config

* run modular_model_converter

* make

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Joshua Lochner <admin@xenova.com>

* MoonshineModelTest

* correct typo

* make style

* integration tests

* make

* modular convert

* name conversion update (up_proj -> fc1 etc)

* update config

* update MLP

* update attention

* update encoder layer

* update decoder layer

* update convolutions parameters

* update encoder

* remove INPUTS_DOCSTRING

* update decoder

* update conditional generation

* update pretrained model

* imports

* modular converted

* update doc

* fix

* typo

* update doc

* update license

* update init

* split config in file

* two classes for MLP

* attention from GLM

* from GlmRotaryEmbedding

* split MLP

* apply arthur's review suggestions

* apply arthur's review suggestions

* apply arthur's review suggestions

* auto feature extractor

* convert modular

* fix + make

* convert modular

* make

* unsplit config

* use correct checkpoint

* wrap generate

* update tests

* typos

* make

* typo

* update doc

---------

Co-authored-by: Joshua Lochner <admin@xenova.com>
2025-01-10 11:00:54 +01:00
6f127d3f81 Skip torchscript tests if a cache object is in model's outputs (#35596)
* fix 1

* fix 1

* comment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-10 10:46:03 +01:00
6b73ee8905 ModernBert: reuse GemmaRotaryEmbedding via modular + Integration tests (#35459)
* Introduce 5 integration tests for the 4 model classes + torch export

* ModernBert: reuse GemmaRotaryEmbedding via modular

* Revert #35589, keep rope_kwargs; rely on them in modular_modernbert

* Revert "Revert #35589, keep rope_kwargs; rely on them in modular_modernbert"

This reverts commit 11b44b9ee83e199cbfb7c5ba2d11f7a7fdbba2d3.

* Don't set rope_kwargs; override 'self.rope_init_fn' call instead
2025-01-10 10:25:10 +01:00
8de7b1ba8d Add flex_attn to diffllama (#35601)
Add sdpa to diffllama
2025-01-09 20:49:11 +01:00
1e3ddcb2d0 ModernBERT bug fixes (#35404)
* bug fixes

* organize imports

* wrap cpu warning in reference_compile

* Avoid needing repad_logits_with_grad, always repad with grads when training

I'm not 100% sure that the conditional with "or labels is None" makes sense though - not sure what the intention is there. Perhaps we can remove that?

* Revert "Avoid needing repad_logits_with_grad, always repad with grads when training"

This reverts commit cedcb4e89bcea199a1135a0933e71f534b656239.

* Fix grammar: keep -> keeps

* Propagate grammar fix with modular_model_converter

---------

Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
2025-01-09 20:15:38 +01:00
e97d7a5be5 add _supports_flex_attn = True for models that do support it (#35598)
* add `_supports_flex_attn = True`

* fix repo consistency
2025-01-09 20:03:33 +01:00
c9c682d19c [doc] deepspeed universal checkpoint (#35015)
* universal checkpoint

* Update docs/source/en/deepspeed.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/deepspeed.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/deepspeed.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-01-09 09:50:51 -08:00
3a4ae6eace Refactor/fix Cohere2 (#35594)
* refactor/fix cohere2

* add kwargs

* tests

* remove func and import it
2025-01-09 17:54:57 +01:00
32e0db8a69 [tokenizers] Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer (#35593)
* Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer

in PreTrainedTokenizerFast, rather than relying on subclasses to take care of this.

* Simplify setting self.add_prefix_space, ensure pre_tok exists

* Wrap in try-except to catch 'Custom PreTokenizer cannot be serialized'

862d1a346a/bindings/python/src/pre_tokenizers.rs (L672) produces the exception. It is triggered by the roformer tests, as RoFormerTokenizerFast uses a custom PreTokenizer.

* Propagate add_prefix_space in T5TokenizerFast to superclass
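
What "propagated" means in practice, sketched with GPT-2's ByteLevel pre-tokenizer:

```py
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2", add_prefix_space=True)
# The Python-side kwarg now reaches the Rust backend's pre-tokenizer:
print(tok.backend_tokenizer.pre_tokenizer.add_prefix_space)  # True
```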
2025-01-09 17:46:50 +01:00
46276f9a7f Fix modular edge case + modular sorting order (#35562)
* look-ahead negation

* re add examples by default

* Fix the bug in topological sort

* Update create_dependency_mapping.py

* start adding test

* finalize test

* more tests

* style

* style
2025-01-09 17:17:52 +01:00
d3fe9fa3fe PR for Issue #22694: Fixed Training Evaluation table display for VSCode (#35557) 2025-01-09 15:05:47 +00:00
395b114bd1 Small fix rope kwargs (#35589)
* don't know why this keeps popping up?

* remove unused rope_kwargs
2025-01-09 15:40:36 +01:00
82dd6c14bb Fix flaky SwitchTransformersModelTest::test_training_gradient (#35587)
* fix

* Update tests/models/switch_transformers/test_modeling_switch_transformers.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-09 15:36:22 +01:00
eb4579cf43 tokenizer train from iterator without pre_tokenizers (#35396)
* fix if else issues

* add a test

* fix the test

* style
2025-01-09 15:34:43 +01:00
320512df46 feat: add TP plan for granite (#35573)
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2025-01-09 15:25:55 +01:00
633da1b10e [Idefics3] Move image features to same device as input embeds (#35100)
* [Idefics3] Move image features to same device as input embeds

* Update src/transformers/models/idefics3/modeling_idefics3.py

* make style

---------

Co-authored-by: Saif Rehman Nasir <shyshin@github.com>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-01-09 14:25:36 +01:00
832c6191ed Add inputs_embeds param to ModernBertModel (#35373)
* update modular_modernbert -- add inputs_embeds param to ModernBertModel

* Fix implementation issues; extend to other classes; docstring

First of all, the inputs_embeds shouldn't fully replace `self.embeddings(input_ids)`, because this call also does layer normalization and dropout. So, now both input_ids and inputs_embeds are passed to the ModernBertEmbeddings, much like how BertEmbeddings is implemented.

I also added `inputs_embeds` to the docstring, and propagated the changes to the other model classes.

I also introduced an error if input_ids and inputs_embeds are both or neither provided.

Lastly, I fixed an issue with device being based solely on input_ids with attention_mask.

* Propagate inputs_embeds to ModernBertForMaskedLM correctly

Also reintroduce inputs_embeds test

---------

Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
2025-01-09 14:17:26 +01:00
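
The point made in this PR - that `inputs_embeds` must still pass through the embedding layer's normalization and dropout rather than bypass it - can be sketched like this (the class and sizes are illustrative, not the real ModernBERT module):

```python
import torch
from torch import nn


class EmbeddingsSketch(nn.Module):
    def __init__(self, vocab_size: int = 32000, hidden_size: int = 768, dropout: float = 0.1):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab_size, hidden_size)
        self.norm = nn.LayerNorm(hidden_size)
        self.drop = nn.Dropout(dropout)

    def forward(self, input_ids=None, inputs_embeds=None):
        # Exactly one of the two inputs must be given, as the PR enforces.
        if (input_ids is None) == (inputs_embeds is None):
            raise ValueError("Specify exactly one of input_ids or inputs_embeds")
        hidden = inputs_embeds if inputs_embeds is not None else self.tok_embeddings(input_ids)
        # Norm and dropout apply in both cases, so inputs_embeds is not a shortcut.
        return self.drop(self.norm(hidden))
```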
1b2f942af7 Fix flaky test_batching_equivalence (#35564)
* yes!

* oh no!!!

* oh no!!!

* style

* oh no!!!

* oh no!!!

* oh no!!!

* oh no!!!

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-09 14:00:08 +01:00
4adc415b6d Setup loss_type in config at model init time (#34616)
* setup loss_type in config at model init time

ensures no additional graph break introduced when torch.compile'ed

fixes #34615

Signed-off-by: ChanderG <mail@chandergovind.org>

* lookup loss mapping at init time instead of manual setup

Signed-off-by: ChanderG <mail@chandergovind.org>

* remove redundant lookup at loss_function time

* overwride losstype at init time

---------

Signed-off-by: ChanderG <mail@chandergovind.org>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-01-09 13:32:21 +01:00
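
The reasoning here is that resolving the loss function from the class name once at `__init__` time removes string matching from the forward pass, which is what introduced the extra graph break under `torch.compile`. A rough sketch of the idea, with stand-in names rather than the actual transformers internals:

```python
import re

# Stand-in for the real loss-function registry in transformers.
LOSS_MAPPING = {
    "ForCausalLM": "causal_lm_loss",
    "ForSequenceClassification": "sequence_classification_loss",
}


class ModelSketch:
    def __init__(self, config):
        # One-time lookup: match the task suffix of the class name against the
        # registry so forward() never has to do this while being traced.
        suffixes = re.findall(r"For[A-Za-z]+$", self.__class__.__name__)
        if suffixes and suffixes[0] in LOSS_MAPPING:
            config.loss_type = suffixes[0]
        else:
            config.loss_type = "ForCausalLM"  # default fallback
        self.config = config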
c8ab6ce6ce Re-add missing __all__ for Cohere and Phi3 (#35578)
re-add missing __all__
2025-01-09 11:29:31 +01:00
487c31a21f Minor fix in video text 2 text docs (#35546)
minor fix in docs
2025-01-09 11:20:36 +01:00
965a2fb320 More model refactoring! (#35359)
* cohere

* style

* phi3

* style

* small fix

* small fix

* phi3 longrope

* oups

* Update rope (only for phi3 still)

* Update test_modeling_rope_utils.py

* Update modeling_phi3.py

* fix

* fix copies

* style

* Fix copied from bad renaming
2025-01-09 11:09:09 +01:00
137965ca7d Don't show warning for inv_freq buffers (#35255)
dont show warning
2025-01-09 10:46:01 +01:00
8cad65a698 Fix multi-gpu loss (#35395)
push to device
2025-01-09 10:14:31 +01:00
2e2f8015c0 update code owners (#35576)
update
2025-01-09 09:55:41 +01:00
a6256ec098 [i18n-ar] Translated file: docs/source/ar/tasks/multiple_choice.md into Arabic (#35199)
* إضافة الترجمة العربية: multiple_choice.md

* Update multiple_choice.md

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/multiple_choice.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update _toctree.yml

* Add files via upload

* Update _toctree.yml

---------

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
2025-01-08 14:17:58 -08:00
b32938aeee Fix all output_dir in test_trainer.py to use tmp_dir (#35266)
* update codecarbon

* replace directly-specified-test-dirs with tmp_dir

* pass tmp_dir to all get_regression_trainer

* test_trainer.py: Use tmp_dir consistently for all output_dir arguments

* fix some with...as tmp_dir blocks

* reflect the comments to improve test_trainer.py

* refresh .gitignore
2025-01-08 19:44:39 +01:00
76da6ca034 Pipeline: simple API for assisted generation (#34504)
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-01-08 17:08:02 +00:00
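
A hedged usage sketch for the assisted-generation pipeline API this PR introduces; the checkpoints are illustrative and the keyword name `assistant_model` is inferred from the feature, so double-check it against the released docs:

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",   # main model (example)
    assistant_model="meta-llama/Llama-3.2-1B",  # small draft model (example)
)
# The draft model proposes tokens that the main model verifies in one pass,
# which usually lowers latency without changing the output distribution.
print(pipe("Speculative decoding works by", max_new_tokens=40)[0]["generated_text"])
```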
3f483beab9 [PixtralLarge] Update Pixtral conversion script to support large format! (#34801)
* update conversion script

* update for bias again

* remove pdv

* use my dir

* Update how we initialize the tokenizer

* Convert in bfloat16

* Undo that one again

* fix config dump

* .to() was broken for BatchMixFeature

* quick debug breakpoint

* put the breakpoint in the right place

* Add a config flag for the multimodal projector bias

* Add a config flag for the multimodal projector bias

* Conversion script can load chat templates

* Indent config for comparison

* Stop clobbering the config

* Re-enable the config clobber

* Get rid of the config manual save - it has no effect!

* Handle adapter bias correctly

* Default vision transformer activation to silu

* Remove legacy processing path

* One commit with all the debug breakpoints before I delete them all, in case I need to revert

* Update conversion

* Remove vLLM debugging instrumentation

* Drop xformers

* Remove debug enumerates

* make fixup

* make fixup

* Break copied from in pixtral

* Propagate multimodal_projector_bias change

* Propagate multimodal_projector_bias change

* Remove debug device .to()

* Restore attention weights output

* Fix Pixtral test

* Drop image_seq_length

* Drop image_seq_length

* Put the legacy processing code back

* Add the bias option to the llava_next_video config

* Add the bias option to the llava_next_video config

* Make certain args required in converter

* Make certain args required in converter

* typo

* make fixup

* Reverting some dtype changes since it seems to work without them

---------

Co-authored-by: arthur@huggingface.co <arthur@ip-26-0-166-244.ec2.internal>
Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-01-08 17:39:47 +01:00
4c2c12b3de [docs] Remove Hiera from AUDIO MODELS in docs (#35544)
Remove Hiera from AUDIO MODELS

Hiera is a visual model and should not appear in audio model...
2025-01-08 16:33:21 +00:00
854dc7941b overwrite top_k when creating audio classification pipeline (#35541)
* overwrite top_k when creating audio classification pipeline

* Update src/transformers/pipelines/audio_classification.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-01-08 16:32:27 +00:00
8c555ca3d7 add code owners (#35528)
* add co owners

* normal processing

* /src/transformers/models/*/*_modeling*

* Update CODEOWNERS

* Update CODEOWNERS

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Update CODEOWNERS

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* nit

* Apply suggestions from code review

Co-authored-by: Alvaro Moran <6949769+tengomucho@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>

* Update CODEOWNERS

* rather put `@Rocketknight1`

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Alvaro Moran <6949769+tengomucho@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
2025-01-08 17:14:44 +01:00
8490d3159c Add ViTPose (#30530)
* First draft

* Make fixup

* Make forward pass work

* Improve code

* More improvements

* More improvements

* Make predictions match

* More improvements

* Improve image processor

* Fix model tests

* Add classic decoder

* Convert classic decoder

* Verify image processor

* Fix classic decoder logits

* Clean up

* Add post_process_pose_estimation

* Improve post_process_pose_estimation

* Use AutoBackbone

* Add support for MoE models

* Fix tests, improve num_experts

* Improve variable names

* Make fixup

* More improvements

* Improve post_process_pose_estimation

* Compute centers and scales

* Improve postprocessing

* More improvements

* Fix ViTPoseBackbone tests

* Add docstrings, fix image processor tests

* Update index

* Use is_cv2_available

* Add model to toctree

* Add cv2 to doc tests

* Remove script

* Improve conversion script

* Add coco_to_pascal_voc

* Add box_to_center_and_scale to image_transforms

* Update tests

* Add integration test

* Fix merge

* Address comments

* Replace numpy by pytorch, improve docstrings

* Remove get_input_embeddings

* Address comments

* Move coco_to_pascal_voc

* Address comment

* Fix style

* Address comments

* Fix test

* Address comment

* Remove udp

* Remove comment

* [WIP] need to check if the numpy function is the same as cv2

* add scipy affine_transform

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* refactor convert

* add output_shape

* add atol 5e-2

* Use hf_hub_download in conversion script

* make box_to_center more applicable

* skip test_get_set_embedding

* fix to accept array and fix CI

* add co-contributor

* make it to tensor type output

* add torch

* change to torch tensor

* add more test

* minor change

* CI test change

* import torch should be above ImageProcessor

* make style

* try not use torch in def

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/vitpose_backbone/configuration_vitpose_backbone.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix

* fix

* add caution

* make more detail about dataset_index

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>

* add docs

* Update docs/source/en/model_doc/vitpose.md

* Update src/transformers/models/vitpose/configuration_vitpose.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Revert "Update src/transformers/__init__.py"

This reverts commit 7ffa504450bb9dbccf9c7ea668441b98a1939d5c.

* change name

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/vitpose/test_modeling_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* move vitpose only function to image_processor

* raise valueerror when using timm backbone

* use out_indices

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove camel-case of def flip_back

* rename vitposeEstimatorOutput

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix confused camelcase of MLP

* remove in-place logic

* clear scale description

* make consistent batch format

* docs update

* formatting docstring

* add batch tests

* test docs change

* Update src/transformers/models/vitpose/image_processing_vitpose.py

* Update src/transformers/models/vitpose/configuration_vitpose.py

* change ViT to Vit

* change to enable MoE

* make fix-copies

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* extract udp

* add more described docs

* simple fix

* change to accept target_size

* make style

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/vitpose/configuration_vitpose.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* change to `verify_backbone_config_arguments`

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove unnecessary copy

* make config immutable

* enable gradient checkpointing

* update inappropriate docstring

* linting docs

* split function for visibility

* make style

* check isinstances

* change to acceptable use_pretrained_backbone

* make style

* remove copy in docs

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/vitpose/modeling_vitpose.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* simple fix + make style

* change input config of activation function to string

* Update docs/source/en/model_doc/vitpose.md

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* tmp docs

* delete index.md

* make fix-copies

* simple fix

* change conversion to sam2/mllama style

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/vitpose/image_processing_vitpose.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* refactor convert

* add supervision

* Update src/transformers/models/vitpose_backbone/modeling_vitpose_backbone.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* remove redundant def

* separate code block for visualization

* add validation for num_moe

* final commit

* add labels

* [run-slow] vitpose, vitpose_backbone

* Update src/transformers/models/vitpose/convert_vitpose_to_hf.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* enable all conversion

* final commit

* [run-slow] vitpose, vitpose_backbone

* ruff check --fix

* [run-slow] vitpose, vitpose_backbone

* rename split module

* [run-slow] vitpose, vitpose_backbone

* fix pos_embed

* Simplify init

* Revert "fix pos_embed"

This reverts commit 2c56a4806e30bc9b5753b142fa04b913306c54ff.

* refactor single loop

* allow flag to enable custom model

* efficiency of MoE to not use unused experts

* make style

* Fix range -> arange to avoid warning

* Revert MoE router, the new one does not work

* Fix postprocessing a bit (labels)

* Fix type hint

* Fix docs snippets

* Fix links to checkpoints

* Fix checkpoints in tests

* Fix test

* Add image to docs

---------

Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-01-08 16:02:14 +00:00
4349a0e401 fix: Qwen2-VL generate with inputs_embeds (#35466)
* fix: Qwen2-VL generate with inputs_embeds

* change: optional input_ids in get_rope_index
2025-01-08 16:36:03 +01:00
88e18b3c63 Update doc for metric_for_best_model when save_strategy="best". (#35389)
* Updated docstring for _determine_best_metric.

* Updated docstring for metric_for_best_model.

* Added test case for save strategy.

* Updated incorrect test case.

* Changed eval_strategy to match save_strategy.

* Separated test cases for metric.

* Allow load_best_model when save_strategy == "best".

* Updated docstring for metric_for_best_model.
2025-01-08 16:32:35 +01:00
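
A configuration sketch of what this change enables; the argument values are illustrative. With `save_strategy="best"`, a checkpoint is only written when `metric_for_best_model` improves, so the evaluation and save settings have to be consistent (which is what the test updates above address):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    eval_strategy="epoch",             # evaluation must run for "best" to be decidable
    save_strategy="best",              # only save when the tracked metric improves
    metric_for_best_model="eval_loss",
    greater_is_better=False,           # lower eval_loss is better
    load_best_model_at_end=True,       # now allowed together with save_strategy="best"
)
```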
29e74b7cbc Add: num_additional_image_tokens to models (#35052)
* Add: num_additional_image_tokens to models

* docs: update docstring for num_additional_image_tokens in configuration files

* Add num_additional_image_tokens to LlavaNextVideo model and update feature selection logic

* revert

* Fix: adjust num_image_tokens calculation in LlavaProcessor

* Remove num_additional_image_tokens initialization from configuration files

* Fix test error

* revert

* Fix: adjust num_image_tokens calculation in LlavaNextVideoProcessor

* fix conflict

* Fix: adjust num_image_tokens calculation in VideoLlavaProcessor

* make style

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-01-08 16:20:01 +01:00
657bb14f98 Enable auto task for timm models in pipeline (#35531)
* Enable auto task for timm models

* Add pipeline test
2025-01-08 15:14:17 +00:00
1a6c1d3a9a Bump torch requirement to >= 2 (#35479)
Bump torch requirement, follow-up of #35358
2025-01-08 15:59:32 +01:00
59e5b3f01b Timm wrapper label names (#35553)
* Add timm wrapper label names mapping

* Add index to classification pipeline

* Revert adding index for pipelines

* Add custom model check for loading timm labels

* Add tests for labels

* [run-slow] timm_wrapper

* Add note regarding label2id mapping
2025-01-08 14:09:46 +00:00
f1639ea51d Update missing model error message (#35370)
* Update missing model error message

* Update missing model error message

* Update missing model error message

* Fix capitalization
2025-01-08 15:05:06 +01:00
bd39b0627b Update doc and default value of TextNetImageProcessor (#35563)
update doc and default value
2025-01-08 13:47:52 +00:00
651cfb400f Add support for modular with fast image processors (#35379)
* Add support for modular with fast image processors

* fix order and remove copied from

* add comment for "image_processing*_fast"
2025-01-08 08:37:57 -05:00
430d3d43a5 [Docs] links to logits-processor-zoo (#35552)
links to logits-processor-zoo
2025-01-08 13:36:30 +00:00
3c1895aa65 Fix Qwen2VL processor to handle odd number of frames (#35431)
* fix: processing odd number of frames

* feat: add test case

* update: test one frame

* feat: support custom patch size

* fix: test with videos

* revert: change on patch repeat

* fix: much wow

* update: fixups

* fixup pls

* ruff fixup

* fix typo at least
2025-01-08 13:49:00 +01:00
3fde88b19d support chat generator as input of TextGenerationPipeline (#35551)
* support chat generator as input of TextGenerationPipeline

* missing import

* fix tests

* again

* simpler

* add test
2025-01-08 13:27:07 +01:00
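
A usage sketch of the generator input this PR supports; the checkpoint is an example. Instead of materializing every conversation up front, a generator of chats can now be streamed into the pipeline:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-135M-Instruct")


def chats():
    # Each yielded item is one chat: a list of role/content messages.
    for question in ("What is a tokenizer?", "What is a pipeline?"):
        yield [{"role": "user", "content": question}]


for output in pipe(chats(), max_new_tokens=32):
    print(output)
```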
ebdd1ad400 Pass correct num_items_in_batch value into the training_step function (#35438)
pass correct `num_items_in_batch` to compute_loss
2025-01-08 13:16:03 +01:00
0e0516c119 MODERNBERT_INPUTS_DOCSTRING: past_key_values are ignored (#35513)
* MODERNBERT_INPUTS_DOCSTRING: past_key_values are ignored

* sync to modular_modernbert.py
2025-01-08 11:45:40 +01:00
d1681ec2b6 VLMs: major clean up 🧼 (#34502)
only lllava models are modified
2025-01-08 10:35:23 +01:00
7176e06b52 Add TextNet (#34979)
* WIP

* Add config and modeling for Fast model

* Refactor modeling and add tests

* More changes

* WIP

* Add tests

* Add conversion script

* Add conversion scripts, integration tests, image processor

* Fix style and copies

* Add fast model to init

* Add fast model in docs and other places

* Fix import of cv2

* Rename image processing method

* Fix build

* Fix Build

* fix style and fix copies

* Fix build

* Fix build

* Fix Build

* Clean up docstrings

* Fix Build

* Fix Build

* Fix Build

* Fix build

* Add test for image_processing_fast and add documentation tests

* some refactorings

* Fix failing tests

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Introduce TextNet

* Fix failures

* Refactor textnet model

* Fix failures

* Add cv2 to setup

* Fix failures

* Fix failures

* Add CV2 dependency

* Fix bugs

* Fix build issue

* Fix failures

* Remove textnet from modeling fast

* Fix build and other things

* Fix build

* some cleanups

* some cleanups

* Some more cleanups

* Fix build

* Incorporate PR feedbacks

* More cleanup

* More cleanup

* More cleanup

* Fix build

* Remove all the references of fast model

* More cleanup

* Fix build

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Fix Build

* Fix build

* Fix build

* Fix build

* Fix build

* Fix build

* Incorporate PR feedbacks

* Fix style

* Fix build

* Incorporate PR feedbacks

* Fix image processing mean and std

* Incorporate PR feedbacks

* fix build failure

* Add assertion to image processor

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* fix style failures

* fix build

* Fix Imageclassification's linear layer, also introduce TextNetImageProcessor

* Fix build

* Fix build

* Fix build

* Fix build

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Fix build

* Incorporate PR feedbacks

* Remove some script

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Incorporate PR feedbacks

* Fix image processing in textnet

* Incorporate PR Feedbacks

* Fix CI failures

* Fix failing test

* Fix failing test

* Fix failing test

* Fix failing test

* Fix failing test

* Fix failing test

* Add textnet to readme

* Improve readability

* Incorporate PR feedbacks

* fix code style

* fix key error and get convert working

* tvlt shouldn't be here

* fix test modeling test

* Fix tests, make fixup

* Make fixup

* Make fixup

* Remove TEXTNET_PRETRAINED_MODEL_ARCHIVE_LIST

* improve type annotation

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update tests/models/textnet/test_image_processing_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* improve type annotation

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* space typo

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* improve type annotation

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/textnet/configuration_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* make conv layer kernel sizes and strides default to None

* Update src/transformers/models/textnet/modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/textnet/modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix keyword bug

* add batch init and make fixup

* Make fixup

* Update integration test

* Add figure

* Update textnet.md

* add testing and fix errors (classification, imgprocess)

* fix error check

* make fixup

* make fixup

* revert to original docstring

* add make style

* remove conflict for now

* Update modeling_auto.py

got confused by `timm_wrapper` - it was giving some conflicts

* Update tests/models/textnet/test_modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/textnet/modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update tests/models/textnet/test_modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/textnet/modeling_textnet.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* add changes

* Update textnet.md

* add doc

* add authors hf ckpt + rename

* add feedback: classifier/docs

---------

Co-authored-by: raghavanone <opensourcemaniacfreak@gmail.com>
Co-authored-by: jadechoghari <jadechoghari@users.noreply.huggingface.co>
Co-authored-by: Niels <niels.rogge1@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-01-08 09:52:51 +01:00
b05df6611e [docs] Remove sortish_sampler (#35539)
remove
2025-01-07 12:06:19 -08:00
a7d1441d65 Correctly list the chat template file in the Tokenizer saved files list (#34974)
* Correctly list the chat template file in the saved files list

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add save file checking to test

* make fixup

* better filename handling

* make fixup

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-01-07 19:11:02 +00:00
cdca3cf9e3 [Whisper] fix docstrings typo (#35338)
fix typo
2025-01-07 09:20:27 -08:00
7f7677307c [Qwen2Audio] handle input ids expansion during processing (#35534)
* add audio_token attribute to proc

* expand input_ids

* and legacy and expanded input_ids

* test update

* split lines

* add possibility not to provide eos and bos audio tokens

* raise errors

* test incorrect number of audio tokens

* add example

* fmt

* typo
2025-01-07 16:47:27 +01:00
628cd838a3 Release GPU memory after Optuna trial (#35440)
* Release GPU memory after trial

* Update to use release_memory from accelerate.utils.memory after suggestion
2025-01-07 16:26:28 +01:00
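
A sketch of the trial cleanup this commit moved onto accelerate's helper; `build_and_train` and `evaluate` are hypothetical stand-ins for whatever the objective function does:

```python
from accelerate.utils.memory import release_memory


def objective(trial):
    model = build_and_train(trial)   # hypothetical: allocates GPU memory
    score = evaluate(model)          # hypothetical
    # Drop the reference, run gc, and empty the CUDA cache so the next
    # Optuna trial starts from a clean allocator state.
    (model,) = release_memory(model)
    return score
```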
665a4942e4 Check whether rescale is requested before checking is_scaled_image (#35439) 2025-01-07 11:39:45 +00:00
f408d55448 Fix bug when requesting input normalization with EnCodec (#34756)
* EnCodec: unsqueeze padding mask

* add test for normalization
2025-01-07 11:50:02 +01:00
96bf3d6cc5 Add diffllama (#34083)
* first adding diffllama

* add Diff Attention and other but still with errors

* complete making attention Diff-Attention

* fix some bugs which may be caused by transformer-cli while adding model

* fix a bug caused by forgetting KV cache...

* Update src/transformers/models/diffllama/modeling_diffllama.py

You don't need to divide by 2 if we use the same number of attention heads as llama. Instead you can just split in forward.

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

fit to changing "num_heads // 2" place

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

new codes are more meaningful than before

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

new codes are more meaningful than before

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

fit to changing "num_heads // 2" place

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

fix dividing twice by sqrt(self.head_dim)

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

fix dividing twice by sqrt(self.head_dim)

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* Update src/transformers/models/diffllama/modeling_diffllama.py

fit to changing "num_heads // 2" place,
and make it more visible

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* I found the Attention was implemented incorrectly relative to the paper, still as of e072544a3bfc69b8a903e062729f861108ffecd3.

* re-implemented

* adding groupnorm

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* align with transformers code style

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* fix typo

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* adding groupnorm

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* change SdpaAttention to DiffSdpaAttention

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* fix bug

* Update src/transformers/models/diffllama/modeling_diffllama.py

resolve "not same outputs" problem

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* fix bugs of places of "GroupNorm with scale" and etc

* Revert "fix bugs of places of "GroupNorm with scale" and etc"

This reverts commit 26307d92f6acd55e9fe89f2facff350f05760960.

* simplify multiple of attention (matmul) operations into one by repeating value_states

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* simplify multiple of attention (matmul) operations into one by repeating value_states

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* simplify multiple of attention (matmul) operations into one by repeating value_states

Co-authored-by: Minho Ryu <ryumin93@gmail.com>

* remove missed type

* add diffllama model_doc

* apply make style/quality

* apply review comment about model

* apply review comment about test

* place diffllama alphabetically on the src/transformers/__init__.py

* fix forgot code

* Support parameters that are not initialized with standard deviation 0 in the conventional method

* add DiffLlamaConfig to CONFIG_CLASSES_TO_IGNORE_FOR_DOCSTRING_CHECKPOINT_CHECK on utils/check_config_docstrings.py

* remove unused property of config

* add to supported model list

* add to sdpa supported model list

* fix copyright, remove pretraining_tensor_parallel, and modify for initialization test

* remove unused import and etc.

* empty commit

* empty commit

* empty commit

* apply modular transformers but with bugs

* revert prev commit

* create src/transformers/model/diffllama/modular_diffllama.py

* run utils/modular_model_converter.py

* empty commit

* leaner modular diffllama

* remove more and more in modular_diffllama.pt

* remove more and more in modular_diffllama.pt

* resolve missing docstring entries

* force reset

* convert modular

---------

Co-authored-by: Minho Ryu <ryumin93@gmail.com>
2025-01-07 11:34:56 +01:00
ed73ae210b NPU support SDPA (#35165)
Co-authored-by: root <weichunyude@163.com>
2025-01-07 11:30:05 +01:00
02ed609285 Replace tokenizer to processing_class in Seq2SeqTrainer (#35452) 2025-01-07 09:51:12 +00:00
9fd123ac31 ci: mark model_parallel tests as cuda specific (#35269)
The `parallelize()` API is deprecated in favor of accelerate's `device_map="auto"`
and therefore is not accepting new features. At the same time, the `parallelize()`
implementation is currently CUDA-specific. This commit marks the respective
CI tests with `@require_torch_gpu`.

Fixes: #35252

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-01-07 10:16:34 +01:00
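
What the marker looks like in practice; the test body is hypothetical:

```python
from transformers.testing_utils import require_torch_gpu


@require_torch_gpu  # parallelize() is CUDA-only, so skip on CPU and non-CUDA accelerators
def test_model_parallelization():
    model = build_sharded_model()  # hypothetical setup
    model.parallelize()
    ...
```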
bd442c6d3a Zamba new attention standard (#35375)
* updated zamba to new attention standard

* make fixup fixes
2025-01-07 10:08:45 +01:00
12ba96aa3c [Dinov2 with Registers] Some fixes (#35411)
* First draft

* Thanks claude

* Remove print statement

* Use torch_int

* Address comments

* Address comment
2025-01-06 21:10:59 +01:00
ca00950057 added logic for deleting adapters once loaded (#34650)
* added logic for deleting adapters once loaded

* updated to the latest version of transformers, merged utility function into the source

* updated with missing check

* added peft version check

* Apply suggestions from code review

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* changes according to reviewer

* added test for deleting adapter(s)

* styling changes

* styling changes in test

* removed redundant code

* formatted my contributions with ruff

* optimized error handling

* ruff formatted with correct config

* resolved formatting issues

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-01-06 18:36:40 +00:00
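
A hedged sketch of the load/delete round trip this PR enables (requires `peft` to be installed; checkpoint and adapter ids are examples):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
model.load_adapter("ybelkada/opt-350m-lora", adapter_name="lora_1")

# ... use the adapter ...

# New in this PR: remove the adapter weights again without reloading the model.
model.delete_adapter("lora_1")
```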
1650e0e514 Fixed typo in Llama configuration docstring (#35520)
Update configuration_llama.py

There is no `num_heads` parameter, only `num_attention_heads`
2025-01-06 09:54:08 -08:00
3b1be043cd 🌐 [i18n-KO] Remove duplicates in toctree (#35496)
fix(docs): remove duplicates in toctree
2025-01-06 09:14:22 -08:00
3951da1a6b [GGUF] Refactor and decouple gguf checkpoint loading logic (#34385)
* draft load_gguf refactor

* update

Signed-off-by: Isotr0py <2037008807@qq.com>

* remove llama mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* remove qwen2 mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* remove unused function

Signed-off-by: Isotr0py <2037008807@qq.com>

* deprecate stablelm mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* deprecate phi3 mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* deprecate t5 mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* deprecate bloom mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix bloom

Signed-off-by: Isotr0py <2037008807@qq.com>

* deprecate starcoder2 mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* deprecate gpt2 mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* deprecate mistral mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* deprecate nemotron mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* deprecate mamba mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* deprecate mamba mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* code format

Signed-off-by: Isotr0py <2037008807@qq.com>

* code format

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix mamba

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix qwen2moe

Signed-off-by: Isotr0py <2037008807@qq.com>

* remove qwen2moe mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* clean up

Signed-off-by: Isotr0py <2037008807@qq.com>

* remove falcon 7b map

Signed-off-by: Isotr0py <2037008807@qq.com>

* remove all ggml tensors mapping

Signed-off-by: Isotr0py <2037008807@qq.com>

* add comments

Signed-off-by: Isotr0py <2037008807@qq.com>

* update messages

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix tensors in parsed parameters

Signed-off-by: Isotr0py <2037008807@qq.com>

* add gguf check

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-06 18:02:38 +01:00
86fa3cedad Bump jinja2 from 3.1.4 to 3.1.5 in /examples/research_projects/decision_transformer (#35408)
Bump jinja2 in /examples/research_projects/decision_transformer

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.4 to 3.1.5.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.4...3.1.5)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-06 16:58:29 +00:00
44a26c871c Update llm_optims docs for sdpa_kernel (#35481)
update: use sdpa_kernel
2025-01-06 08:54:31 -08:00
18e896bd8f 🌐 [i18n-KO] Translated altclip.md to Korean (#34594)
* docs: ko: model_doc/timesformer.md

* feat: nmt draft

* Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com>

* Update docs/source/ko/model_doc/altclip.md

* add snippet

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: timdalxx <48753785+jeongiin@users.noreply.github.com>
2025-01-06 08:45:26 -08:00
a821b9c7ab Add check for if num_items_in_batch is not None (#35102) 2025-01-06 10:11:21 -05:00
203e978826 Add position_ids in XLMRobertaXLForCausalLM.prepare_inputs_for_generation (#35044)
* fix

* fix

* cleanup

* style

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-01-06 16:10:21 +01:00
c451a72cd7 Add French translation of task_summary and tasks_explained (#33407)
* Add French translation of task_summary and tasks_explained

---------

Co-authored-by: Aymeric Roucher <69208727+aymeric-roucher@users.noreply.github.com>
2025-01-06 14:23:52 +01:00
9895f7df81 Idefics: fix docstring (#35079)
nit: fix docstring
2025-01-06 10:58:04 +01:00
32aa2db04a Fix Llava conversion for models that use safetensors to store weights (#35406)
* fix llava-med-v1.5-mistral-7b conversion

Signed-off-by: Isotr0py <2037008807@qq.com>

* add weights_only=True

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-06 09:59:38 +01:00
b2f2977533 Applies the rest of the init refactor except to modular files (#35238)
* [test_all] Applies the rest of the init refactor except to modular files

* Revert modular that doesn't work

* [test_all] TFGPT2Tokenizer
2025-01-05 18:30:08 +01:00
e5fd865eba Add Gemma2 GGUF support (#34002)
* initial setup for ggml.py

* initial setup of GGUFGemma2Converter class

* Add gemma2 model to gguf.md doc

* Partial work on GGUF_TENSOR_MAPPING

* initial setup of GGUF_TENSOR_MAPPING for Gemma2

* refactor: rename GemmaConvert class to GemmaConverter for naming consistency

* feat: complete gemma2 tensor mapping implementation

* feat: add initial implementation of GGUFGemmaConverter

* feat: complete GGUFGemmaConverter implementation

* feat: add test code for gemma2

* refactor: minor code cleanup

* refactor: minor code cleanup

* fix: resolve suggestions

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Isotr0py <2037008807@qq.com>

---------

Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-03 14:50:07 +01:00
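
Usage sketch for the new Gemma2 GGUF path; the repo and file names are examples. The GGUF tensors are dequantized into a regular transformers model on load:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "google/gemma-2-2b-it-GGUF"      # example GGUF repo
gguf_file = "gemma-2-2b-it-q4_k_m.gguf"    # example quantized file

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```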
1fe2d53d4e Reuse "if not" logic in image_processing. (#35405) 2025-01-03 14:44:57 +01:00
30a9971632 Use sdpa_kernel in tests (#35472)
* update: use sdpa_kernel

* update: rerun test
2025-01-03 14:39:52 +01:00
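
The replacement API in a nutshell: `torch.nn.attention.sdpa_kernel` (PyTorch 2.3+) supersedes the deprecated `torch.backends.cuda.sdp_kernel` context manager. The shapes below are arbitrary and the snippet assumes a CUDA device:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

# Restrict scaled_dot_product_attention to the flash-attention backend only.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```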
cba49cb2a6 Change is_soundfile_availble to is_soundfile_available (#35030) 2025-01-03 14:37:42 +01:00
42865860ec Fix paligemma warning message (#35486)
fix log input
2025-01-02 11:36:53 +01:00
b2b04e86e7 Fix docs typos. (#35465)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-01-02 11:29:46 +01:00
6b1e86fd4d Fix new BNB test failures (#35345) 2025-01-02 11:24:52 +01:00
5b516b06c8 Reintroduce Python 3.9 support for ModernBERT (#35458)
Co-authored-by: Koichi Yasuoka <yasuoka@kanji.zinbun.kyoto-u.ac.jp>
2025-01-02 11:23:07 +01:00
919220dab1 Update translated docs for sdpa_kernel (#35461)
* docs: update sdpa_kernel for translation

* fix: nn.attention

* update: infer many
2024-12-31 08:37:58 -08:00
eb2b452432 [i18n-ar] Translated file: docs/source/ar/tasks/summarization.md into Arabic (#35195)
* إضافة الترجمة العربية: summarization.md

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/summarization.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update _toctree.yml

---------

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
2024-12-31 08:35:54 -08:00
d5aebc6465 [i18n-ar] Translated file: docs/source/ar/tasks/question_answering.md into Arabic (#35196)
* إضافة الترجمة العربية: question_answering.md

* Update question_answering.md

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tasks/question_answering.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update _toctree.yml

---------

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
2024-12-30 11:56:05 -08:00
b5f97977ed Update docs for sdpa_kernel (#35410)
update: sdp_kernel -> sdpa_kernel
2024-12-30 09:50:34 -08:00
5cabc75b4b Add compute_loss_func to Seq2SeqTrainer (#35136) 2024-12-29 15:01:35 +01:00
90f256c90c Update perf_infer_gpu_one.md: fix a typo (#35441) 2024-12-29 14:57:08 +01:00
5c75087aee Fix model_accepts_loss_kwargs for timm model (#35257)
* Fix for timm model

* Add comment
2024-12-27 16:33:44 +00:00
3b0a94ef9e Fix f-string to show ACCELERATE_MIN_VERSION on error (#35189)
fix f-string to show ACCELERATE_MIN_VERSION on error

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-12-27 13:21:44 +01:00
f63da20a9f CLIP conversion script - Change fairseq to OpenAI (#35384)
Change fairseq to OpenAI
2024-12-27 13:12:32 +01:00
7f97d01675 Fix: Rename keyword argument in_channels to num_channels (#35289)
Fix: Rename keyword argument in_channels to num_channels in some default backbone configs
2024-12-27 13:07:31 +01:00
4eb17b26e7 Drop inplace operation for loss computation with gradient accumulation (#35416)
Fix inplace loss computation
2024-12-26 14:58:53 +01:00
24c91f095f [GPTQ, CompressedTensors] Fix unsafe imports and metadata check (#34815)
* fix gptq creation when optimum is not installed + fix metadata checking

* fix compressed tensors as well

* style

* pray for ci luck on flaky tests :prayge:

* trigger ci

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2024-12-24 19:32:44 +01:00
6e0515e99c Add DINOv2 with registers (#35348)
* added changes from 32905

* fixed mistakes caused by select all paste

* rename diff_dinov2...

* ran tests

* Fix modular

* Fix tests

* Use new init

* Simplify drop path

* Convert all checkpoints

* Add figure and summary

* Update paths

* Update docs

* Update docs

* Update toctree

* Update docs

---------

Co-authored-by: BernardZach <bernardzach00@gmail.com>
Co-authored-by: Zach Bernard <132859071+BernardZach@users.noreply.github.com>
2024-12-24 13:21:59 +01:00
d8c1db2f56 enable non-cuda awq model support without modify version (#35334)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2024-12-24 12:36:00 +01:00
ccc4a5a59b Disable .github/workflows/self-comment-ci.yml for now (#35366)
* disable

* disable

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-24 10:53:57 +01:00
93aafdc620 Add compile test for fast image processor (#35184)
* add compile test for fast image processor

* override pixtral test
2024-12-23 13:12:45 -05:00
82fcac0a7e Adding logger.info about update_torch_dtype in some quantizers (#35046)
adding logger.info
2024-12-23 17:01:00 +01:00
a1780b7ba5 bugfix Idefics3 processor - handle gracefully cases with text and no images (#35363)
* bugfix processing empty images

* fix

* fix

* Update src/transformers/models/idefics3/processing_idefics3.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* adding tests

* fix

* fix

* fix

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2024-12-23 16:59:01 +01:00
64c05eecd6 HIGGS Quantization Support (#34997)
* higgs init

* working with crunches

* per-model workspaces

* style

* style 2

* tests and style

* higgs tests passing

* protecting torch import

* removed torch.Tensor type annotations

* torch.nn.Module inheritance fix maybe

* hide inputs inside quantizer calls

* style structure something

* Update src/transformers/quantizers/quantizer_higgs.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* reworked num_sms

* Update src/transformers/integrations/higgs.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* revamped device checks

* docstring upd

* Update src/transformers/quantizers/quantizer_higgs.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* edited tests and device map assertions

* minor edits

* updated flute cuda version in docker

* Added p=1 and 2,3bit HIGGS

* flute version check update

* incorporated `modules_to_not_convert`

* less hardcoding

* Fixed comment

* Added docs

* Fixed gemma support

* example in docs

* fixed torch_dtype for HIGGS

* Update docs/source/en/quantization/higgs.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Collection link

* dequantize interface

* newer flute version, torch.compile support

* unittest message fix

* docs update compile

* isort

* ValueError instead of assert

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2024-12-23 16:54:49 +01:00
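
Hedged usage sketch for the new quantizer; the checkpoint is an example and the FLUTE kernel package must be installed for HIGGS to run:

```python
from transformers import AutoModelForCausalLM, HiggsConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",       # example checkpoint
    quantization_config=HiggsConfig(bits=4),  # 2- and 3-bit variants were added too
    device_map="auto",
)
```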
ef1f54a0a7 add bnb support for Ascend NPU (#31512)
* add bnb support for Ascend NPU

* delete comment
2024-12-23 16:36:16 +01:00
59178780a6 Fix : VPTQ test (#35394)
fix_test
2024-12-23 16:27:46 +01:00
3a4ced9ab4 Fix typing in docstring for PaliGemmaProcessor (#35278)
Updated typing for `tokenizer` in the `PaliGemmaProcessor` to be `GemmaTokenizerFast` instead of `LlamaTokenizerFast`
2024-12-23 16:22:04 +01:00
3cd3cd50ac Scale loss before backward (#35207) 2024-12-23 16:16:38 +01:00
f5264a86ee Deprecate _is_quantized_training_enabled (#34991)
deprecate

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-12-23 15:51:31 +01:00
e10be82b71 uniformize kwargs for SAM (#34578)
* Make kwargs uniform for SAM

* Remove unused attribute

* Make point_pad_value part of image_kwargs

* Update annotations

* Code review - use existing methods

* Use ProcessorTesterMixin

* Do not add ProcessorTesterMixin everywhere
2024-12-23 13:54:57 +01:00
2bb60982ac Patch GPTNeoX to use adequate FA2 if position_ids is provided (#35318) 2024-12-23 13:45:55 +01:00
5e7aedebeb make LlamaModel._update_causal_mask torch compilable (#35187)
* make LlamaModel._update_causal_mask torch compilable

* chore: lint (make fix-copies)

* fix-copies

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-12-23 13:10:00 +01:00
401aa39d7b bitsandbytes: simplify 8bit dequantization (#35068) 2024-12-23 13:04:59 +01:00
05260a1fc1 Fix new FA2 if is_causal is passed explicitly (#35390)
* fix

* Update modeling_decision_transformer.py

* Update flash_attention.py
2024-12-22 20:00:07 +01:00
8f38f58f3d owlvit/2 dynamic input resolution (#34764)
* owlvit/2 dynamic input resolution.

* adapt box grid to patch_dim_h patch_dim_w

* fix ci

* clarify variable naming

* clarify variable naming..

* compute box_bias dynamically inside box_predictor

* change style part of code

* [run-slow] owlvit, owlv2
2024-12-21 08:51:09 +00:00
608e163b52 [docs] Follow up register_pipeline (#35310)
example json
2024-12-20 09:22:44 -08:00
94fe0b915b Improved Documentation Of Audio Classification (#35368)
* Improved Documentation Of Audio Classification

* Updated documentation as per review

* Updated audio_classification.md

* Update audio_classification.md
2024-12-20 09:17:28 -08:00
c96cc039c3 Improve modular transformers documentation (#35322)
* Improve modular transformers documentation

- Adds hints to general contribution guides
- Lists which utils scripts are available to generate single-files from modular files and check their content

* Show commands in copyable code cells

---------

Co-authored-by: Joel Koch <joel@bitcrowd.net>
2024-12-20 09:16:02 -08:00
504c4d3692 Make test_generate_with_static_cache even less flaky (#34995)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-20 16:03:26 +01:00
0fc2970363 Use weights_only=True with torch.load for transfo_xl (#35241)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-20 15:40:55 +01:00
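
The change in one line: `weights_only=True` restricts `torch.load` unpickling to tensors and plain containers, so loading an untrusted checkpoint can no longer execute arbitrary code.

```python
import torch

state_dict = torch.load("checkpoint.bin", weights_only=True)  # safe(r) load
```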
6fae2a84ae Update test fetcher when we want to test all (#35364)
* [test-all]

* style

* [test-all]

* [test_all]

* [test_all]

* style
2024-12-20 15:10:43 +01:00
34ad1bd287 update codecarbon (#35243)
* update codecarbon

* replace directly-specified-test-dirs with tmp_dir

* Revert "replace directly-specified-test-dirs with tmp_dir"

This reverts commit 310a6d962ec83db3f6d4f96daeeba5c6746f736c.

* revert the change of .gitignore

* Update .gitignore

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2024-12-20 15:04:36 +01:00
40292aa4e9 bugfix: torch.export failure caused by _make_causal_mask (#35291)
* bugfix: torch.export failure caused by `_make_causal_mask`

Recent changes in torch dynamo prevent mutations on tensors converted with aten::_to_copy. To address this, we can clone such a tensor before performing the in-place operation `masked_fill_`, but only when the code is being compiled by torch dynamo.
(relevant issue: https://github.com/pytorch/pytorch/issues/127571)

* chore: use `is_torchdynamo_compiling` instead of `torch._dynamo.is_compiling`
2024-12-20 14:37:04 +01:00
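
A sketch of the workaround this bugfix applies, using a simplified mask expansion; `is_torchdynamo_compiling` is the real transformers utility, the rest is illustrative:

```python
import torch
from transformers.utils import is_torchdynamo_compiling


def expand_mask_sketch(attention_mask: torch.Tensor, dtype=torch.float32) -> torch.Tensor:
    # The .to(dtype) lowers to aten::_to_copy; dynamo forbids mutating its
    # output in place, so clone first, but only when actually compiling.
    inverted = 1.0 - attention_mask.to(dtype)
    if is_torchdynamo_compiling():
        inverted = inverted.clone()
    return inverted.masked_fill_(inverted.bool(), torch.finfo(dtype).min)
```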
05de764e9c Aurevoir PyTorch 1 (#35358)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-20 14:36:31 +01:00
4567ee8057 fix zoedepth initialization error under deepspeed zero3 (#35011)
fix zoe bug in deepspeed zero3
2024-12-20 11:42:40 +00:00
c3a43594b7 Add Tensor Parallel support for Qwen2VL (#35050)
feat: add parallel support for qwen2vl
2024-12-20 12:40:38 +01:00
0d51d65905 Cleaner attention interfaces (#35342)
* cleaner attention interfaces

* correctly set the _attn_implementation when adding other functions to it

* update

* Update modeling_utils.py

* CIs
2024-12-20 12:09:34 +01:00
eafbb0eca7 Implement AsyncTextIteratorStreamer for asynchronous streaming (#34931)
* Add AsyncTextIteratorStreamer class

* export AsyncTextIteratorStreamer

* export AsyncTextIteratorStreamer

* improve docs

* missing import

* missing import

* doc example fix

* doc example output fix

* add pytest-asyncio

* first attempt at tests

* missing import

* add pytest-asyncio

* fallback to wait_for and raise TimeoutError on timeout

* check for TimeoutError

* autodoc

* reorder imports

* fix style

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-20 12:08:12 +01:00
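
A usage sketch for the new streamer; the checkpoint is an example. Generation runs in a background thread while the event loop consumes decoded text pieces as they arrive:

```python
import asyncio
from threading import Thread

from transformers import AsyncTextIteratorStreamer, AutoModelForCausalLM, AutoTokenizer


async def main():
    name = "HuggingFaceTB/SmolLM2-135M-Instruct"  # example checkpoint
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    inputs = tok("Streaming generation means", return_tensors="pt")

    # Must be constructed inside the running event loop.
    streamer = AsyncTextIteratorStreamer(tok, skip_prompt=True)
    Thread(
        target=model.generate,
        kwargs={**inputs, "streamer": streamer, "max_new_tokens": 30},
    ).start()

    async for chunk in streamer:  # yields text asynchronously
        print(chunk, end="", flush=True)


asyncio.run(main())
```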
b5a557e5fe Reduce CircleCI usage (#35355)
* reduce 1

* reduce 1

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-20 10:18:15 +01:00
4e27a4009d FEAT : Adding VPTQ quantization method to HFQuantizer (#34770)
* init vptq

* add integration

* add vptq support

fix readme

* add tests && format

* format

* address comments

* format

* format

* address comments

* format

* address comments

* remove debug code

* Revert "remove debug code"

This reverts commit ed3b3eaaba82caf58cb3aa6e865d98e49650cf66.

* fix test

---------

Co-authored-by: Yang Wang <wyatuestc@gmail.com>
2024-12-20 09:45:53 +01:00
5a2aedca1e [Mamba2] Fix caching, slow path, and multi-gpu (#35154)
* fixup mamba2 - caching and several other small fixes

* fixup cached forward

* correct fix this time

* fixup cache - we do not need to extend the attn mask; it's handled by generate (gives total ids + mask at each step)

* remove unnecessary (un)squeeze

* fixup cache position

* simplify a few things

* [run-slow] mamba2

* multi gpu attempt two

* [run-slow] mamba2

* [run-slow] mamba2

* [run-slow] mamba2

* [run-slow] mamba2

* add newer slow path fix

* [run-slow] mamba2
2024-12-20 09:27:47 +01:00
ff9141bb85 fix onnx export of speech foundation models (#34224)
* added expanded attention/padding masks prior to indexing the hidden_states

* consistency fix in WavLMForSequenceClassification

---------

Co-authored-by: Nikos Antoniou <nikosantoniou@Nikos-MacBook-Pro.local>
2024-12-20 09:22:05 +01:00
f42084e641 [docs] Add link to ModernBERT Text Classification GLUE finetuning script (#35347)
Add link to ModernBERT Text Classification GLUE finetuning script
2024-12-19 14:45:52 -08:00
0ade1caa35 Modernbert Release Fixes (#35344)
* fix ForSequenceClassification

* unmodularize rope layer

* fix linting warning

* Avoid complex PoolingHead, only one prediction head needed

---------

Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
2024-12-19 17:22:37 +01:00
1fa807fa63 Fix some fa2 tests (#35340)
* remove fa2 test

* remove other failing tests

* style
2024-12-19 17:05:25 +01:00
667ed5635e Add ModernBERT to Transformers (#35158)
* initial cut of modernbert for transformers

* small bug fixes

* fixes

* Update import

* Use compiled mlp->mlp_norm to match research implementation

* Propagate changes in modular to modeling

* Replace duplicate attn_out_dropout in favor of attention_dropout

cc @warner-benjamin let me know if the two should remain separate!

* Update BOS to CLS and EOS to SEP

Please confirm @warner-benjamin

* Set default classifier bias to False, matching research repo

* Update tie_word_embeddings description

* Fix _init_weights for ForMaskedLM

* Match base_model_prefix

* Add compiled_head to match research repo outputs

* Fix imports for ModernBertForMaskedLM

* Just use "gelu" default outright for classifier

* Fix config name typo: initalizer -> initializer

* Remove some unused parameters in docstring. Still lots to edit there!

* Compile the embeddings forward

Not having this resulted in very slight differences - so small it wasn't even noticed for the base model, only for the large model.

But the tiny difference for large propagated at the embedding layer through the rest of the model, leading to notable differences of ~0.0084 average per value, up to 0.2343 for the worst case.

* Add drafts for ForSequenceClassification/ForTokenClassification

* Add initial SDPA support (not exactly equivalent to FA2 yet!)

During testing, FA2 and SDPA still differ by about 0.0098 per value in the token embeddings. It still predicts the correct mask fills, but I'd like to get it fully 1-1 if possible.

* Only use attention dropout if training

* Add initial eager attention support (also not equivalent to FA2 yet!)

Frustratingly, I also can't get eager to be equivalent to FA2 (or sdpa), but it does get really close, i.e. avg ~0.010 difference per value.

Especially if I use fp32 for both FA2&eager, avg ~0.0029 difference per value

The fill-mask results are good with eager.

* Add initial tests, output_attentions, output_hidden_states, prune_heads

Tests are based on BERT, not all tests pass yet: 23 failed, 79 passed, 100 skipped

* Remove kwargs from ModernBertForMaskedLM

Disable sparse_prediction by default to match the normal HF, can be enabled via config

* Remove/adjust/skip improper tests; warn if padding but no attn mask

* Run formatting etc.

* Run python utils/custom_init_isort.py

* FlexAttention with unpadded sequences (matches FA2 within bf16 numerics)

* Reformat init_weights based on review

* self -> module in attention forwards

* Remove if config.tie_word_embeddings

* Reformat output projection on a different line

* Remove pruning

* Remove assert

* Call contiguous() to simplify paths

* Remove prune_qkv_linear_layer

* Format code

* Keep as kwargs, only use if needed

* Remove unused codepaths & related config options

* Remove 3d attn_mask test; fix token classification tuple output

* Reorder: attention_mask above position_ids, fixes gradient checkpointing

* Fix usage if no FA2 or torch v2.5+

* Make torch.compile/triton optional

Should we rename 'compile'? It's a bit vague

* Separate pooling options into separate functions (cls, mean) - cls as default

* Simplify _pad_modernbert_output, remove unused labels path

* Update tied weights to remove decoder.weight, simplify decoder loading

* Adaptively set config.compile based on hf_device_map/device/resize, etc.

* Update ModernBertConfig docstring

* Satisfy some consistency checks, add unfinished docs

* Only set compile to False if there's more than 1 device

* Add docstrings for public ModernBert classes

* Dont replace docstring returns - ends up being duplicate

* Fix mistake in toctree

* Reformat toctree

* Patched FlexAttention, SDPA, Eager with Local Attention

* Implement FA2 -> SDPA -> Eager attn_impl defaulting, crucial

both to match the original performance, and to get the highest inference speed without requiring users to manually pick FA2

* Patch test edge case with Idefics3 not working with 'attn_implementation="sdpa"'

* Repad all_hidden_states as well

* rename config.compile to reference_compile

* disable flex_attention since it crashes

* Update modernbert.md

* Using dtype min to mask in eager

* Fully remove flex attention for now

It's only compatible with the nightly torch 2.6, so we'll leave it be for now. It's also slower than eager/sdpa.

Also, update compile -> reference_compile in one more case

* Call contiguous to allow for .view()

* Copyright 2020 -> 2024

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update/simplify __init__ structure

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Remove "... if dropout_prob > 0 else identity"

As dropout with 0.0 should be efficient like identity

* re-use existing pad/unpad functions instead of creating new ones

* remove flexattention method

* Compute attention_mask and local_attention_mask once in modeling

* Simplify sequence classification prediction heads, only CLS now

Users can make custom heads if they feel like it

Also removes the unnecessary pool parameter

* Simplify module.training in eager attn

* Also export ModernBertPreTrainedModel

* Update the documentation with links to finetuning scripts

* Explain local_attention_mask parameter in docstring

* Simplify _autoset_attn_implementation, rely on super()

* Keep "in" to initialize Prediction head

Doublechecked with Benjamin that it's correct/what we used for pretraining

* add back mean pooling

* Use the pooling head in TokenClassification

* update copyright

* Reset config._attn_implementation_internal on failure

* Allow optional attention_mask in ForMaskedLM head

* fix failing run_slow tests

* Add links to the paper

* Remove unpad_no_grad, always pad/unpad without gradients

* local_attention_mask -> sliding_window_mask

* Revert "Use the pooling head in TokenClassification"

This reverts commit 99c38badd1dbce01d7aef41095fbf2f5cce87279.

There was no real motivation, no info on whether having this bigger head does anything useful.

* Simplify pooling, 2 options via if-else

---------

Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
Co-authored-by: Said Taghadouini <taghadouinisaid@gmail.com>
Co-authored-by: Benjamin Clavié <ben@clavie.eu>
Co-authored-by: Antoine Chaffin <ant54600@hotmail.fr>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-19 14:03:35 +01:00
56ff1e92fd PaliGemma: Make sure to add <eos> to suffix if <image> is present in text (#35201)
Move suffix processing code to out of if statement
2024-12-19 09:53:48 +01:00
4592cc9e98 Update comment CI bot (#35323)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-19 09:45:27 +01:00
d19b11f59b Fix documentation for ColPali (#35321)
* docs: fix typo quickstart snippet in ColPali's model card

* docs: clean the ColPali's model card

* docs: make the `ColPaliForRetrieval`'s docstring more concise

* docs: add missing bash command used to convert weights for `vidore/colpali-v1.3-hf`
2024-12-19 09:08:28 +01:00
9613933b02 Add the Bamba Model (#34982)
* initial commit for PR

Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>

* rename dynamic cache

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* add more unit tests

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* add integration test

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* add integration test

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* Add modular bamba file

* Remove trainer changes from unrelated PR

* Modify modular and config to get model running

* Fix some CI errors and beam search

* Fix a plethora of bugs from CI/docs/etc

* Add bamba to models with special caches

* Update to newer mamba PR for mamba sublayer

* fix test_left_padding_compatibility

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* fix style

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* fix remaining tests

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* missed this test

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* ran make style

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* move slow tag to integration obj

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* make style

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* address comments

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* fix modular

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* left out one part of modular

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* change model

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* Make Rotary modular as well

* Update bamba.md

Added overview, updated Model inference card, and added config

* Update bamba.md

* Update bamba.md

* Update bamba.md

Minor fixes

* Add docs for config and model back

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

* Add warning when using fast kernels

* replaced generate example

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>

* Address comments from PR

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

* Propagate attention fixes

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

* Fix attention interfaces to the new API

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

* Fix API for decoder layer

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

* Remove extra weights

Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>

---------

Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com>
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com>
Co-authored-by: Antoni Viros i Martin <aviros@ibm.com>
Co-authored-by: divya-kumari32 <72085811+divya-kumari32@users.noreply.github.com>
Co-authored-by: Antoni Viros <ani300@gmail.com>
2024-12-18 20:18:17 +01:00
9a94dfe123 feat: add benchmarks_entrypoint.py (#34495)
* feat: add `benchmarks_entrypoint.py`

Adding `benchmarks_entrypoint.py` file, which will be run from the
benchmarks CI.

This python script will list all python files from the `benchmark/`
folder and run the included `run_benchmark` function, allowing people to
add new benchmark scripts.

* feat: add `MetricsRecorder`

* feat: update dashboard

* fix: add missing arguments to `MetricsRecorder`

* feat: update dash & add datasource + `default.yml`

* fix: move responsibility to create `MetricsRecorder` in bench script

* fix: update incorrect datasource UID

* fix: incorrect variable values

* debug: benchmark entrypoint script

* refactor: update log level

* fix: update broken import

* feat: add debug log in `MetricsRecorder`

* debug: set log level to debug

* fix: set connection `autocommit` to `True`
2024-12-18 18:59:07 +01:00
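A minimal sketch of the discovery loop described above; real argument handling and `MetricsRecorder` wiring are omitted, and `run_benchmark`'s actual signature may differ:

```python
import importlib.util
from pathlib import Path

# Import every python file under benchmark/ and invoke its run_benchmark().
for path in sorted(Path("benchmark").glob("*.py")):
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if hasattr(module, "run_benchmark"):
        module.run_benchmark()
```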
2c47618c1a 🚨All attention refactor🚨 (#35235)
* refactor LlamaAttention

* minimal changes

* fix llama

* update

* modular gemmas

* modular nits

* modular updates

* nits

* simplify

* gpt2

* more modular and fixes

* granite

* modular modular modular

* nits

* update

* qwen2 + starcoder2

* mostly gemma2

* Update image_processing_auto.py

* fix

* Update modular_starcoder2.py

* fix

* remove all copied from attentions

* remove gcv

* make fix-copies

* oups

* oups2.0

* fix some modulars + all copied from

* should be good now

* revert unwanted changes

* Update modeling_decision_transformer.py

* finish cleanup

* Update modeling_olmo.py

* consistency

* re-add gradient checkpointing attribute

* fix

* style

* make config necessary

* bis

* bis

* Update modeling_my_new_model2.py

* is_causal attr

* fix

* remove past kv return from decoder layer

* fix

* default rope config

* correctly fix rope config

* fix bias

* fix gpt2 attention output

* fix test

* fix inits

* fix default sdpa

* fix default sdpa implementation

* harmonize classes

* fix mistral

* fix sliding window models

* mixtral

* be more explicit

* style

* fix

* several fixes

* Update modeling_dbrx.py

* fix test

* olmo + phi

* rotary

* style

* phi

* phi again

* again

* kwargs

* Update test_modeling_common.py

* skip fx tracing tests

* Update modeling_utils.py

* gemma 2

* again

* Update modeling_recurrent_gemma.py

* gemma2

* granite

* style

* starcoder

* Update sdpa_attention.py

* switch args

* Update modeling_mllama.py

* fix

* cache type tests

* gpt2

* Update test_modeling_common.py

* fix

* consistency

* fix shape with encoder

* should be the last one

* tests non model

* most comments

* small oupsi

* be more explicit in modulars

* more explicit modulars

* CIs! it works locally

* add kwargs to _flash_attention_forward

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2024-12-18 16:53:39 +01:00
75be5a0a5b [Whisper] fix docstrings typo (#35319)
typos docstring
2024-12-18 16:38:19 +01:00
69e31eb1bf change bnb tests (#34713)
* fix training tests

* fix xpu check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm pdb

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix 4bit logits check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix 4bit logits check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add xpu check on int8 training

* fix training tests

* add llama test on bnb

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* only cpu and xpu disable autocast training

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Titus <9048635+Titus-von-Koeller@users.noreply.github.com>
2024-12-18 09:49:59 -05:00
da334bcfa8 [Whisper] 🚨 Fix whisper decoding 🚨 (#34135)
* do not remove decoder_input_ids for the first segment

* do not remove eos token in generate_with_fallback

* when removing padding tokens, do not remove eos token

* remove eos token in generate (and not in generate_with_fallback!)

* reconcile short-form/long-form behavior

* correct avg_logprobs calculation

* handle eos token in segments

* handle decoder_input_ids and eos token in _prepare_decoder_input_ids

* fix incorrect time precision

* always remove eos token

* always remove decoder_input_ids

* no need to handle decoder_inputs_ids and eos token

* no need to remove decoder_input_ids

* no need to handle eos token

* fix num_beams in _retrieve_logit_processors

* remove todo inconsistency

* no need to add eos token

* last_timestamp_pos should indeed be timestamp token pos

* patch generate to enable compatibility with GenerationTesterMixin tests

* adapt test_generate_continue_from_past_key_values

* adapt test_prompt_lookup_decoding_matches_greedy_search

* adapt generic GenerationMixin tests to whisper's generate

* fix speculative decoding

* fix

* [run-slow] whisper

* change HF_HUB_TOKEN for require_read_token

* [run-slow] whisper

* prioritize kwargs over generation_config

* remove unnecessary args

* [run-slow] whisper

* update tests

* [run-slow] whisper

* add comment

* update test

* [run-slow] whisper

* update test + revert require_read_token

* docstring updates

* revert tokenizer decode args change

* do not use a patch + docstring updates

* [run-slow] whisper

* make

* [run-slow] whisper

* add a flag to force unique call to generate

* test update

* [run-slow] whisper

* add force_unique_generate_call arg

* do not use a patch

* correct the timestamps for the pad tokens

* docstring update

* docstring update

* docstring update

* update TF tests

* add require_read_token

* [run-slow] whisper

* test reset dynamo

* [run-slow] whisper

* fix

* [run-slow] whisper

* avoid iterating twice on current_segments

* [run-slow] whisper

* [run-slow] whisper

---------

Co-authored-by: Eustache Le Bihan <eustlb@users.noreply.huggingface.co>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-18 14:13:21 +01:00
f1b7634fc8 Trigger GitHub CI with a comment on PR (#35211)
* fix

* fix

* comment

* final

* final

* final

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-18 13:56:49 +01:00
c7e48053aa [tests] make cuda-only tests device-agnostic (#35222)
fix cuda-only tests
2024-12-18 10:14:22 +01:00
1eee1cedfd Fix loading with only state dict and low_cpu_mem_usage = True (#35217)
* fix loading with only state dict and config

* style

* add tests

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-12-18 09:54:32 +01:00
0531d7513b [docs] Improve register_pipeline (#35300)
register_pipeline
2024-12-17 10:27:23 -08:00
77080f023f Fixed typo in audio_classification.md (#35305) 2024-12-17 09:45:51 -08:00
8bfd7eeeef Add Cohere2 docs details (#35294)
* Add Cohere2 docs details

* Update docs/source/en/model_doc/cohere2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-12-17 09:36:31 -08:00
a7feae190f Fix remove unused parameter in docs (#35306)
remove unused parameter in example

Co-authored-by: zzzzzsa <zzzzzsaqwq@gmail.com>
2024-12-17 09:34:41 -08:00
927c3e39ec Fix image preview in multi-GPU inference docs (#35303)
fix: link for img
2024-12-17 09:33:50 -08:00
4302b27719 Fix typos in translated quicktour docs (#35302)
* fix: quicktour typos

* fix: one more
2024-12-17 09:32:00 -08:00
deac971c46 🚨🚨🚨 Limit backtracking in Nougat regexp (#35264)
* Limit backtracking in regexp

* Update

* [run-slow] nougat

* Update
2024-12-17 16:34:18 +00:00
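An illustrative example (not Nougat's actual patterns) of the class of fix: capping a repetition so the regex engine cannot backtrack without bound on pathological input:

```python
import re

# Unbounded nested repetition such as (?:\w+\s?)+ can backtrack
# catastrophically; an explicit upper bound keeps matching cheap.
unbounded = re.compile(r"(?:[#*] ?)+")
bounded = re.compile(r"(?:[#*] ?){1,10}")  # at most 10 repeats

print(bounded.match("## * list marker soup"))
```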
d29a06e39a remove benchmark job in push-important-models.yml (#35292)
remove-bench

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-17 17:27:26 +01:00
e0ae9b5974 🚨🚨🚨 Delete conversion scripts when making release wheels (#35296)
* Delete conversion scripts when making release wheels

* make fixup

* Update docstring
2024-12-17 14:18:42 +00:00
6eb00dd2f0 Support for SDPA for SAM models (#34110)
* feat: add support for sdpa and gradient checkpointing

* fix: ruff format

* fix: config sdpa

* fix: sdpa layer naming convention

* fix: update test_eager_matches_sdpa_inference to handle vision_hidden_states

* test: skip incompatible tests and fix loading issue with sdpa

- Updated tests to skip flash and dynamic compile cases.
- Minor adjustment to ensure correct loading of model with sdpa for dispatch test.

* style: apply Ruff formatting

* ruff fix again after rebase

* [run-slow] sam

* [run-slow] sam

* refactor: Address review comments and improve sub-config handling in SAM model tests

- Added attributes for sub_configs as per PR #34410.
- Enabled tests for configs, ensuring the composite model (SAM) has several sub-configs in the main config.
- Added class attribute _is_composite=True to the tester class
- test_sdpa_can_dispatch_composite_models added

* [run-slow] sam

* style: ruff

* [run-slow] sam

* style: ruff again ...

* [run-slow] sam
2024-12-17 14:46:05 +01:00
747f361da1 Add sdpa for Beit (#34941)
* Add sdpa for Beit

* Updates

* [run-slow] beit

* Update inference benchmarks

* Update

* Fix - add missed to super().forward()

* Updates

* Fix missing import
2024-12-17 14:44:47 +01:00
6c08b3b6e5 Add Falcon3 documentation (#35307)
* Add Falcon3 documentation

* Update Falcon3 documentation

* Change Falcon to Falcon3

* Update docs and run make fix-copies

* Add blog post and huggingface models links
2024-12-17 14:23:13 +01:00
f33a0cebb3 Add ColPali to 🤗 transformers (#33736)
* feat: run `add-new-model-like`

* feat: add paligemma code with "copied from"

* feat: add ColPaliProcessor

* feat: add ColPaliModel

* feat: add ColPaliConfig

* feat: rename `ColPaliForConditionalGeneration` to `ColPaliModel`

* fixup modeling colpali

* fix: fix root import shortcuts

* fix: fix `modeling_auto` dict

* feat: comment out ColPali test file

* fix: fix typos from `add-new-model-like`

* feat: explicit the forward input args

* feat: move everything to `modular_colpali.py`

* fix: put back ColPaliProcessor

* feat: add auto-generated files

* fix: run `fix-copies`

* fix: remove DOCSTRING constants to make modular converter work

* fix: fix typo + modular converter

* fix: add missing imports

* feat: no more errors when loading ColPaliModel

* fix: remove unused args in forward + tweak doc

* feat: rename `ColPaliModel` to `ColPaliForRetrieval`

* fix: apply `fix-copies`

* feat: add ColPaliProcessor to `modular_colpali`

* fix: run make quality + make style

* fix: remove duplicate line in configuration_auto

* feat: make ColPaliModel inherit from PaliGemmaForConditionalGeneration

* fix: tweak and use ColPaliConfig

* feat: rename `score` to `post_process_retrieval`

* build: run modular formatter + make style

* feat: convert colpali weights + fixes

* feat: remove old weight converter file

* feat: add and validate tests

* feat: replace hardcoded path to "vidore/colpali-v1.2-hf" in tests

* fix: add bfloat16 conversion in weight converter

* feat: replace pytest with unittest in modeling colpali test

* feat: add sanity check for weight conversion (doesn't work yet)

* feat: add shape sanity check in weight converter

* feat: make ColPaliProcessor args explicit

* doc: add doc for ColPali

* fix: trying to fix output mismatch

* feat: tweaks

* fix: ColPaliModelOutput inherits from ModelOutput instead of PaliGemmaCausalLMOutputWithPast

* fix: address comments on PR

* fix: adapt tests to the Hf norm

* wip: try things

* feat: add `__call__` method to `ColPaliProcessor`

* feat: remove need for dummy image in `process_queries`

* build: run new modular converter

* fix: fix incorrect method override

* Fix tests, processing, modular, convert

* fix tokenization auto

* hotfix: manually fix processor -> fixme once convert modular is fixed

* fix: convert weights working

* feat: rename and improve convert weight script

* feat: tweaks

* feat: remove `device` input for `post_process_retrieval`

* refactor: remove unused `get_torch_device`

* Fix all tests

* docs: update ColPali model doc

* wip: fix convert weights to hf

* fix logging modular

* docs: add acknowledgements in model doc

* docs: add missing docstring to ColPaliProcessor

* docs: tweak

* docs: add doc for `ColPaliForRetrievalOutput.forward`

* feat: add modifications from colpali-engine v0.3.2 in ColPaliProcessor

* fix: fix and upload colapli hf weights

* refactor: rename `post_process_retrieval` to `score_retrieval`

* fix: fix wrong typing for `score_retrieval`

* test: add integration test for ColPali

* chore: rerun convert modular

* build: fix root imports

* Update docs/source/en/index.md

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* fix: address PR comments

* wip: reduce the prediction gap in weight conversion

* docs: add comment in weight conversion script

* docs: add example for `ColPaliForRetrieval.forward`

* tests: change dataset path to the new one in hf-internal

* fix: colpali weight conversion works

* test: add fine-grained check for ColPali integration test

* fix: fix typos in convert weight script

* docs: move input docstring in a variable

* fix: remove hardcoded torch device in test

* fix: run the new modular refactor

* docs: fix python example for ColPali

* feat: add option to choose `score_retrieval`'s output dtype and device

* docs: update doc for `score_retrieval`

* feat: add `patch_size` property in ColPali model

* chore: run `make fix-copies`

* docs: update description for ColPali cookbooks

* fix: remove `ignore_index` methods

* feat: remove non-transformers specific methods

* feat: update `__init__.py` to new hf format

* fix: fix root imports in transformers

* feat: remove ColPali's inheritance from PaliGemma

* Fix CI issues

* nit remove prints

* feat: remove ColPali config and model from `modular_colpali.py`

* feat: add `ColPaliPreTrainedModel` and update modeling and configuration code

* fix: fix auto-removed imports in root `__init__.py`

* fix: various fixes

* fix: fix `_init_weight`

* temp: comment `AutoModel.from_config` for experiments

* fix: add missing `output_attentions` arg in ColPali's forward

* fix: fix `resize_token_embeddings`

* fix: make `input_ids` optional in forward

* feat: rename `projection_layer` to `embedding_proj_layer`

* wip: fix convert colpali weight script

* fix tests and convert weights from original repo

* fix unprotected import

* fix unprotected torch import

* fix style

* change vlm_backbone_config to vlm_config

* fix unprotected import in modular this time

* fix: load config from Hub + tweaks in convert weight script

* docs: move example usage from model docstring to model markdown

* docs: fix input docstring for ColPali's forward method

* fix: use `sub_configs` for ColPaliConfig

* fix: remove non-needed sanity checks in weight conversion script + tweaks

* fix: fix issue with `replace_return_docstrings` in ColPali's `forward`

* docs: update docstring for `ColPaliConfig`

* test: change model path in ColPali test

* fix: fix ColPaliConfig

* fix: fix weight conversion script

* test: fix expected weights for ColPali model

* docs: update ColPali markdown

* docs: fix minor typo in ColPaliProcessor

* Fix tests and add _no_split_modules

* add text_config to colpali config

* [run slow] colpali

* move inputs to torch_device in integration test

* skip test_model_parallelism

* docs: clarify quickstart snippet in ColPali's model card

* docs: update ColPali's model card

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2024-12-17 11:26:43 +01:00
a7f5479b45 fix modular order (#35297)
* fix modular order

* fix

* style
2024-12-17 08:05:35 +01:00
f5620a7634 Improved documentation of Automatic speech recognition (#35268)
Improved documentation quality of Automatic speech recognition
2024-12-16 09:50:11 -08:00
eb92bc44b7 Fix errors in quicktour[zh] (#35272)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2024-12-16 09:23:34 -08:00
886f690e76 Translating "translate perf_infer_gpu_multi.md" to Chinese (#35271)
add "translate perf_infer_gpu_multi"
2024-12-16 09:22:35 -08:00
22834eeba1 Fix typos in Translated Audio Classification Docs (#35287)
* fix: qwen2 model ids

* fix: line

* fix: more format

* update: reformat

* fix: doc typos
2024-12-16 08:51:32 -08:00
9feae5fb01 [Whisper] patch float type on mps (#35295)
* fix float type on mps

* make
2024-12-16 16:52:47 +01:00
d5b81e1ca1 Delete redundancy for loop checks. (#35288)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2024-12-16 13:36:27 +00:00
d0f32212ed Temporarily disable amd push ci (#35293)
Temporarily disable amd push ci (reduce noise)
2024-12-16 14:18:50 +01:00
85eb339231 Fix : model used to test ggml conversion of Falcon-7b is incorrect (#35083)
fixing test model
2024-12-16 13:21:44 +01:00
14910281a7 Blip: fix offloading and MP tests (#35239)
* fix device map

* fix offloading + model parallel test
2024-12-16 12:44:33 +01:00
66531a1ec3 Aggregate test summary files in CircleCI workflow runs (#34989)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* try 1

* fix

* fix

* fix

* update

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-16 11:06:17 +01:00
5615a39369 Fall back to slow image processor in ImageProcessingAuto when no fast processor available (#34785)
* refactor image_processing_auto logic

* fix fast image processor tests

* Fix tests fast vit image processor

* Add safeguard when use_fast True and torchvision not available

* change default use_fast back to None, add warnings

* remove debugging print

* call get_image_processor_class_from_name once
2024-12-15 14:00:36 -05:00
ca03842cdc [i18n-Chinese] Translating perf_train_cpu.md to Chinese (#35242)
add "1"
2024-12-13 14:46:49 -08:00
add53e25ff don't use no_sync when deepspeed doesn't support it for certain zero stages (#35157)
* don't use no_sync when deepspeed doesn't support it for certain zero stages

* chore: lint

* fix no_sync context for deepspeed across all zero types

* chore: lint
2024-12-13 19:23:00 +01:00
7237b3ecfc Fix FSDP no longer working (#35212)
Fix FSDP failing
2024-12-13 19:20:51 +01:00
6009642459 Translating agents_advanced.md to Chinese (#35231)
add "translate agents_advanced"
2024-12-13 10:12:00 -08:00
e94083bf90 Fixed typos in Audio Classification Documentation (#35263)
* Fixed typos in Audio Classification Documentation

* removed space in '8000 kHZ'

* Changes made as per review
2024-12-13 09:43:44 -08:00
bc6ae0d55e Update AMD docker image (rocm 6.1) (#35259)
* Use rocm 6.3 as base amd image and add nvidia-ml-py to exclude list

* Align rocm base image with torch wheels @6.1. Seems like the most stable combo
2024-12-13 15:41:03 +01:00
8096161b76 Use rsfE with pytest (#35119)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-13 14:36:22 +01:00
bdd4201fdb [tests] fix "Tester object has no attribute '_testMethodName'" (#34910)
* add more cases

* fix method not found in unittest

Signed-off-by: Lin, Fanli <fanli.lin@intel.com>

* fix more cases

* add more models

* add all

* no unittest.case

* remove for oneformer

* fix style

---------

Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2024-12-13 14:33:45 +01:00
3d213b57fe skip Fuyu from test_generate (#35246)
* skip Fuyu from test_generate

* make fixup, quality, repo-consistency
2024-12-13 10:12:49 +01:00
64478c7631 Add Cohere2 model (#35224) 2024-12-13 09:35:50 +01:00
e4e404fdd0 Run model as compressed/uncompressed mode (#34719)
* draft, run model as compressed/uncompressed mode

* draft

* run run_compressed=False

* run_compressed as attr

* set run_compressed=False using quantization_config

* remove redundant line

* make is_qat_trainable dependent on run_compressed status

* add tests

* lint

* full in docstring

* add decompress

* comments

* decompress if model is compressed and not run_compressed

* apply_quant_config logic fix -- populate statedict properly

* comments

* remove non-compressed model

* make is_compressed as property

* cosmetic

* run apply_quant_config for non-compressed models -- populate scales and zero-points

* add pathway for decompressing sparse models

* typo on is_quantization_compressed

* lint

* fix typo
2024-12-13 08:23:31 +01:00
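A hedged usage sketch, assuming `run_compressed` is exposed through `CompressedTensorsConfig` as described above; the checkpoint name is hypothetical:

```python
from transformers import AutoModelForCausalLM, CompressedTensorsConfig

# run_compressed=False decompresses the weights at load time so the model
# runs (and is QAT-trainable) in uncompressed mode.
quantization_config = CompressedTensorsConfig(run_compressed=False)
model = AutoModelForCausalLM.from_pretrained(
    "org/some-compressed-checkpoint",  # hypothetical compressed-tensors checkpoint
    quantization_config=quantization_config,
)
```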
31f9a289a6 Fix typo in chat template example (#35250)
Fix template example typo
2024-12-12 16:53:21 -08:00
11ba1d472c [Init refactor] Modular changes (#35240)
* Modular changes

* Gemma

* Gemma
2024-12-12 19:23:28 +01:00
a691ccb0c2 Change back to Thread for SF conversion (#35236)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-12 16:05:04 +01:00
e3ee49fcfb Refactoring AssistedCandidateGenerator for Improved Modularity and Reusability (#35009)
* move `TestAssistedCandidateGeneratorDifferentTokenizers` into a new testing file

* refactor

* NOTHING. add space to rerun github actions tests

* remove it...

* NOTHING. add space to rerun github actions tests

* remove it...

* replace: `self.prev_tokens` -> `self.prev_assistant_ids`

* NOTHING. rerun CI tests

* remove it

* introduce `self.prev_target_ids_len`

* fix style

* fix style

---------

Co-authored-by: Jonathan Mamou <jonathan.mamou@intel.com>
2024-12-12 15:47:05 +01:00
63766abe36 Support Python 3.10+ Union style in chat template type hints parsing (#35103)
* fix(utils): Support the newest Union type in chat template

* fix(utils/chat_template): Backward compatibility for the newest Union type

* Update src/transformers/utils/chat_template_utils.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2024-12-12 14:07:06 +00:00
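A short sketch of what the fix enables: a PEP 604 `int | str` hint now parses the same way the older `typing.Union[int, str]` spelling does when building a tool schema:

```python
from transformers.utils import get_json_schema

def lookup(key: int | str) -> str:
    """
    Look up a record by numeric id or by name.

    Args:
        key: The record id or record name.
    """
    return str(key)

print(get_json_schema(lookup))  # key is typed as an int-or-string union
```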
5cf11e5ab9 Fix type hints for apply_chat_template (#35216) 2024-12-12 13:59:24 +00:00
3db8e27816 Fixed typo of 'indentifier' in audio_utils.py (#35226) 2024-12-12 13:45:04 +00:00
a9ccdfd8e3 docs: clarify initializer_range parameter description in Idefics3VisionConfig (#35215) 2024-12-11 11:26:18 -08:00
6181c6b095 Fix seamless TTS generate (#34968)
* fix seamless tts generate

* apply same fix for v2

* [run-slow] seamless_m4t, seamless_m4t_v2

* remove TODO

* [run-slow] seamless_m4t, seamless_m4t_v2

* [run-slow] seamless_m4t, seamless_m4t_v2

* ignore failing test on multigpus

* [run-slow] seamless_m4t, seamless_m4t_v2

* [run-slow] seamless_m4t, seamless_m4t_v2
2024-12-11 15:38:42 +01:00
33c12e4d80 Fix CI (#35208)
fix aria
2024-12-11 14:24:52 +01:00
7d303efa5f Cleanup: continue the init refactor (#35170)
* Round 2

* Round 3
2024-12-11 14:12:34 +01:00
5fcf6286bf Add TimmWrapper (#34564)
* Add files

* Init

* Add TimmWrapperModel

* Fix up

* Some fixes

* Fix up

* Remove old file

* Sort out import orders

* Fix some model loading

* Compatible with pipeline and trainer

* Fix up

* Delete test_timm_model_1/config.json

* Remove accidentally committed files

* Delete src/transformers/models/modeling_timm_wrapper.py

* Remove empty imports; fix transformations applied

* Tidy up

* Add image classification model to special cases

* Create pretrained model; enable device_map='auto'

* Enable most tests; fix init order

* Sort imports

* [run-slow] timm_wrapper

* Pass num_classes into timm.create_model

* Remove train transforms from image processor

* Update timm creation with pretrained=False

* Fix gamma/beta issue for timm models

* Fixing gamma and beta renaming for timm models

* Simplify config and model creation

* Remove attn_implementation diff

* Fixup

* Docstrings

* Fix warning msg text according to test case

* Fix device_map auto

* Set dtype and device for pixel_values in forward

* Enable output hidden states

* Enable tests for hidden_states and model parallel

* Remove default scriptable arg

* Refactor inner model

* Update timm version

* Fix _find_mismatched_keys function

* Change inheritance for Classification model (fix weights loading with device_map)

* Minor bugfix

* Disable save pretrained for image processor

* Rename hook method for loaded keys correction

* Rename state dict keys on save, remove `timm_model` prefix, make checkpoint compatible with `timm`

* Managing num_labels <-> num_classes attributes

* Enable loading checkpoints in Trainer to resume training

* Update error message for output_hidden_states

* Add output hidden states test

* Decouple base and classification models

* Add more test cases

* Add save-load-to-timm test

* Fix test name

* Fixup

* Add do_pooling

* Add test for do_pooling

* Fix doc

* Add tests for TimmWrapperModel

* Add validation for `num_classes=0` in timm config + test for DINO checkpoint

* Adjust atol for test

* Fix docs

* dev-ci

* dev-ci

* Add tests for image processor

* Update docs

* Update init to new format

* Update docs in configuration

* Fix some docs in image processor

* Improve docs for modeling

* fix for is_timm_checkpoint

* Update code examples

* Fix header

* Fix typehint

* Increase tolerance a bit

* Fix Path

* Fixing model parallel tests

* Disable "parallel" tests

* Add comment for metadata

* Refactor AutoImageProcessor for timm wrapper loading

* Remove custom test_model_outputs_equivalence

* Add require_timm decorator

* Fix comment

* Make image processor work with older timm versions and tensor input

* Save config instead of whole model in image processor tests

* Add docstring for `image_processor_filename`

* Sanitize kwargs for timm image processor

* Fix doc style

* Update check for tensor input

* Update normalize

* Remove _load_timm_model function

---------

Co-authored-by: Amy Roberts <22614925+amyeroberts@users.noreply.github.com>
2024-12-11 12:40:30 +00:00
bcc50cc7ce [PEFT] Better Trainer error when prompt learning with loading best model at the end (#35087)
Original issue: https://github.com/huggingface/peft/issues/2256

There is a potential error when using load_best_model_at_end=True with a
prompt learning PEFT method. This is because Trainer uses load_adapter
under the hood but with some prompt learning methods, there is an
optimization on the saved model to remove parameters that are not
required for inference, which in turn requires a change to the model
architecture. This is why load_adapter will fail in such cases and users
should instead set load_best_model_at_end=False and use
PeftModel.from_pretrained. As this is not obvious, we now intercept the
error and add a helpful error message.
2024-12-11 12:44:39 +01:00
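A workaround sketch matching the guidance above; the model id and checkpoint path are illustrative:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Train with load_best_model_at_end=False, then reload the best prompt-learning
# checkpoint explicitly instead of relying on Trainer's load_adapter path.
base = AutoModelForCausalLM.from_pretrained("some-base-model")
model = PeftModel.from_pretrained(base, "output_dir/checkpoint-500")
```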
d363e71d0e 🧹 Remove deprecated RotaryEmbedding parts in the Attention layers (#34858)
* update

* style

* fix missing args

* remove last trace of old rope classes

* remove deprecated copied from

* fix copies

* trigger CIs

* post rebase clean-up

* reverse mistral

* cleanup after dropping commits

* Add comment
2024-12-11 11:16:52 +01:00
9094b87dd4 BLIP: enable device map (#34850)
fix device map
2024-12-11 11:03:30 +01:00
10feacd88a [i18n-<languageCode>] Translating agents.md to Chinese (#35139)
* add "translate agents.md"

* add "agents.md"

* add "translate warnings"

* add "totree"

* add "remove transformer_agent"

* add "remove transformer _agent file"

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-12-10 15:16:37 -08:00
e8508924fd Update data collator docstrings to accurately reference Nvidia tensor core compute capability version (#35188)
update data collator docs to reflect correct tensor core compute capability

Co-authored-by: John Graham Reynolds <john.graham.reynolds@vumc.org>
2024-12-10 15:16:01 -08:00
5290f6a62d [docs] Fix FlashAttention link (#35171)
fix link
2024-12-10 11:36:25 -08:00
91b8ab18b7 [i18n-<languageCode>] Translating Benchmarks.md to Chinese (#35137)
* add "Translating Benchmarks.md to Chinese "

* Removed all the English original text (which was previously kept as comments in the document) and refined some of the Chinese expressions.
2024-12-10 09:58:47 -08:00
217c47e31b Only import torch.distributed if it is available (#35133) 2024-12-10 18:19:30 +01:00
52d135426f Multiple typo fixes in NLP, Audio docs (#35181)
Fixed multiple typos in Tutorials, NLP, and Audio sections
2024-12-10 09:08:55 -08:00
425af6cdc2 [i18n-ar] Translated file : docs/source/ar/community.md into Arabic (#33027)
* Add docs/source/ar/community.md to Add_docs_source_ar_community.md

* Update community.md

* Update community.md

* Update community.md

* Update _toctree.yml - add community.md

* Update docs/source/ar/community.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Create how_to_hack_models.md

* Create modular_transformers.md

* Create tiktoken.md

* Update _toctree.yml

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/how_to_hack_models.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/modular_transformers.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tiktoken.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/tiktoken.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

---------

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
2024-12-10 09:08:27 -08:00
e5c45a6679 Fixing GGUF support for StableLm (#35060)
fix

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-12-10 16:30:09 +01:00
3e2769a3c9 Fix DBRX LayerNorm init method (#35177)
fix dbrx layernorm init
2024-12-10 14:31:22 +00:00
5fba3f99c0 Remove unnecessary masked_fill in deberta models (#35182) 2024-12-10 13:52:20 +00:00
6acb4e43a7 Support BatchNorm in Hubert pos_conv_emb as in fairseq (#34389)
* Support BatchNorm in Hubert pos_conv_emb as in fairseq

* Correct the new defaults (#34377)

* Correct the new defaults

* CIs

* add check

* Update utils.py

* Update utils.py

* Add the max_length in generate test checking shape without passing length

* style

* CIs

* fix fx CI issue

* [auto. ping] Avoid sending empty info + add more team members (#34383)

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix glm  (#34388)

* Fix duplicated

* fix import

* Use non nested images and batched text Idefics2/3  (#34222)

* add support for non nested images and add tests

* add tests error scenario

* fix style

* added single and no image to error tests

* Fix onnx non-exportable inplace aten op (#34376)

* fix onnx non-exportable inplace op

* mistral, qwen2, qwen2_vl, starcoder2

* fixup copies

* Fix right padding in LLaVA models (#34305)

* fix right pad llavas

* device mismatch

* no filter (#34391)

* no filter

* no filter

* no filter

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* SynthID: better example (#34372)

* better example

* Update src/transformers/generation/configuration_utils.py

* Update src/transformers/generation/logits_process.py

* nits

* Tests: upgrade `test_eager_matches_sdpa_generate` (#34386)

* Fix bnb training test failure (#34414)

* Fix bnb training test: compatibility with OPTSdpaAttention

* Avoid check expected exception when it is on CUDA (#34408)

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typos in agents_advanced.md (#34405)

* [docs] Cache implementations (#34325)

cache

* [run-slow] hubert

* Support BatchNorm in Hubert pos_conv_emb as in fairseq
Add conversion integration test, and make batchnorm explicit variable

* Support BatchNorm in Hubert pos_conv_emb as in fairseq
fix make fixup styling changes

* [run-slow] hubert

* Support BatchNorm in Hubert pos_conv_emb as in fairseq

* [run-slow] hubert

* Support BatchNorm in Hubert pos_conv_emb as in fairseq
Add conversion integration test, and make batchnorm explicit variable

* Support BatchNorm in Hubert pos_conv_emb as in fairseq
fix make fixup styling changes

* [run-slow] hubert

* [run-slow] hubert

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Rudy Delouya <rudy.delouya@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
2024-12-10 14:18:23 +01:00
80f2b1610f Fix file path for shard_num 1 with mllama converter (#35053)
"#35049 fix path for num_shard 1"
2024-12-10 09:11:45 +00:00
0938b57770 Assisted decoding multi-gpu (#35116)
* fix

* move a few lines up
2024-12-10 09:59:17 +01:00
dada0fd85f Fix num_items_in_batch not being an integer (#35115)
In method `Trainer#get_batch_samples`, the return values should be a
list of batch samples and an integer indicating the number of items that
exist in the batch. However, this was not actually the case: what was
returned instead of an integer was a tensor with one element. In the
multi-GPU setup, this tensor is placed in a different device than the
loss tensor, causing the loss function to raise a `RuntimeError`.

The problem arises from
5d7739f15a/src/transformers/trainer.py (L5139-L5144),
where the outer `sum` operates over a list of tensors which means that
the final result is also a tensor. To counter this issue, a new check
(after the accelerator gathering) has been added in order to convert a
potential tensor to an integer before returning the
`num_items_in_batch`.
2024-12-10 08:40:40 +01:00
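A minimal sketch of the normalization described above:

```python
import torch

num_items_in_batch = torch.tensor([17])  # stand-in for the gathered value

# A one-element tensor may sit on a different device than the loss, so
# convert it to a plain Python int before it is used in loss scaling.
if torch.is_tensor(num_items_in_batch):
    num_items_in_batch = int(num_items_in_batch.item())
print(num_items_in_batch)  # 17
```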
34f4080ff5 [CI] Fix bnb quantization tests with accelerate>=1.2.0 (#35172) 2024-12-09 13:55:16 -05:00
fa8763ce17 Fixed typo of 'avilable' in prompts.py (#35145) 2024-12-09 16:40:32 +00:00
4bc39de5c3 Super tiny fix logging message (#35132)
Update integration_utils.py
2024-12-09 16:31:32 +00:00
8e806a336f Cleanup: continue the init refactor (#35167)
Round 2
2024-12-09 16:09:50 +01:00
7238387f67 Fix typo in EETQ Tests (#35160)
fix
2024-12-09 14:13:36 +01:00
de8a0b7547 Option to set 'non_blocking' for to(device) in BatchEncoding and BatchFeature (#34883)
* Option to set 'non_blocking' for to(device) operation for performance improvements. Defaults to 'false', thus no behavioral changes.

* Enabling non_blocking in to() operation of BatchFeature.

* Improved docstring on utilization of non_blocking

* Force non_blocking as keyword argument

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Daniel Bogdoll <dbogdoll@umich.edu>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2024-12-09 11:29:04 +01:00
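A usage sketch for the new keyword; note that overlapping host-to-device copies only pays off when the source memory is pinned:

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["hello world"], return_tensors="pt")

if torch.cuda.is_available():
    # non_blocking is keyword-only, per the change above; it defaults to
    # False, so existing behavior is unchanged.
    batch = batch.to("cuda", non_blocking=True)
```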
1452dc2514 Corrected typo in agent system prompts (#35143) 2024-12-09 10:42:23 +01:00
9e420e0269 [I-JEPA] Update docs (#35148)
Update docs
2024-12-09 10:01:31 +01:00
1ccca8f48c Fix GA loss bugs and add unit test (#35121)
* fix GA bugs and add unit test

* narrow down model loss unit test diff gap

* format code to make ruff happy

* send num_items_in_batch argument to decoder

* fix GA loss bug in BertLMHeadModel

* use TinyStories-33M to narrow down diff gap

* fotmat code

* missing .config

* avoid add extra args

---------

Co-authored-by: kangsheng <kangsheng@meituan.com>
2024-12-09 09:57:41 +01:00
c8c8dffbe4 Update I-JEPA checkpoints path (#35120)
Update checkpoints path
2024-12-06 13:42:51 +00:00
7f95372c62 Add feature dim attributes to BitLinear for easier PEFT integration (#34946)
Update bitnet.py, extremely small change to allow for easier PEFT integration

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2024-12-06 13:39:45 +01:00
9ad4c93536 Add Aria (#34157)
* Add Aria

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-06 12:17:34 +01:00
15ab310c3a Fix private forked repo. CI (#35114)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-06 12:03:31 +01:00
98e8062df3 [docs] top_p, top_k, temperature docstrings (#35065)
clarify
2024-12-05 11:24:51 -08:00
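A sketch exercising the clarified knobs; the checkpoint is a small illustrative one:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The secret to baking good bread is", return_tensors="pt")

# temperature rescales the logits, top_k keeps only the k most likely
# tokens, and top_p keeps the smallest set with cumulative probability >= p.
outputs = model.generate(
    **inputs, do_sample=True, temperature=0.7, top_k=50, top_p=0.9, max_new_tokens=30
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```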
44f88d8ccb [docs] Update Python version in translations (#35096)
update: doc version
2024-12-05 11:06:54 -08:00
66ab300aaf Dev version 2024-12-05 19:12:22 +01:00
a5bb528471 Fix signatures for processing kwargs (#35105)
* add conversion script

* remove pg2 refs

* fixup style

* small update

* get correct scaling

* add back missing bos

* fix missing config keys

* might revert this pos_embeddings

* fixup 9b config

* fix 9b

* fixup 9b conversion for good + add back num_hidden_layers

* add correct query scaling for 2b, 9b, 27b

* fixup 27b conversion

* Additional variant: 27b-896

* Use CPU for conversion to reduce GPU RAM requirements

* fix causal mask generation + formatting

* fix in-training causal mask generation edge case

* trigger CI

* update config

* update config

* update config

* update config

* update config

* update config

* update config

* update config

* update config

* move conversion file to main model dir

* handle multi-images + bos token

* address comments for input ids

* revert ci fixes

* [run-slow] paligemma

* fix

* [run-slow] paligemma

* skip end 2 end

* [run-slow] paligemma

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-05 18:15:48 +01:00
e27465c801 Adaptive dynamic number of speculative tokens (#34156)
* initial commit

* update strategy

* add tradeoff FPR TPR with cost

* all probs

* fix

* fix

* fix style

* Update src/transformers/generation/configuration_utils.py

shorter docstring

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* import guard

* fix style

* add is_sklearn_available condition

* vectorizing to flatten the for-loop

* fix style

* disable adaptation for UAG

* update doc

* add TestAssistedCandidateGeneratorUpdateStrategy

* fix style

* protect import

* fix style

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-12-05 17:07:33 +01:00
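For context, a minimal assisted-generation sketch of the mechanism the adaptive strategy above tunes; the small/large checkpoint pairing is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1.4b-deduped")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1.4b-deduped")
# A small draft model proposes candidate tokens that the big model verifies.
assistant = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m-deduped")

inputs = tokenizer("Alice and Bob", return_tensors="pt")
outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```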
b0a51e5cff Fix flaky Hub CI (test_trainer.py) (#35062)
* fix

* Update src/transformers/testing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* check

* check

* check

* check

* check

* check

* Update src/transformers/testing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Update src/transformers/testing_utils.py

Co-authored-by: Lucain <lucainp@gmail.com>

* check

* check

* check

* Final space

* Final adjustment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
2024-12-05 17:02:27 +01:00
a928d9c128 [trainer] fix the GA model_accepts_loss_kwargs (#34915)
* fix

* style

* values

* fix
2024-12-05 16:37:46 +01:00
e682c17e4a BLIP: this is correct now (#35081)
this is correct now
2024-12-05 16:30:09 +01:00
50189e36a6 Add I-JEPA (#33125)
* first draft

* add IJepaEmbeddings class

* fix copy-from for IJepa model

* add weight conversion script

* update attention class names in IJepa model

* style changes

* Add push_to_hub option to convert_ijepa_checkpoint function

* add initial tests for I-JEPA

* minor style changes to conversion script

* make fixup related

* rename conversion script

* Add I-JEPA to sdpa docs

* minor fixes

* adjust conversion script

* update conversion script

* adjust sdpa docs

* [run_slow] ijepa

* [run-slow] ijepa

* [run-slow] ijepa

* [run-slow] ijepa

* [run-slow] ijepa

* [run-slow] ijepa

* formatting issues

* adjust modeling to modular code

* add IJepaModel to objects to ignore in docstring checks

* [run-slow] ijepa

* fix formatting issues

* add usage instruction snippet to docs

* change pos encoding, add checkpoint for doc

* add verify logits for all models

* [run-slow] ijepa

* update docs to include image feature extraction instructions

* remove pooling layer from IJepaModel in image classification class

* [run-slow] ijepa

* remove pooling layer from IJepaModel constructor

* update docs

* [run-slow] ijepa

* [run-slow] ijepa

* small changes

* [run-slow] ijepa

* style adjustments

* update copyright in init file

* adjust modular ijepa

* [run-slow] ijepa
2024-12-05 16:14:46 +01:00
95a855e212 Deprecate quanto and switch to optimum-quanto (#35001)
* deprecate quanto

* fix style
2024-12-05 16:11:09 +01:00
482cb28a18 Fix tie_word_embeddings handling for GGUF models (#35085)
* fix tie_word_embeddings

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2024-12-05 16:00:41 +01:00
35447054f5 Update Mistral conversion script (#34829)
* Update convert_mistral_weights_to_hf.py

* Update convert_mistral_weights_to_hf.py

* Update convert_mistral_weights_to_hf.py
2024-12-05 15:47:20 +01:00
93f87d3cf5 [tokenizers] bump to 0.21 (#34972)
bump to 0.21
2024-12-05 15:46:02 +01:00
54aae121eb [Whisper] Fix whisper tokenizer (#34537)
* handle single timestamp ending

* include last timestamp token

* handle single timestamp ending

* avoid floating points arithm limitations

* ensure float64 operations

* new test

* make fixup

* make copies

* handle edge case double tokens ending with different tokens

* handle single timestamp ending

* make fixup

* handle conditioning on prev segments

* fix

* Update src/transformers/models/whisper/generation_whisper.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* [run-slow] whisper

* don't call item() to avoid unnecessary sync

* fix

---------

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eustlb@users.noreply.huggingface.co>
2024-12-05 13:46:29 +01:00
beb2c66ec3 Informative (#35059)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-05 09:50:27 +01:00
1ed1de2fec [docs] Increase visibility of torch_dtype="auto" (#35067)
* auto-dtype

* feedback
2024-12-04 09:18:44 -08:00
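The pattern the doc change promotes, in one line; the checkpoint is illustrative:

```python
from transformers import AutoModelForCausalLM

# torch_dtype="auto" loads weights in the dtype recorded in the checkpoint's
# config (e.g. bfloat16) instead of upcasting everything to float32.
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype="auto")
print(model.dtype)
```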
baa3b22137 [docs] add a comment that offloading requires CUDA GPU (#35055)
* add commen to offloading

* Update docs/source/en/kv_cache.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-12-04 07:48:34 -08:00
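A sketch of the feature the docs note refers to, assuming a CUDA GPU and an illustrative model id: KV-cache offloading keeps past key/values in CPU RAM and streams them back layer by layer.

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative model id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)
inputs = tok("The capital of France is", return_tensors="pt").to("cuda")
# "offloaded" stores past key/values on CPU; per the note above this path needs CUDA
out = model.generate(**inputs, max_new_tokens=20, cache_implementation="offloaded")
```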
1da1e0d7f2 Support for easier multimodal use of modular (#35056)
* update modular and add examples

* style

* improve example comments

* style

* fix small logic issue for imports

* fix relative order issue when files do not make sense

* Improve comments

* trigger CIs
2024-12-04 15:13:11 +01:00
46df859975 [GPTNeoX] Flex Attention + Refactor (#34896)
* gpt neox flex attention + refactor

* some formatting

* small fix on dropout

* add assertion on flex attn test

* flaky ci :(

* add head mask support

* style

* handle dtype, replace torch where

* fixup flex with output attns

* code review and several other fixes

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* style

* remove unnecessary comment

* remove incorrect comment

* make flex attn check more agnostic to versions and centralized

* change peft input dtype check to value since q and k could be affected by other stuff like RoPE

* I forgot

* flaky

* code review and small fixes

* Update src/transformers/models/gpt_neox/modeling_gpt_neox.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-04 14:48:28 +01:00
accb7204f9 Add Pytorch Tensor Parallel support for Qwen2, Qwen2Moe, Starcoder2 (#35007)
* add base tp plan for qwen2 and qwen2moe

* add parallel tp for starcoder2

* fix modular conversion

* add infer dim for qkv states

* Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-12-04 14:43:36 +01:00
c7a109ec81 Fix pad_token_tensor is None in warning (#34005)
Fix pad_token_tensor is None in warning
2024-12-04 11:15:25 +01:00
329f5dbf97 [docs] use device-agnostic API instead of hard-coded cuda (#35048)
replace cuda
2024-12-03 10:54:15 -08:00
b8cdc262d5 [docs] use device-agnostic instead of cuda (#35047)
* fix on xpu

* [run_all]

* add the missing import for Image lib

* add more devices in comment

* bug fix

* replace cuda
2024-12-03 10:53:45 -08:00
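Both device-agnostic docs commits above replace hard-coded "cuda" strings; a minimal sketch of the pattern using only core torch availability checks:

```py
import torch

# pick whichever accelerator is present instead of hard-coding "cuda"
if torch.cuda.is_available():
    device = "cuda"
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    device = "xpu"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print(device)
```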
346597b644 Translate community.md into Chinese (#35013)
* community translation

* Update docs/source/zh/community.md

Co-authored-by: Isotr0py <2037008807@qq.com>

---------

Co-authored-by: Isotr0py <2037008807@qq.com>
2024-12-03 10:22:02 -08:00
3deaa8179d [docs] fix example code bug (#35054)
fix code bug
2024-12-03 09:18:39 -08:00
125de41643 fix speecht5 failure issue in test_peft_gradient_checkpointing_enable… (#34454)
* fix speecht5 failure issue in test_peft_gradient_checkpointing_enable_disable

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* [run-slow] speecht5

---------

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
2024-12-03 13:58:54 +00:00
7a7f27697a Fix BertGeneration (#35043)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-12-03 13:56:59 +01:00
901f504580 Add token cost + runtime monitoring to Agent and HfEngine children (#34548)
* Add monitoring to Agent and HfEngine children
2024-12-03 13:14:52 +01:00
ee37bf0d95 Automatic compilation in generate: do not rely on inner function (#34923)
* compiled forward in PreTrainedModel

* update

* style

* update name

* trigger CIs

* Add way to use custom compile args

* style

* switch parameterization to generation_config

* Add to inits

* Update configuration_utils.py

* inits

* style

* docs

* style

* Update configuration_utils.py

* back without dataclass for repo consistency

* Update configuration_utils.py

* style

* style

* style once again

* add config serialization

* update

* true dataclass

* trigger CIs

* merge compile methods + remove serialization of compile config
2024-12-03 11:20:31 +01:00
f9c7e6021e Translate bertology.md into Chinese (#34908)
* bertology translation

* Update docs/source/zh/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/zh/bertology.md

Co-authored-by: blueingman <15329507600@163.com>

* Update docs/source/zh/bertology.md

Co-authored-by: blueingman <15329507600@163.com>

* Update docs/source/zh/bertology.md

Co-authored-by: Isotr0py <2037008807@qq.com>

* Update docs/source/zh/bertology.md

Co-authored-by: Isotr0py <2037008807@qq.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: blueingman <15329507600@163.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-12-02 11:42:40 -08:00
527dc04e46 [docs] add the missing import for Image and bug fix (#34776)
* add the missing import for Image lib

* add more devices in comment

* bug fix
2024-12-02 11:40:20 -08:00
4955e4e638 [i18n-ar] Translated file : docs/source/ar/notebooks.md into Arabic (#33049)
* Add docs/source/ar/notebooks.md to Add_docs_source_ar_notebooks.md

* Update notebooks.md

* Update _toctree.yml
2024-12-02 11:40:04 -08:00
f0dec874f0 add docstring example for compute_loss_func (#35020) 2024-12-02 11:39:09 -08:00
31299670cd Multiple typo fixes in Tutorials docs (#35035)
* Fixed typo in multi gpu docs and OLMoE version

* Fixed typos in docs for agents, agents advanced, knowledge distillation, and image feature extraction

* Fixed incorrect usage of model.image_guided_detection in zero shot object detection docs
2024-12-02 15:26:34 +00:00
31830474bf Fix test_eager_matches_sdpa_inference for XPU backend (#34889)
* Use torch.nn.attention.sdpa_kernel instead of deprecated torch.backends.cuda.sdp_kernel

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

* Fix test_eager_matches_sdpa_inference for XPU backend

As of PyTorch 2.5 XPU backend supports only torch.nn.attention.SDPBackend.MATH
which is implemented on PyTorch level using aten operators and is device
agnostic with respect to implementation of each aten operator. Thus, we can
reuse CUDA (or CPU) MATH weights for XPU.

Fixes: #34888
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

* Use torch.amp.autocast instead of deprecated torch.cuda.amp.autocast in nemotron

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

---------

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2024-12-02 16:21:04 +01:00
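A sketch of the API swap in the first bullet (PyTorch >= 2.3): the deprecated torch.backends.cuda.sdp_kernel becomes the device-neutral torch.nn.attention.sdpa_kernel context manager.

```py
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 4, 8, 16)
# force the MATH backend, the only SDPA backend XPU supports as of PyTorch 2.5
with sdpa_kernel(SDPBackend.MATH):
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
```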
f41d5d8f74 Add type hints for forward functions in Gemma2 (#35034)
* feat: add gemma2 type hints

* fix: mask is optional
2024-12-02 14:03:36 +00:00
7b5f76e32e Typo in warning switching to optimum-quanto (#35028)
fix typos
2024-12-02 13:47:05 +00:00
c24c79ebf9 Optimize memory usage of mllama encoder (#34930)
mllama encoder memory optimization
2024-12-02 11:46:45 +01:00
9ab8c5b503 fix variable undefined bug when return_tensors is not specified in llava processing (#34953)
* fix variable undefined bug when return_tensors is not specified in llava processor

* improve readability
2024-12-02 11:44:42 +01:00
3480cbb97e Only cast cu_seqlens when tracing (#35016)
* Only cast `cu_seqlens` when tracing

* Formatting
2024-12-02 11:39:39 +01:00
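A sketch of the kind of guard the title describes, assuming the int32 cast is only needed on the traced/export path (tensor contents illustrative):

```py
import torch

cu_seqlens = torch.tensor([0, 5, 9], dtype=torch.int64)
# cast only while tracing (e.g. during torch.onnx.export); eager execution keeps int64
if torch.jit.is_tracing():
    cu_seqlens = cu_seqlens.to(torch.int32)
```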
19dabe9636 Update FillMaskPipeline.__call__ signature and docstring (#35006)
Update `FillMaskPipeline.__call__`

- Remove unused `*args`
- Update docstring with `inputs` over `args`
2024-11-29 13:44:56 +00:00
f7427f58ed fix: double verbs (#35008) 2024-11-29 13:19:57 +00:00
737f4dc4b6 Update timm version (#35005)
* Bump timm

* dev-ci
2024-11-29 12:46:59 +00:00
89d7bf584f 🚨🚨🚨 Uniformize kwargs for TrOCR Processor (#34587)
* Make kwargs uniform for TrOCR

* Add tests

* Put back current_processor

* Remove args

* Add todo comment

* Code review - breaking change
2024-11-29 11:58:11 +00:00
0b5b5e6a70 Let server decide default repo visibility (#34999)
* Let server decide default repo visibility

* code style
2024-11-28 17:05:08 +01:00
f491096f7d Fix docker CI : install autogptq from source (#35000)
* Fixed Docker

* Test ci

* Finally

* add comment
2024-11-28 16:31:36 +01:00
01ad80f820 Improve .from_pretrained type annotations (#34973)
* Fix from_pretrained type annotations

* Better typing for image processor's `from_pretrained`
2024-11-28 15:05:19 +00:00
9d6f0ddcec Add optimized PixtralImageProcessorFast (#34836)
* Add optimized PixtralImageProcessorFast

* make style

* Add dummy_vision_object

* Review comments

* Format

* Fix dummy

* Format

* np.ceil for math.ceil
2024-11-28 16:04:05 +01:00
6300212946 Fix utils/check_bad_commit.py (for auto ping in CI) (#34943)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-11-28 15:34:38 +01:00
5e8c1d713d Offloaded cache: fix generate (#34921)
* fix cache impl

* require_torch_gpu

* fix mamba

* fix copies
2024-11-28 15:05:56 +01:00
57ca9e6d2f Allow compressed-tensors quantized model to be trained (#34520)
* populate quantization_config for kv-cache-scheme only configs

* make compressed-tensors quantized models trainable

* populate versions on quant config

* pass oneshot then finetune

* remove breakpoint

* SunMarc comments and fix to_dict logic

* lint

* lint

* test

* comment

* comments'
2024-11-28 15:05:16 +01:00
44af935ec5 Refine the code of Universal Assisted Generation (#34823)
* removed the useless attributes

* add configs for window size

* fixed the wrong kwargs

* added docstring
2024-11-28 15:04:24 +01:00
2b053fdf1a 🚨🚨🚨 Changed DINOv2Config default patch size to 14 (#34568)
Changed DINOv2Config default patch size to 14
2024-11-28 14:48:06 +01:00
4f0bf9864c Fix save_pretrained for partially offloaded models (#34890)
* delete unnecessary reference

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* update comment, explicit delete state_dict

* Update src/transformers/modeling_utils.py

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* fix style

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-11-28 14:46:56 +01:00
f4b674f269 [PEFT] Set eval mode when loading PEFT adapter (#34509)
* [PEFT] Set eval mode when loading PEFT adapter

Resolves #34469

When calling model.load_adapter to load a PEFT adapter, by default the
adapter should be set to eval mode. This is now correctly done. Users
can still pass is_trainable=True to load the adapter in training mode.

* Linter
2024-11-28 13:56:25 +01:00
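A sketch of the new default (base model and adapter ids illustrative): load_adapter now puts adapter modules in eval mode unless is_trainable=True is passed.

```py
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # illustrative base model
# default: the loaded adapter is set to eval mode (dropout etc. disabled)
model.load_adapter("ybelkada/opt-350m-lora")  # illustrative adapter id
# opt back into training mode explicitly
model.load_adapter("ybelkada/opt-350m-lora", adapter_name="train", is_trainable=True)
```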
5523e38b55 Fixed typo in VisitWebpageTool (#34978)
Fixed typo in VisitWebpageTool
2024-11-27 12:49:21 -08:00
4120cb257f Fix typo in code block in vipllava.md (#34957)
fix typo in code block in vipllava.md
2024-11-27 08:19:34 -08:00
2910015d6d [i18n-zh]Translated perf_train_special.md into Chinese (#34948)
* Add translation for perf_train_special documentation

* Update docs/source/zh/perf_train_special.md

Co-authored-by: Isotr0py <2037008807@qq.com>

* Update docs/source/zh/perf_train_special.md

Co-authored-by: Isotr0py <2037008807@qq.com>

* Update _toctree.yml

* Update _toctree.yml

* Update perf_train_special.md

* Update perf_train_special.md

---------

Co-authored-by: Isotr0py <2037008807@qq.com>
2024-11-27 07:57:43 -08:00
637225508f [docs] add explanation to release_memory() (#34911)
* explain release_memory

* Update docs/source/en/llm_tutorial_optimization.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-11-27 07:47:28 -08:00
0600f46353 🌐 [i18n-KO] Translated encoder-decoder.md to Korean (#34880)
* Initial version of translation, english still remaining

* Revised Translation, removed english. _toctree not updated

* updated _toctree.yml && 3rd ver translation

* updated _toctree.yml && 3rd ver translation

* Update encoder-decoder.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update encoder-decoder.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update encoder-decoder.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update encoder-decoder.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update encoder-decoder.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update encoder-decoder.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
2024-11-27 07:47:14 -08:00
5f8b24ee12 Fix flaky test execution caused by Thread (#34966)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-11-27 16:32:50 +01:00
0d99a938aa Avoid calling get_max_length (#34971)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-11-27 15:15:35 +01:00
8f48ccf548 Fix : Add PEFT from source to CI docker (#34969)
* Docker fix peft

* Test new docker

* uncomment
2024-11-27 14:10:47 +01:00
4c1388f48e [FlexAttention] Update gemma2 (#34942)
* update tests

* now maybe this fixes the previously failing tests!

* nit default

* Update src/transformers/models/gemma2/modular_gemma2.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix-copies

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2024-11-27 11:50:48 +01:00
6c3f168b36 [i18n-zh]Translated tiktoken.md into chinese (#34936)
* Add translation for tiktoken documentation

* Update tiktoken.md

* Update tiktoken.md
2024-11-26 10:09:52 -08:00
5bfb40bc8e docs: HUGGINGFACE_HUB_CACHE -> HF_HUB_CACHE (#34904) 2024-11-26 09:37:18 -08:00
784d22078a [doc] use full path for run_qa.py (#34914)
use full path for run_qa.py
2024-11-26 09:23:44 -08:00
6bc0c219c1 [docs] use device-agnostic API instead of cuda (#34913)
add device-agnostic API

Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2024-11-26 09:23:34 -08:00
64b73e61f8 [i18n-ar] Translated file : docs/source/ar/benchmarks.md into Arabic (#33023)
* Add docs/source/ar/benchmarks.md to Add_docs_source_ar_benchmarks.md

* Update docs/source/ar/benchmarks.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/benchmarks.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/benchmarks.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/benchmarks.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/benchmarks.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/benchmarks.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/benchmarks.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/benchmarks.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/benchmarks.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/benchmarks.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/benchmarks.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update _toctree.yml

* Update benchmarks.md

---------

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
2024-11-26 09:23:11 -08:00
a0ba631519 Update the Python version in the Chinese README to match the English README. (#34870)
Update Python Version
2024-11-26 09:22:34 -08:00
1f6b423f0c Fix torch.onnx.export of Qwen2-VL vision encoder (#34852)
* Fix torch.onnx.export of Qwen2-VL vision encoder

This PR fixes onnx export support for the vision encoder of Qwen2-VL, whose code converts `cu_seqlens` to `torch.int32`, leading to errors later on when the values are used for slicing.

c57eafdaa1/src/transformers/models/qwen2_vl/modeling_qwen2_vl.py (L1044-L1046)

## Error:
```
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] (op_type:Slice, node name: /blocks.0/attn/Slice_4): axes has inconsistent type tensor(int64)
```

## Code to reproduce issue:
```py

import requests
from PIL import Image
import torch
from transformers import (
    AutoProcessor,
    Qwen2VLForConditionalGeneration,
)

# Constants
VISION_MODEL_NAME = "vision_encoder.onnx"

# Load model and processor
model_id = "hf-internal-testing/tiny-random-Qwen2VLForConditionalGeneration"
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)

# Prepare inputs
url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
image = Image.open(requests.get(url, stream=True).raw)
conversation = [
    {
        "role": "user",
        "content": [
            { "type": "image" },
            { "type": "text", "text": "Describe this image."},
        ],
    },
]
images = [image]
text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(text=[text_prompt], images=images, padding=True, return_tensors="pt")

## Vision model
vision_inputs = dict(
    pixel_values=inputs["pixel_values"],
    grid_thw=inputs["image_grid_thw"],
)
vision_inputs_positional = tuple(vision_inputs.values())
vision_outputs = model.visual.forward(*vision_inputs_positional)  # Test forward pass
torch.onnx.export(
    model.visual,
    args=vision_inputs_positional,
    f=VISION_MODEL_NAME,
    export_params=True,
    opset_version=14,
    do_constant_folding=True,
    input_names=list(vision_inputs.keys()),
    output_names=["image_features"],
    dynamic_axes={
        "pixel_values": {
            0: "batch_size * grid_t * grid_h * grid_w",
            1: "channel * temporal_patch_size * patch_size * patch_size",
        },
        "grid_thw": {0: "batch_size"},
        "image_features": {0: "batch_size * grid_t * grid_h * grid_w"},
    },
)

# Load and check the exported model
import onnx
model = onnx.load(VISION_MODEL_NAME)
onnx.checker.check_model(model, full_check=True)
inferred = onnx.shape_inference.infer_shapes(model, check_type=True)
```

* Formatting

* [run-slow] qwen2_vl
2024-11-26 16:14:36 +01:00
d5cf91b346 Separate chat templates into a single file (#33957)
* Initial draft

* Add .jinja file loading for processors

* Add processor saving of naked chat template files

* make fixup

* Add save-load test for tokenizers

* Add save-load test for tokenizers

* stash commit

* Try popping the file

* make fixup

* Pop the arg correctly

* Pop the arg correctly

* Add processor test

* Fix processor code

* stash commit

* Processor clobbers child tokenizer's chat template

* Processor clobbers child tokenizer's chat template

* make fixup

* Split processor/tokenizer files to avoid interactions

* fix test

* Expand processor tests

* Rename arg to "save_raw_chat_template" across all classes

* Update processor warning

* Move templates to single file

* Move templates to single file

* Improve testing for processor/tokenizer clashes

* Improve testing for processor/tokenizer clashes

* Extend saving test

* Test file priority correctly

* make fixup

* Don't pop the chat template file before the slow tokenizer gets a look

* Remove breakpoint

* make fixup

* Fix error
2024-11-26 14:18:04 +00:00
5a45617887 change apply_rotary_pos_emb of Glmmodel for GLM-Edge Series model (#34629)
* change apply_rotary_pos_emb

* upload for glm-edge

* remove useless part

* follow the suggestion

* fix

* format

* format

* test

* format again

* format again

* remove modular change

* remove modular change

* does this apply_rotary_pos_emb need modifying?

* fix with this

* format

* format

* ruff check

* modify modular_glm failed

* remove partial_rotary_factor of function  partial_rotary_factor

* fix wrong change of examples/research_projects

* revert

* remove line 118

* use q_rot
2024-11-26 15:05:42 +01:00
1141eff1bd Add Pytorch Tensor Parallel support for Mistral (#34927)
add base tp support
2024-11-26 14:28:07 +01:00
4d1d0f29a4 [Whisper] Fix whisper integration tests (#34111)
* fix test_tiny_timestamp_generation

* fix test_large_timestamp_generation

* fix test_whisper_shortform_single_batch_prev_cond

* fix test_whisper_shortform_multi_batch_hard_prev_cond

* return_timestamps necessary with long form

* fix test_default_multilingual_transcription_long_form

* fix test_tiny_token_timestamp_generation_longform

* fix test_whisper_longform_multi_batch_hard

* Update tests/models/whisper/test_modeling_whisper.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* fix typo

* do not expect special tokens

* fix test_whisper_longform_single_batch_beam

* fix test_whisper_longform_multi_batch_hard_prev_cond

* update test_whisper_longform_multi_batch_hard_prev_cond

* update test_whisper_longform_multi_batch_hard_prev_cond

* these tests do not make sense anymore

* this test does not make sense anymore

* make fixup

* suggested nits

* add test with forced_decoder_ids

* this test does not make sense anymore

* change assert for unittest test cases

* make fixup

* test with prompt_ids and task and language

* fix unittest test case call

* fix test_tiny_generation

* fix test_tiny_en_generation

* fix test_tiny_en_batched_generation

* fix test_tiny_longform_timestamps_generation

* fix test_tiny_timestamp_generation

* fix test_large_generation

* fix test_large_batched_generation

* fix test_large_generation_multilingual

* fix test_large_timestamp_generation

* fix test_large_timestamp_generation

* fix test_tiny_token_timestamp_generation_longform

* fix test_tiny_en_batched_generation

* make fixup

* [run-slow] whisper

---------

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
2024-11-26 12:23:08 +01:00
0e805e6d1e Skipping aqlm non working inference tests till fix merged (#34865) 2024-11-26 11:09:30 +01:00
73b4ab1085 VideoLLaVA: add default values (#34916)
add default values
2024-11-26 08:20:06 +01:00
bdb29ff9f3 Fix import structure for Fast Image processors (#34859)
* Fix import structure image_processor_fast

* update to new inits
2024-11-25 16:27:56 -05:00
bfc3556b20 making gpt2 fx traceable (#34633)
* making gpt2 fx traceable

* running make fix-copies

* Revert "running make fix-copies"

This reverts commit 5a3437cb5b63799243bceae7d21a2aed8d0418c7.
2024-11-25 19:30:38 +01:00
95c10fedb3 Updated documentation and added conversion utility (#34319)
* Updated documentation and added conversion utility

* Update docs/source/en/tiktoken.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tiktoken.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Moved util function to integration folder + allow for str

* Update formatting

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Updated formatting

* style changes

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-11-25 18:44:09 +01:00
890ea7de93 Fix failing GGML test (#34871)
fix_test
2024-11-25 18:04:52 +01:00
b76a292bde Upgrade torch version to 2.5 in dockerfile for quantization CI (#34924)
* Upgrade Torch 2.5

* uncomment
2024-11-25 17:38:20 +01:00
a830df2909 Fix test_auto_backbone_timm_model_from_pretrained (#34877)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-11-25 17:20:41 +01:00
a464afbe2a fix static cache data type mismatch (#34799)
* fix gptj data type mismatch

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add low precision static cache tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix low-precision static cache tests

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* avoid config change

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* change data type convert in cache copy

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* cast key value after k v out

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2024-11-25 16:59:38 +01:00
b13916c09d [AWQ, CI] Bump AWQ version used in docker image (#34922)
The old AWQ version is failing with the latest (unreleased)
transformers, giving the error:

> ImportError: cannot import name 'shard_checkpoint' from
'transformers.modeling_utils'

This has been resolved in awq v0.2.7:

https://github.com/casper-hansen/AutoAWQ/pull/644
2024-11-25 16:49:57 +01:00
4e6b19cd95 Fix : BitNet tests (#34895)
* fix_tests_bitnet

* fix format
2024-11-25 16:47:14 +01:00
9121ab8fe8 Rename OLMo November to OLMo2 (#34864)
* Rename/move OLMo Nov files to OLMo2

* Rename Olmo1124 and its variants to Olmo2
2024-11-25 16:31:22 +01:00
1de3598d30 Bump tornado from 6.4.1 to 6.4.2 in /examples/research_projects/lxmert (#34917)
Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.4.1 to 6.4.2.
- [Changelog](https://github.com/tornadoweb/tornado/blob/v6.4.2/docs/releases.rst)
- [Commits](https://github.com/tornadoweb/tornado/compare/v6.4.1...v6.4.2)

---
updated-dependencies:
- dependency-name: tornado
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-25 15:19:29 +00:00
f4c04ba32b Fix Qwen2 failing tests (#34819)
* fix: qwen2 model ids

* fix: line

* fix: more format

* update: reformat
2024-11-25 15:53:04 +01:00
11cc2295c7 [peft] Given that self.active_adapter is deprecated, avoid using it (#34804)
* Given that self.active_adapter is deprecated, avoid using it

* Remove misleading comment - `self.active_adapter` is not used (and deprecated)
2024-11-25 15:29:52 +01:00
74db22f905 Fix convert_tokens_to_string when decoder is None (#34569)
* Fix convert_tokens_to_string when decoder is None

* revert unrelated changs

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-11-25 14:35:24 +01:00
97514a8ba3 chore: fix some typos (#34891)
Signed-off-by: wanxiangchwng <cui.shuang@foxmail.com>
2024-11-25 13:05:59 +00:00
62ab94dea8 Bump tornado from 6.4.1 to 6.4.2 in /examples/research_projects/visual_bert (#34887)
Bump tornado in /examples/research_projects/visual_bert

Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.4.1 to 6.4.2.
- [Changelog](https://github.com/tornadoweb/tornado/blob/v6.4.2/docs/releases.rst)
- [Commits](https://github.com/tornadoweb/tornado/compare/v6.4.1...v6.4.2)

---
updated-dependencies:
- dependency-name: tornado
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-25 12:54:55 +00:00
c50b5675d6 prepare_fa2_from_position_ids function bugfix (#33269)
contiguous() is called before view() for key and value within the prepare_fa2_from_position_ids function
2024-11-25 13:51:26 +01:00
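A minimal illustration of why the ordering matters (shapes illustrative): view() needs contiguous memory, which a transposed key/value tensor lacks.

```py
import torch

num_heads, head_dim = 4, 8
key = torch.randn(2, 3, num_heads, head_dim).transpose(1, 2)  # non-contiguous
# key.view(-1, num_heads, head_dim) would raise a RuntimeError here;
# calling .contiguous() first, as the fix does, makes the reshape legal
key = key.contiguous().view(-1, num_heads, head_dim)
```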
a0f4f3174f allow unused input parameters passthrough when chunking in asr pipelines (#33889)
* allow unused parameter passthrough when chunking in asr pipelines

* format code

* format

* run fixup

* update tests

* update parameters to pipeline in test

* update parameters in tests

* change spelling in gitignore

* revert .gitignore to main

* add git ignore of devcontainer folder

* assert asr output follows expected inference output type

* run fixup

* Remove .devcontainer from .gitignore

* remove compliance check
2024-11-25 11:36:44 +01:00
4dc1a69349 Sum gathered input tokens (#34554)
* sum gathered input tokens

* ruff line-length is 119, format the code

---------

Co-authored-by: kangsheng <kangsheng@meituan.com>
2024-11-25 11:27:13 +01:00
1e492afd61 🔴 Mllama: fix base prefix (#34874)
fix base prefix
2024-11-25 11:20:20 +01:00
857d46ca0c [Deberta/Deberta-v2] Refactor code base to support compile, export, and fix LLM (#22105)
* some modification for roadmap

* revert some changes

* yups

* weird

* make it work

* settling

* fix-copies

* fixup

* renaming

* more fix-copies

* move stuff around

* remove torch script warnings

* ignore copies

* revert bad changes

* woops

* just styling

* nit

* revert

* style fixup

* nits configuration style

* fixup

* nits

* will this fix the tf pt issue?

* style

* ???????

* update

* eval?

* update error message

* updates

* style

* grumble grumble

* update

* style

* nit

* skip torch fx tests that were failing

* style

* skip the failing tests

* skip another test and make style
2024-11-25 10:43:16 +01:00
098962dac2 BLIP: fix generation after hub update (#34876)
* fix blip generation

* dont remove it yet

* Update src/transformers/models/blip_2/modeling_blip_2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address comments

* modular

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-11-25 10:41:55 +01:00
c1a8520419 Cache: init empty cache when use_cache (#34274)
* fix

* fix tests

* fix copies

* add docs

* Revert "add docs"

This reverts commit 32d35634f12ba02781d2ebdee0c8dcfbe992a7b9.

* qwen move deltas

* mllama can potentially fullgraph compile

* enable mllama compile and fix tests

* remove mllama fixes
2024-11-25 10:11:33 +01:00
1339a14dca Add safe_globals to resume training on PyTorch 2.6 (#34632)
Starting from version 2.4, PyTorch introduced a stricter check for the objects which
can be loaded with torch.load(). Starting from version 2.6, loading with weights_only=True
requires allowlisting of such objects.

This commit adds an allowlist of some numpy objects used to load model checkpoints.
Usage is restricted by a context manager. Users can still additionally call
torch.serialization.add_safe_globals() to add other objects into the safe globals list.

The Accelerate library ran into the same problem and addressed it with PR-3036.

Fixes: #34631
See: https://github.com/pytorch/pytorch/pull/137602
See: https://pytorch.org/docs/stable/notes/serialization.html#torch.serialization.add_safe_globals
See: https://github.com/huggingface/accelerate/pull/3036

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2024-11-25 10:03:43 +01:00
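A self-contained sketch of the add_safe_globals escape hatch mentioned above (the class is a stand-in; real checkpoints need the numpy objects they actually pickle allowlisted):

```py
import torch

class TrainerState:  # stand-in for a non-tensor object pickled into a checkpoint
    def __init__(self, step):
        self.step = step

torch.save({"state": TrainerState(3), "w": torch.zeros(2)}, "ckpt.pt")
# torch.load(..., weights_only=True) rejects non-allowlisted classes on PyTorch >= 2.6,
# so the class must be registered before loading
torch.serialization.add_safe_globals([TrainerState])
ckpt = torch.load("ckpt.pt", weights_only=True)
```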
318fe25f22 Fix: Enable prefill phase key value caching of nemotron/minitron models (#34742)
* modeling nemotron kv caching bugfix

Signed-off-by: jeongin601 <0200angela@gmail.com>

* test file deleted

Signed-off-by: jeongin601 <0200angela@gmail.com>

* code refinement

Signed-off-by: jeongin601 <0200angela@gmail.com>

* remove unused variables

Signed-off-by: jeongin601 <0200angela@gmail.com>

* import block sorted

* removed deprecation warning

Signed-off-by: jeongin601 <0200angela@gmail.com>

* removed support for tuple shape past_key_values

Signed-off-by: jeongin601 <0200angela@gmail.com>

* Update conditional statement for cache initialization

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Signed-off-by: jeongin601 <0200angela@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-11-25 09:45:35 +01:00
3a8eb74668 Fix support for image processors modifications in modular (#34866)
* add fix and examples

* fix camel case naming
2024-11-22 18:14:24 -05:00
54be2d7ae8 Bitnet test fix to avoid using gated model (#34863)
small test fix
2024-11-22 17:18:49 +01:00
286ffaaf0a [CI] Skip EETQ tests while package is broken with latest transformers (#34854)
* CI Skip EETQ tests while package is broken

EETQ tries to import the shard_checkpoint function from transformers but
the function has been removed. Therefore, trying to use EETQ currently
results in an import error. This fix results in EETQ tests being skipped
if there is an import error.

The issue has been reported to EETQ:

https://github.com/NetEase-FuXi/EETQ/issues/34

* Raise helpful error when trying to use eetq

* Forgot to raise the error in the else clause
2024-11-22 17:13:30 +01:00
861758e235 smol improvements to support more flexible usage (#34857)
* smol improvements to support more flexible usage

* ruff
2024-11-22 16:34:38 +01:00
42b36d7395 Speculative decoding: Test the target distribution (to prevent issues like #32867) (#34553)
* Update test_utils.py

* formatting

* Update test_utils.py

* formatting

* formatting

* Update test_utils.py

* formatting

* Update test_utils.py

* formatting

* format

* comments at standard positions
2024-11-22 16:02:37 +01:00
597efd21d2 Auto compile when static cache (#34247)
* generate with compile

* nits

* simple

* generate with compile

* nits

* simple

* safe

* style

* Update src/transformers/generation/utils.py

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* remove TOKENIZER forked warning

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2024-11-22 15:33:35 +01:00
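A sketch of the user-facing trigger (model id illustrative): requesting the static cache is what opts generate into the automatic compilation this PR adds.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # illustrative model id
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Hello", return_tensors="pt")
# the static cache has fixed shapes, which makes the forward pass compilable
out = model.generate(**inputs, max_new_tokens=10, cache_implementation="static")
```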
d9e6f307e7 Remove quantization related config from dequantized model (#34856)
* Remove quantization related config from dequantized model

* Fix whitespace
2024-11-22 10:06:29 +01:00
1867be666d Update checks for torch.distributed.tensor to require torch >= 2.5 (#34816)
* Update checks for torch.distributed.tensor

* Update PR with feedback

* Formatting fix for import order

* Remove unused function
2024-11-22 10:05:26 +01:00
6a912ff2c5 Watermarking: fix order (#34849)
fix watermarking order
2024-11-22 08:25:14 +01:00
4e90b99ed9 Refactor StarCoder2 using modular (#34015)
* Create modular_starcoder2.py

* Update modular_starcoder2.py

* update

* finalize modular

* revert # no-unravel

* Add support

* style

* Update modular_model_converter.py

* update docstring
2024-11-21 14:52:39 +01:00
18871599c9 Fix heuristic scheduling for UAG (#34805)
* fix heuristic schedule

* fix style

* fix format
2024-11-21 14:46:35 +01:00
d6a5c23f71 Fix ds nvme (#34444)
* skip nested deepspeed.zero.Init call

* make fixup

* solve conflict

* solve conflict

* put back local

* use context mangers instead of local thread

* Skip recursive calls to deepspeed.zero.Init

* Skip recursive calls to deepspeed.zero.Init

* back to old notebooks

* make style
2024-11-21 13:52:22 +01:00
ae5cbf804b Improve gguf tensor processing (#34515)
* add tensor processing system to separate logic for models

* format refactoring

* small fix

* make some methods private

* move custom methods to processors

* refactor tensor processing

* format fix
2024-11-21 13:40:49 +01:00
c57eafdaa1 Add Nemotron GGUF Loading Support (#34725)
* Add Nemotron GGUF Loading Support

* fix the Nemotron architecture assignation

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-11-21 11:37:34 +01:00
d4e1acbb7c Change logging level from warning to info for max_steps overriding num_train_epochs (#34810)
Update trainer.py
2024-11-21 11:37:02 +01:00
28fb02fc05 VLMs: enable generation tests - last batch (#34484)
* add tests for 3 more vlms

* fix fuyu back

* skip test
2024-11-21 11:00:22 +01:00
40821a2478 Fix CI slack reporting issue (#34833)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-11-20 21:36:13 +01:00
3cb8676a91 Fix CI by tweaking torchao tests (#34832) 2024-11-20 20:28:51 +01:00
bf42c3bd4b Fix hyperparameter search when optuna+deepseed (#34642)
* Fix hyperparameter search when optuna+deepseed

* Adding free_memory to the search setup

---------

Co-authored-by: Corentin-Royer <corentin.royer@ibm.com>
2024-11-20 18:02:58 +01:00
67890de3b8 Torchao weights only + prequantized compability (#34355)
* weights only compability

* better tests from code review

* pin torch version

* add weights_only check
2024-11-20 17:24:45 +01:00
f297af55df Fix: take into account meta device (#34134)
* Do not load for meta device

* Make some minor improvements

* Add test

* Update tests/utils/test_modeling_utils.py

Update test parameters

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Make the test simpler

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-11-20 11:32:07 +01:00
8cadf76e1c fix(DPT,Depth-Anything) torch.export (#34103)
* Fix torch.export issue in dpt based models

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* Simplify the if statements

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* Move activation definitions of zoe_depth to init()

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* Add test_export for dpt and zoedepth

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* add depth anything

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* Remove zoedepth non-automated zoedepth changes and zoedepth test

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* [run_slow] dpt, depth_anything, zoedepth

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

---------

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>
2024-11-20 11:31:21 +01:00
9d16441e4f Fix the memory usage issue of logits in generate() (#34813) 2024-11-20 11:25:37 +01:00
9470d65324 Fix low memory beam search (#34746)
* fix

* higher max positions in tests
2024-11-20 07:46:35 +01:00
145fbd46cb LLaVA OV: fix unpadding precision (#34779)
* fix

* propagate

* type check
2024-11-20 07:46:13 +01:00
3033509327 Translate attention.md into Chinese (#34716)
* try

* tryagain

* tryagggain

* translated

* translated2

* Update docs/source/zh/attention.md

Co-authored-by: Huazhong Ji <hzji210@gmail.com>

---------

Co-authored-by: Huazhong Ji <hzji210@gmail.com>
2024-11-19 10:03:12 -08:00
befbbf2f98 Added image-text-to-text pipeline to task guide (#34783)
* Added image-text-to-text pipeline to task guide

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Merge codeblocks

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-11-19 09:49:10 -08:00
469eddbe2d Fix check_training_gradient_checkpointing (#34806)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-11-19 17:48:34 +01:00
05ebe8b9b0 Run test_medium_seamless_m4t_pt in subprocess to avoid many failures (#34812)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-11-19 17:32:10 +01:00
eedc113914 Add Image Processor Fast Deformable DETR (#34353)
* add deformable detr image processor fast

* add fast processor to doc

* fix copies

* nit docstring

* Add tests gpu/cpu and fix docstrings

* fix docstring

* import changes from detr

* fix imports

* rebase and fix

* fix input data format change in detr and rtdetr fast
2024-11-19 11:18:58 -05:00
b99ca4d28b Add support for OpenAI api "image_url" input in chat for image-text-to-text pipeline (#34562)
* add support for openai api image_url input

* change continue to elif

* Explicitly add support for OpenAI/TGI chat format

* rewrite content to transformers chat format and add tests

* Add support for typing of image type in chat templates

* add base64 to possible image types

* refactor nesting
2024-11-19 11:08:37 -05:00
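A sketch of the OpenAI-style message shape the pipeline now rewrites into the transformers chat format (model id and URL illustrative):

```py
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llava-hf/llava-interleave-qwen-0.5b-hf")  # illustrative
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
out = pipe(text=messages, max_new_tokens=20)
```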
15dd625a0f Bump aiohttp from 3.10.2 to 3.10.11 in /examples/research_projects/decision_transformer (#34792)
Bump aiohttp in /examples/research_projects/decision_transformer

Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.10.2 to 3.10.11.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](https://github.com/aio-libs/aiohttp/compare/v3.10.2...v3.10.11)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-19 16:08:07 +00:00
dc42330388 fix crash in tiiuae/falcon-11B-vlm image-to-text generation (#34728)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2024-11-19 16:51:32 +01:00
427b62ed1a Fix post process function called in the instance segmentation example of mask2former (#34588)
* Fix post process function called in the instance segmentation example of mask2former

* fix description and additional notes for post_process_instance_segmentation of maskformers

* remove white space in maskformers post_process_instance_segmentation doc

* change image.size[::-1] to height and width for clarity in segmentation examples
2024-11-19 16:49:25 +01:00
fdb9230485 Add do_convert_rgb to vit (#34523)
* Add: do_convert_rgb

* Add: doc string

* Update src/transformers/models/vit/image_processing_vit.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/vit/image_processing_vit.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/vit/image_processing_vit.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Add: do_convert_rgb to fast

* Add: convert_to_rgb

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2024-11-19 16:48:05 +01:00
7b9e51c1a0 Feature: print tokens per second during training (#34507)
* Log tokens per second during training

* Nitpicks

* Move logic into _maybe_log_save_evaluate

* Use speed_metrics
2024-11-19 16:46:04 +01:00
5fa4f64605 🚨🚨🚨 fix(Mask2Former): torch export 🚨🚨🚨 (#34393)
* fix(Mask2Former): torch export

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* revert level_start_index and create a level_start_index_list

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* Add a comment to explain the level_start_index_list

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* Address comment

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* add torch.export.export test

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* rename arg

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* remove spatial_shapes

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* Use the version check from pytorch_utils

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* [run_slow] mask2former

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

---------

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>
2024-11-19 16:44:53 +01:00
581524389a MLU devices : Checks if mlu is available via an cndev-based check which won't trigger the drivers and leave mlu (#34326)
* add Cambricon MLUs support

* fix mlu device rng state

* up for quality check

* up mlu to support fp16

* fix mlu device dependency error

* fix mlu device dependency error

* enable mlu device for bf16

* fix mlu device memory tracker

* Cambricon support SDPA and flash_attn

* MLU devices : Checks if `mlu` is available via an `cndev-based` check which won't trigger the drivers and leave mlu
2024-11-19 16:37:39 +01:00
e3a5889ef0 Modular fix (#34802)
* Modular fix

* style

* remove logger warning

* Update modular_model_converter.py
2024-11-19 16:08:57 +01:00
ce1d328e3b Fix cache_utils for optimum.quanto kvcache quantization (#34750)
* add co-author

Co-authored-by: w3rew <w3rew@users.noreply.github.com>

* fix docs

* fix cache

* remove print

---------

Co-authored-by: w3rew <w3rew@users.noreply.github.com>
2024-11-19 14:16:34 +01:00
4bff54f921 Gemma capping (#34282)
* softcapping

* soft cap before the mask

* style

* ...

* super nit

* update

* fixes

* update

* small issue with modular

* fix modular imports

* update

* fixup

* simplify a hell lot

* simplify cleaning imports

* finish fixing

* update our design

* nits

* use a deprecation cycle

* updates

* Fix modular (recursive deps need to always be computed after merges!)

* push

* fix

* update

* fix modular order

* make fix-copies

* updates

* update

* ?

* don't compile for now

* ?

* fix some stuff

* donc!

* fix copies

* update

* fixup

* ?

* fix two tests

* fix?

* for now, don't use head info

* eager when outputting attention and sdpa or flash as it's the simplest behaviour (for our tests as well :))

* fix-copies

* revert sdpa check

* Apply suggestions from code review

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* rebase, fix-copies and push

* add a slow integration test

* update the test

* fix left padding issue

* fix test

* remove duplicate scaling

* quality

* add a small test and make sure it works

* 2b

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2024-11-19 13:52:38 +01:00
54739a320e Self-speculation (Layer-Skip Llama) (#34240)
* 😅

* early exit (#34244)

* mvp

* docs and tests

* a few fixes

* no shared cache

* Apply suggestions from code review

Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org>

* docs

* make fix-copies

* cohere fix

* [test all]

* [test all] consistent model code copies

* [test all] make fix-copies :D

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org>

* Update src/transformers/generation/candidate_generator.py

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* [test all] don't use a stand-alone attribute; fix test

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Mostafa Elhoushi <m.elhoushi@ieee.org>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-11-19 12:20:07 +00:00
5de58d5955 fix cpu bnb path (#34647)
* fix cpu bnb path

* Update src/transformers/generation/utils.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix awq quantizer env check

* fix awq quantizer device check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-11-19 12:44:44 +01:00
3cd78be34e Fix: siglip image processor rgb_convert is not being applied correctly. (#34301)
Fix: do_convert_rgb
2024-11-19 12:40:36 +01:00
0db91c3c8d Support gradient checkpointing in Qwen2VL ViT (#34724)
* Support gradient checkpointing in Qwen2VL ViT

* Enable gradient checkpoint tests for Qwen2VL

* [run-slow] qwen2_vl
2024-11-19 12:30:44 +01:00
1a0cd69435 feat: allow to use hf-hub models for timm backbone (#34729)
Currently a backbone name like 'hf-hub:bioptimus/H-optimus-0' throws an
error, even though it could work.

Co-authored-by: Christian Gebbe <>
2024-11-19 10:26:35 +00:00
d8a5d31d9c Trainer hyperparameter search kwargs docs update (#34459)
* doc: Trainer.hyperparameter_search docstring discrepancy solved

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-11-19 11:23:03 +01:00
dadb286f06 protect tensor parallel usage (#34800)
protect
2024-11-19 09:54:11 +01:00
eed11f34ab Fix Whisper CI (#34617)
* Revert "Revert "Fix Whisper CI" (#34605)"

This reverts commit 74d3824cc0725829e7d92e1d43b97be1f18454f8.

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-11-18 21:37:50 +01:00
759a378ee5 Allow handling files as args for a tool created with Tool.from_space (#34687)
* Allow handling files as args for a tool created with `Tool.from_space`
2024-11-18 20:15:35 +01:00
20142ab542 Simplify Tensor Parallel implementation with PyTorch TP (#34184)
* Simplify Tensor Parallel implementation with PyTorch TP

* Move tp_plan to config

* Lint

* Format and warning

* Disable copy-from check

* Conditionally get attr from config

* make fix-copies

* Move base_model_tp_plan to PretrainedConfig

* Move TP into from_pretrained

* Add device context for load

* Do not serialize

* Move _tp_plan setting to post_init

* Add has_tp_plan

* Add test_tp

* Add 'Multi-gpu inference' doc

* Add backward support for device type identification

* Auto-detect accelerator

* supports_tp_plan

* copyright year

* Fix copy
2024-11-18 19:51:49 +01:00
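A sketch of the user-facing API from the bullet list above (model id illustrative); launch with torchrun so each GPU gets one process:

```py
# run with: torchrun --nproc-per-node 4 tp_demo.py
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative model id
    tp_plan="auto",          # applies the config's base_model_tp_plan across ranks
    torch_dtype=torch.bfloat16,
)
```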
7df93d6ffb fix: Wrong task mentioned in docs (#34757) 2024-11-18 18:42:28 +00:00
7693b62268 Fix callback key name (#34762)
Fixes typo.
2024-11-18 18:41:12 +00:00
1ef6c5f1c5 fix: Update pixel_values parameter in hf_model input (#34782) 2024-11-18 18:40:01 +00:00
e80a65ba4f [tests] add XPU part to testing (#34778)
add XPU part to testing

Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2024-11-18 09:59:11 -08:00
9568a9dfc5 [docs] add XPU besides CUDA, MPS etc. (#34777)
add XPU
2024-11-18 09:58:50 -08:00
8568bf1bcf [docs] make empty_cache device-agnostic (#34774)
make device-agnostic
2024-11-18 09:58:26 -08:00
36759f3312 make sure to disable gradients for integer tensor (#32943) 2024-11-18 16:49:37 +01:00
1c471fc307 Fix skip of test_training_gradient_checkpointing (#34723)
19d58d31f introduced a context manager to manage subtests of
test_training_gradient_checkpointing. However, the test body was not
moved under the "with" statement. Thus, while tests were correctly
marked as skipped, the test bodies were still executed. In some cases,
as with llama, this caused attribute errors.

Fixes: #34722
Fixes: 19d58d31f ("Add MLLama (#33703)")

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2024-11-18 15:45:40 +01:00
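A minimal, runnable illustration of the failure mode: unittest's subTest context records the SkipTest but swallows it, so any code left outside the with block still executes.

```py
import unittest

class ExampleTest(unittest.TestCase):
    def test_buggy_skip(self):
        with self.subTest("gradient checkpointing"):
            self.skipTest("not supported")
        print("body still executed")  # runs even though the subtest was marked skipped

    def test_fixed_skip(self):
        with self.subTest("gradient checkpointing"):
            self.skipTest("not supported")
            print("never reached")  # the fix: keep the body under the "with"

if __name__ == "__main__":
    unittest.main()
```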
c772d4d91e fix a typo bug where 'id2label' was incorrectly written as 'i2label' when reading config (#34637)
fix a bug where 'id2label' was incorrectly written as 'i2label' when reading the config from pretrained config
2024-11-18 14:41:48 +01:00
eb0ab3ed4b Fix broken link (#34618) 2024-11-18 14:13:26 +01:00
1646ffb4d1 VLMs: patch_size -> num_image_tokens in processing (#33424)
* use num additional tokens

* fix copies + docs

* another fix copies :)

* add docs

* move order for BC
2024-11-18 13:21:07 +01:00
3ee24e2208 Add OLMo November 2024 (#34551)
* Add model skeletion with transformers-cli add-new-model-like

* Convert config to modular, add rms_norm_eps, delete clip_qkv

* Convert model to modular, add RMSNorm

* Add flash attention with qk norm and no qkv clipping

* Add decoder layer with RMSNorm after attention/feedforward layers

* Add base and causal model

* Add converter improvements from OLMo repo

* Update weight loading in OLMo to HF converter

* Set correct default for rms_norm_eps

* Set correct pipeline_model_mapping in test

* Run make fixup

* Fix model type

* Re-run modular conversion

* Manually set config docs to fix build errors

* Convert olmo-1124 to olmo_1124 to fix flash attention docs errors

* Start updating tests

* Update tests

* Copy upstream test_eager_matches_sdpa_inference_1_bfloat16 changes to olmo_1124

* Rename input_layernorm and post_attention_layernorm to reflect their ops better

* Use correct tokenizer

* Remove test unsupported by GPT2 tokenizer

* Create GenerationConfig outside of from_pretrained call

* Use simpler init file structure

* Add explicit __all__ to support simplified init

* Make safetensor serialization the default

* Update OLMo November 2024 docs
2024-11-18 10:43:10 +01:00
13493215ab 🧼 remove v4.44 deprecations (#34245)
* remove v4.44 deprecations

* PR comments

* deprecations scheduled for v4.50

* hub version update

* make fixup

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-11-15 23:07:24 +01:00
8d50fda644 Remove FSDP wrapping from sub-models. (#34452)
* Remove FSDP wrapping from sub-models.

* solve conflict trainer.py

* make fixup

* add unit test for fsdp_auto_wrap_policy when using auto_find_batch_size

* put back extract_model_from_parallel

* use transformers unwrap_model
2024-11-15 23:00:03 +01:00
b0c0ba7b4d FSDP grad accum fix (#34645)
* add gradient accumulation steps tests for fsdp

* invert no_sync context to fix training for fsdp
2024-11-15 22:28:06 +01:00
52ea4aa589 add xpu path for awq (#34712)
* add xpu path for awq

* update readme
2024-11-15 15:45:24 +01:00
7b3d615bc2 fix(wandb): pass fake dataset to avoid exception in trainer (see #34455) (#34720) 2024-11-15 15:44:02 +01:00
f5dbfab7f3 Update llava.md (#34749)
LLava -> Llava
2024-11-15 15:39:57 +01:00
8ba3e1505e Retain newlines in chat template when continue_final_message=True (#34253)
* Retain newlines in chat template when

* Add try/except

* Add regression test

* Simplify test

* Apply suggestions from code review

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2024-11-15 14:27:04 +00:00
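A sketch of the option in question (tokenizer id illustrative): with continue_final_message=True, the rendered prompt must end with the assistant text verbatim, trailing newline included, so generation resumes mid-message.

```py
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")  # illustrative
messages = [
    {"role": "user", "content": "List three primes."},
    {"role": "assistant", "content": "Sure:\n1. 2\n"},
]
# the prompt now ends exactly with "Sure:\n1. 2\n" instead of closing the turn
prompt = tok.apply_chat_template(messages, continue_final_message=True, tokenize=False)
print(prompt)
```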
a3d69a8994 [docs] add xpu device check (#34684)
* add XPU path

* use accelerate API

* Update docs/source/en/tasks/semantic_segmentation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update more places with accelerate API

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-11-13 14:16:59 -08:00
68f8186a89 Fix example in EsmConfig docstring (#34653) 2024-11-13 13:55:58 -08:00
e7c36a9d57 [docs] Broken link in generation_strategies (#34717)
[docs] Broken link
2024-11-13 13:44:42 -08:00
be8748a53c 🌐 [i18n-KO] Translated marian.md to Korean (#34698)
* initial translation

* removed english

* Fixed Trivial Typos, updated _toctree.yml
2024-11-13 13:14:23 -08:00
33eef99250 Agents: Small fixes in streaming to gradio + add tests (#34549)
* Better support transformers.agents in gradio: small fixes and additional tests
2024-11-11 20:52:09 +01:00
6de2a4d1f1 [i18n-ar] Translated file : docs/source/ar/torchscript.md into Arabic (#33079)
* Add docs/source/ar/torchscript.md to Add_docs_source_ar_torchscript.md

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/torchscript.md

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Merge troubleshooting.md with this Branch

* Update _toctree.yml

* Update torchscript.md

* Update troubleshooting.md

---------

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
2024-11-11 10:41:01 -08:00
25f510a9c6 [docs] update not-working model revision (#34682)
update revision
2024-11-11 07:09:31 -08:00
3ea3ab62d8 Agents: turn any Space into a Tool with Tool.from_space() (#34561)
* Agents: you can now load a Space as a tool
2024-11-10 12:22:40 +01:00
134ba90da9 Update llm_engine.py (#33332)
* Update llm_engine.py
- Added support for optional token and max_tokens parameters in the constructor.
- Provided usage examples and detailed documentation for each method.
2024-11-10 12:19:20 +01:00
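For illustration, a minimal sketch of the constructor parameters described above, assuming the `HfApiEngine` class from the agents module; the model id and values are placeholders, not taken from the diff:

```python
# Hedged sketch: optional `token` and `max_tokens` in the engine constructor.
from transformers.agents import HfApiEngine

engine = HfApiEngine(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
    token="hf_xxx",   # optional Hub token (placeholder)
    max_tokens=1500,  # optional cap on generated tokens
)
```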
768f3c016e [i18n-ar] Translated file : docs/source/ar/trainer.md into Arabic (#33080)
* Add docs/source/ar/trainer.md to Add_docs_source_ar_trainer.md

* Update docs/source/ar/trainer.md (the same review suggestion, applied 39 times)

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update trainer.md

* Update trainer.md

* Update trainer.md

* Create _toctree.yml

* Delete docs/source/ar/_toctree.yml

* Update _toctree.yml - add trainer

* Update _toctree.yml

* merge serialization.md into this branch

* merge sagemaker.md into this PR

* Update _toctree.yml

* Update docs/source/ar/trainer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ar/trainer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-11-09 11:26:28 -08:00
a06a0d1263 🌐 [i18n-KO] Translated bert.md to Korean (#34627)
* Translated bert.md; needs an additional check

* Translation 2nd ver, changed _toctree.yml

* Fixed Typo

* Update bert.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update bert.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update bert.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update bert.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-11-07 18:56:09 -08:00
1cf17077bf 🌐 [i18n-KO] Translated timesformer.md to Korean (#33972)
* docs: ko: model_doc/timesformer.md

* feat: nmt draft

* fix: manual edits

* fix_toctree

* fix toctree on Video Models
2024-11-07 11:04:27 -08:00
6938524a28 fix(dvclive): pass fake dataset to avoid exception in trainer init (#34455)
fix(dvclive): pass fake dataset to avoid exception in trainer
2024-11-07 15:57:34 +01:00
7bbc624743 🌐 [i18n-KO] Translated convbert.md to Korean (#34599)
* docs: ko: convbert.md

* Update _toctree.yml

* feat: nmt draft
2024-11-05 09:32:17 -08:00
e83aaaa86b Fix use_parallel_residual and qkv_bias for StableLM GGUF config extraction (#34450)
* fix stablelm qkv_bias

* fix stablelm qkv_bias and use_parallel_residual

* remove original_model.config for stablelm gguf test
2024-11-05 18:26:20 +01:00
9f28d0c5d0 Fix torchvision interpolation CI (#34539)
fix-torch-interpolation-ci
2024-11-05 11:02:14 -05:00
d2bae7ee9d Changing __repr__ in torchao to show quantized Linear (#34202)
* Changing __repr__ in torchao

* small update

* make style

* small update

* add LinearActivationQuantizedTensor

* remove some cases

* update imports & handle return None

* update
2024-11-05 16:11:02 +01:00
f2d5dfbab2 Remove @slow for test_eager_matches_sdpa_inference (#34558)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-11-05 16:10:42 +01:00
082e57e0d4 Fix #34494 assistant tokens when truncated (#34531)
* Fix assistant tokens when truncated

* fix test

* fix test

* step
2024-11-05 15:10:15 +00:00
74d3824cc0 Revert "Fix Whisper CI" (#34605)
Revert "Fix Whisper CI (#34541)"

This reverts commit eb811449a2389e48930c45f84c88fd041735cf92.
2024-11-05 15:12:47 +01:00
45b0c7680c Remove unused test_dataset (#34516) 2024-11-05 14:01:25 +00:00
663c851239 DistilBERT is ExecuTorch compatible (#34475)
* DistilBERT is ExecuTorch compatible

* [run_slow] distilbert

* [run_slow] distilbert

---------

Co-authored-by: Guang Yang <guangyang@fb.com>
2024-11-05 13:41:48 +01:00
893ad04fad Load sub-configs from composite configs (#34410)
* save/load sub-configs

* nit forgot these

* fix copies

* move test to common

* use dict for sub-configs

* add load-save-laod test

* clean up modeling check

* oops, these are the correct keys

* fix some tests, missed some composite configs

* this model was missed
2024-11-05 11:34:01 +01:00
5e1fd4e204 FIX: Broken repr of TorchAoConfig (#34560)
FIX Broken repr of TorchAoConfig

The __repr__ method references a non-existent self.kwargs. This is now
fixed.

There does not appear to be a uniform way of defining __repr__ for
quantization configs. I copied the method as implemented for HQQ:

e2ac16b28a/src/transformers/utils/quantization_config.py (L285-L287)
2024-11-05 10:26:13 +01:00
d0b1d8d888 Skip DeepSpeed ZeRO Stage 3 model initialization when bnb (#34395)
* Skip DeepSpeed ZeRO Stage 3 model initialization when it is intended to be quantized.

* Propagate the quantization state using a context manager

* make fixup
2024-11-05 10:06:07 +01:00
eb811449a2 Fix Whisper CI (#34541)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-11-04 21:35:37 +01:00
bfa021be05 fix TrainerState doc because num_input_tokens_seen is unused by defau… (#34593)
fix TrainerState doc because num_input_tokens_seen is unused by default config

Co-authored-by: kangsheng <kangsheng@meituan.com>
2024-11-04 09:42:20 -08:00
0a6795af12 🌐 [i18n-KO] Update README_ko.md (#33098)
* Update README_ko.md

Delete the blank paragraph in the language selection button and edit to synchronize with the English version of README.md

* [i18n-KO] Update README_ko.md

* Additional edit to keep consistency with the main [documentation](https://huggingface.co/docs/transformers/v4.44.2/ko/index). (Edits to keep consistency with the main documentation.)

* Update README_ko.md

Additional update.
* Change docs link to the Korean-translated page if it exists.

* Change doc link to the Korean-translated page if it exists.

Change the doc link and delete the 'migration' row of the Learn more [더 알아보기] table, since it does not exist in the main version of the docs.

* modify a link of the main README.md

from
`https://huggingface.co/docs/transformers/index#supported-frameworks`

to
`https://huggingface.co/docs/transformers/index#supported-models-and-frameworks`

since the title of 'supported table' changed.

* [i18n-ko] edit links and sync with main `README.md`

* docs/change comment to Korean1

Change English comment to Korean

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* docs/change comment to Korean2

Change English comment to Korean

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* revise to original

to separate `edit_README_ko_md` and `README.md`

* Synchronization with English documentation.

Synchronization with English documentation, and translated a line of comment from English to Korean.

---------

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
2024-11-04 09:42:07 -08:00
1112c54604 🌐 [i18n-KO] Translated perf_train_special.md to Korean (#34590)
* Translated to Ko, 1st version

* updated _toctree.yml
2024-11-04 09:41:44 -08:00
a86bd6f2d8 [i18n-HI] Translated TFLite page to Hindi (#34572)
* [i18n-HI] Translated TFLite page to Hindi

* [i18n-HI] Translated TFLite page to Hindi

* Update docs/source/hi/tflite.md

Co-authored-by: K.B.Dharun Krishna <kbdharunkrishna@gmail.com>

---------

Co-authored-by: K.B.Dharun Krishna <kbdharunkrishna@gmail.com>
2024-11-04 09:40:30 -08:00
48831b7d11 Add text support to the Trainer's TensorBoard integration (#34418)
* feat: add text support to TensorBoardCallback

* feat: ignore long strings in trainer progress

* docs: add docstring for max_str_len

* style: remove trailing whitespace

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-11-04 17:36:27 +01:00
34927b0f73 MPS: isin_mps_friendly can support 0D tensors (#34538)
* apply fix

* tested

* make fixup
2024-11-04 16:18:50 +00:00
187439c3fa VLM: special multimodal Tokenizer (#34461)
* kinda works

* update

* add tests

* update

* use special tokens in processors

* typo

* fix copies

* fix

* fix moshi after rebase

* update

* fix tests

* update

* Update docs/source/en/main_classes/tokenizer.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update docs

* test for load time adding tokens

* fix some more tests which are now fetched better

* one more fix

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-11-04 16:37:51 +01:00
ef976a7e18 Update trainer for easier handling of accumulate, compile fixes, and proper reporting (#34511)
* Update trainer for easier handling of accumulate + proper reporting

* test

* Fixup tests

* Full fix

* Fix style

* rm comment

* Fix tests

* Minimize test + remove py 311 check

* Unused import

* Forward contrib credits from discussions

* Fix reported metrics

* Refactor, good as it's going to get

* rm pad tok id check

* object detection and audio are being annoying

* Fin

* Fin x2

---------

Co-authored-by: Gyanateet Dutta <Ryukijano@users.noreply.github.com>
2024-11-04 07:47:34 -05:00
33868a057c [i18n-HI] Translated accelerate page to Hindi (#34443)
* [i18n-HI] Translated accelerate page to Hindi

* Update docs/source/hi/accelerate.md (the same review suggestion, applied 4 times)

Co-authored-by: K.B.Dharun Krishna <kbdharunkrishna@gmail.com>

---------

Co-authored-by: Kay <kay@Kays-MacBook-Pro.local>
Co-authored-by: K.B.Dharun Krishna <kbdharunkrishna@gmail.com>
2024-11-01 08:26:45 -07:00
e2ac16b28a Large modular logic refactoring (#34487)
* rework converter

* Update modular_model_converter.py

* Update modular_model_converter.py

* Update modular_model_converter.py

* Update modular_model_converter.py

* cleaning

* cleaning

* finalize imports

* imports

* Update modular_model_converter.py

* Better renaming to avoid visiting same file multiple times

* start converting files

* style

* address most comments

* style

* remove unused stuff in get_needed_imports

* style

* move class dependency functions outside class

* Move main functions outside class

* style

* Update modular_model_converter.py

* rename func

* add augmented dependencies

* Update modular_model_converter.py

* Add types_to_file_type + tweak annotation handling

* Allow assignment dependency mapping + fix regex

* style + update modular examples

* fix modular_roberta example (wrong redefinition of __init__)

* slightly correct order in which dependencies will appear

* style

* review comments

* Performance + better handling of dependencies when they are imported

* style

* Add advanced new classes capabilities

* style

* add forgotten check

* Update modeling_llava_next_video.py

* Add priority list ordering in check_conversion as well

* Update check_modular_conversion.py

* Update configuration_gemma.py
2024-11-01 10:13:51 +01:00
86701f2b6f 🔴 🔴 fix query_pre_attn_scalar different from num_heads in default gemma2 config (#34540)
* fix query_pre_attn_scalar different from num_heads in default config

* propagate modular changes

* fix copies

* fix modular copies

* fix copies?

* correct copies fix
2024-11-01 09:06:17 +01:00
4cc0813e28 BLIP: enable generation tests (#34174)
* blip2 tests

* instructblips

* copies

* fix slow tests

* fix

* uncomment this

* clean up after rebase

* should be model main input

* fix overwritten tests

* oops len should be multiple of frame number

* style

* fix some tests
2024-11-01 08:54:48 +01:00
6beb3f1691 Blip: get/set input embeddings correctly (#34152)
* set-get embeds

* add tests

* fix tests

* remove

* return dict True

* fix tests

* why did i remove this

* enable torchscript tests
2024-11-01 08:39:39 +01:00
b53e44e847 [i18n-ar] Translated file : docs/source/ar/multilingual.md into Arabic (#33048)
* Add docs/source/ar/multilingual.md to Add_docs_source_ar_multilingual.md

* Update docs/source/ar/multilingual.md (the same review suggestion, applied 16 times)

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update _toctree.yml

* Update _toctree.yml

* Add Translated files to branch for merge

* Update _toctree.yml

* Update _toctree.yml

* Update custom_models.md

* Update chat_templating.md

* Update docs/source/ar/create_a_model.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update create_a_model.md

* Update gguf.md

* Update gguf.md

* Update gguf.md

* Update gguf.md

---------

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-31 16:10:09 -07:00
2801d7bcf6 update doc (#34478)
* update doc

* Update docs/source/en/perf_train_cpu.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* delete closing tip

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-31 15:59:23 -07:00
df8640cedb [CLIPSeg] Make interpolate_pos_encoding default to True (#34419)
* Remove interpolate_pos_encoding

* Make fixup

* Make interpolate_pos_encoding default to True

* Reuse existing interpolation

* Add integration test
2024-10-31 22:15:04 +01:00
203e27059b Add image text to text pipeline (#34170)
* Standardize image-text-to-text-models-output

add post_process_image_text_to_text to chameleon and cleanup

Fix legacy kwarg behavior and deprecation warning

add post_process_image_text_to_text to qwen2_vl and llava_onevision

Add post_process_image_text_to_text to idefics3, mllama, pixtral processor

* nit var name post_process_image_text_to_text udop

* nit fix deprecation warnings

* Add image-text-to-text pipeline

* add support for image url in chat template for pipeline

* Reformat to be fully compatible with chat templates

* Add tests chat template

* Fix imports and tests

* Add pipeline tag

* change logic handling of single prompt and multiple images

* add pipeline mapping to models

* fix batched inference

* fix tests

* Add manual batching for preprocessing

* Fix outputs with nested images

* Add support for all common processing kwargs

* Add default padding when multiple text inputs (batch size>1)

* nit change version deprecation warning

* Add support for text only inference

* add chat_template warnings

* Add pipeline tests and add copied from post process function

* Fix batched pipeline tests

* nit

* Fix pipeline tests blip2

* remove unnecessary max_new_tokens

* revert processing kosmos2 and remove unnecessary max_new_tokens

* fix pipeline tests idefics

* Force try loading processor if pipeline supports it

* revert load_processor change

* hardcode loading only processor

* remove unnecessary try except

* skip imagetexttotext tests for kosmos2 as tiny model causes problems

* Make code clearer

* Address review comments

* remove preprocessing logic from pipeline

* fix fuyu

* add BC resize fuyu

* Move post_process_image_text_to_text to ProcessorMixin

* add guard in post_process

* fix zero shot object detection pipeline

* add support for generator input in pipeline

* nit

* change default image-text-to-text model to llava onevision

* fix owlv2 size dict

* Change legacy deprecation warning to only show when True
2024-10-31 15:48:11 -04:00
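A minimal sketch of the new pipeline, assuming the chat-style message format and the llava-onevision default mentioned above; the checkpoint id is an assumption:

```python
# Hedged sketch of the image-text-to-text pipeline with a chat-style prompt.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
        {"type": "text", "text": "Describe this image."},
    ],
}]
print(pipe(text=messages, max_new_tokens=30))
```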
c443d8d536 Bug Fix for issue #34294 (#34295)
Update SiglipVisionEmbeddings.forward to cast input to correct dtype before embedding it.
2024-10-31 18:51:15 +01:00
114dd812dd make test_eager_matches_sdpa_inference less flaky (#34512)
* try

* try

* try

* try

* try

* try

* update

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-31 18:34:00 +01:00
294c170ff9 feat: add benchmarks pg indexes (#34536)
* feat: add benchmarks pg indexes

* refactor: remove debug `df -h`
2024-10-31 17:41:06 +01:00
b5919e12f7 fix(DPT,Depth-Anything) Address expected_slice errors inside inference tests (#34518)
* fix(DPT,Depth-Anything) Address expected_slice errors inside inference tests

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>

* [run_slow] dpt, depth_anything

---------

Signed-off-by: Phillip Kuznetsov <philkuz@gimletlabs.ai>
2024-10-31 16:47:58 +01:00
4ca004eac6 Qwen2VL: skip base input_ids-inputs_embeds equivalence check (#34535)
it has complex inputs_embeds computation
2024-10-31 15:42:13 +00:00
ab98f0b0a1 avoid calling gc.collect and cuda.empty_cache (#34514)
* update

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-31 16:36:13 +01:00
dca93ca076 Fix step shifting when accumulate gradient (#33673)
* replace total_batched_samples with step while counting grad accum step

* remove unused variable

* simplify condition for update step

* fix format by ruff

* simplify update step condition using accelerator.sync_gradients

* simplify update condition using do_sync_step

* remove print for test

---------

Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-10-31 09:53:23 -04:00
1b86772de5 Fix: img size mismatch caused by incorrect unpadding in LLaVA-Next (#34522)
Fix: unpadding img mismatch
2024-10-31 14:32:45 +01:00
f38531619d enable QA bf16 pipeline (#34483)
* enable QA bf16 pipeline

* add tests
2024-10-31 12:55:53 +00:00
405b562698 UPDATE Documentation for #TRANSLATING.md Documentation into Multiple Languages.(Changes made) (#34226)
* Update TRANSLATING.md

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update TRANSLATING.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-30 12:37:39 -07:00
48872fd6ae Add Image Processor Fast RT-DETR (#34354)
* add fast image processor rtdetr

* add gpu/cpu test and fix docstring

* remove prints

* add to doc

* nit docstring

* avoid iterating over images/annotations several times

* change torch typing

* Add image processor fast documentation
2024-10-30 13:49:47 -04:00
9f06fb0505 Fix super tiny extra space typo (#34440)
Update training_args.py
2024-10-30 16:55:16 +01:00
5251fe6271 Add GGUF for Mamba (#34200)
* add mamba architecture for gguf

* add logic for weights conversion, some fixes and refactoring

* add lm_head layers, unit test refactoring

* more fixes for tests

* remove lm_head creation

* remove unused comments
2024-10-30 16:52:17 +01:00
eab6c491d4 Use torch 2.5 in scheduled CI (#34465)
* torch 2.5

* try

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-30 14:54:10 +01:00
241d79026f fix pixtral processor (#34486)
* fix pixtral processor

* test out full length batches + remove undue ValueError

* fix up processing

* fix tests

* fix

* last fixup

* style

* [run-slow] pixtral

* [run-slow] pixtral

* fix config key

* skip torchscript tests

* [run-slow] pixtral

* add missing key

* [run-slow] pixtral

* fix docs

* [run-slow] pixtral

* fix wrong url for integration test

* [run-slow] pixtral

* pixtralVisionModel does not have a lm head

* [run-slow] pixtral
2024-10-30 14:17:20 +01:00
8a734ea2c3 Tests: move generate tests to the right mixin and delete redundant tests (#34464)
* tmp commit

* tmp commit

* cull overwrites of deleted tests

* typo

* more specific docstring

* make fixup

* parameterize at the top?

* correction

* more deletions :D

* tmp commit

* for VLMs too

* fix _check_outputs

* test nit

* make fixup

* fix another flaky

* test_generate_from_inputs_embeds -- handle missing attention mask
2024-10-30 10:59:08 +00:00
913330ca9f VLMs: fix number of image tokens (#34332)
* fix

* fix tests

* add tests

* style

* style

* fix qwen after rebase

* fix video llava
2024-10-30 10:21:37 +01:00
0f764a5af7 Mllama: update docs (#34334)
* update docs

* be more explicit

* use available methods
2024-10-30 10:11:50 +01:00
25a9fc584a Fix format mistake in string repr of tokenizer objects (#34493)
* fix repr string format for tokenizer objects

The repr of tokenizer tokens looks confusing and just stupid, like this: `Tokenizer(...), added_tokens_decoder={1: ..., 2: ...}`. The dict that is the value of the added_tokens_decoder attribute is outside of the parentheses of the tokenizer object, whereas all other attributes are inside the parentheses like they should be.

This commit fixes this bug.

* cos: add newline before closing parenthesis of repr string
2024-10-30 10:03:41 +01:00
cd277618d4 Roberta is ExecuTorch compatible (#34425)
* Roberta is ExecuTorch compatible

* [run_slow] roberta

---------

Co-authored-by: Guang Yang <guangyang@fb.com>
2024-10-30 08:36:45 +00:00
9bee9ff5db Un-deprecate timeout arg in pipelines (#34382)
* Un-deprecate timeout

* Put "timeout" on the allowed list

* make fixup
2024-10-29 18:45:14 +00:00
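A hedged example of the un-deprecated kwarg, assuming `timeout` bounds how long the pipeline waits when fetching a remote input:

```python
# Sketch: pass `timeout` (seconds) when the pipeline downloads a remote image.
from transformers import pipeline

classifier = pipeline("image-classification")
preds = classifier(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    timeout=5.0,
)
print(preds)
```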
e4449bb790 fix incorrect warning (#34416) 2024-10-29 14:08:42 -04:00
f55595b177 Fix performance in get_imports regexp (#34298)
* fix: Fix performance in get_imports regexp

* Minimize get_imports content regexp
2024-10-29 17:29:24 +00:00
4e2e8809ff Bump werkzeug from 3.0.3 to 3.0.6 in /examples/research_projects/decision_transformer (#34420)
Bump werkzeug in /examples/research_projects/decision_transformer

Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.0.3 to 3.0.6.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/werkzeug/compare/3.0.3...3.0.6)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-29 16:42:40 +00:00
e9ad460494 Adding optimizer_cls_and_kwargs to Trainer.__init__ (#34358)
* Adding `optimizer_cls_and_kwargs` to `Trainer.__init__`

* formatting

* make fix-copies docstring

* added more docs for optimizer_cls_and_kwargs

* add docs for Trainer(optimizer_cls_and_kwargs)

* reverting anchor names
2024-10-29 16:23:16 +01:00
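A minimal sketch of the new argument; `model` and `train_dataset` are assumed to be defined elsewhere:

```python
# Hedged sketch: inject a custom optimizer class plus kwargs without
# subclassing Trainer or overriding create_optimizer().
import torch
from transformers import Trainer, TrainingArguments

trainer = Trainer(
    model=model,                  # assumed: a loaded model
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,  # assumed: a prepared dataset
    optimizer_cls_and_kwargs=(torch.optim.AdamW, {"lr": 5e-5, "weight_decay": 0.01}),
)
```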
f339042b0b Albert is ExecuTorch compatible (#34476)
Co-authored-by: Guang Yang <guangyang@fb.com>
2024-10-29 16:22:13 +01:00
34620e8f0a MobileBERT is ExecuTorch compatible (#34473)
Co-authored-by: Guang Yang <guangyang@fb.com>
2024-10-29 16:14:31 +01:00
56c45d5757 Bug fix for drop path decay rate in swin transformer (#34291)
* potential bug fix for drop path

* variable name change

* forgot to rename the variables

* back to original

* modify dpr properly

* check_copies auto fix

* corresponding swin2 changes

* auto fix

* linting

* default value for drop_path_rate as 0.0

* Update src/transformers/models/glm/modeling_glm.py

* maskformer fix

* ruff format

* changes made to tf code as well

* lint

---------

Co-authored-by: abhijit deo <167164474+deo-abhijit@users.noreply.github.com>
2024-10-29 16:09:18 +01:00
0ab0a42651 fix-qwen2vl-no-position_ids (#33487) 2024-10-29 15:27:34 +01:00
8755dd26b7 manual head_dim for mixtral model (#34281) 2024-10-29 14:31:36 +01:00
5392f12e16 Bert is ExecuTorch compatible (#34424)
Co-authored-by: Guang Yang <guangyang@fb.com>
2024-10-29 14:30:02 +01:00
004530aa05 Fix regression loading dtype (#34409)
* fix regression

* add test for torchao

* expected output

* better fix
2024-10-29 11:41:04 +01:00
9e3d704e23 Fixes for Modular Converter on Windows (#34266)
* Separator in regex

* Standardize separator for relative path in auto generated message

* open() encoding

* Replace `\` on `os.path.abspath`

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-10-29 11:40:41 +01:00
626c610a4d Fix perplexity computation in perplexity.md (#34387)
fix average NLL in perplexity.md
2024-10-29 11:10:10 +01:00
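The fix concerns how per-chunk NLLs are averaged; a token-weighted mean, sketched with made-up numbers, looks like this:

```python
# Hedged sketch: perplexity as exp of the token-weighted average NLL.
import torch

chunk_nlls = [torch.tensor(2.1), torch.tensor(1.8)]  # mean NLL per chunk (illustrative)
chunk_tokens = [512, 384]                            # valid target tokens per chunk

avg_nll = sum(nll * n for nll, n in zip(chunk_nlls, chunk_tokens)) / sum(chunk_tokens)
print(torch.exp(avg_nll))  # perplexity
```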
439334c8fb Simplify running tests in a subprocess (#34213)
* check

* check

* check

* check

* add docstring

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-29 10:48:57 +01:00
a1835195d1 🚨🚨🚨 [SuperPoint] Fix keypoint coordinate output and add post processing (#33200)
* feat: Added int conversion and unwrapping

* test: added tests for post_process_keypoint_detection of SuperPointImageProcessor

* docs: changed docs to include post_process_keypoint_detection method and switched from opencv to matplotlib

* test: changed test to not depend on SuperPointModel forward

* test: added missing require_torch decorator

* docs: changed pyplot parameters for the keypoints to be more visible in the example

* tests: changed import torch location to make test_flax and test_tf

* Revert "tests: changed import torch location to make test_flax and test_tf"

This reverts commit 39b32a2f69500bc7af01715fc7beae2260549afe.

* tests: fixed import

* chore: applied suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* tests: fixed import

* tests: fixed import (bis)

* tests: fixed import (ter)

* feat: added choice of type for target_size and changed tests accordingly

* docs: updated code snippet to reflect the addition of target size type choice in post process method

* tests: fixed imports (...)

* tests: fixed imports (...)

* style: formatting file

* docs: fixed typo from image[0] to image.size[0]

* docs: added output image and fixed some tests

* Update docs/source/en/model_doc/superpoint.md

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: included SuperPointKeypointDescriptionOutput in TYPE_CHECKING if statement and changed tests results to reflect changes to SuperPoint from absolute keypoints coordinates to relative

* docs: changed SuperPoint's docs to print output instead of just accessing

* style: applied make style

* docs: added missing output type and precision in docstring of post_process_keypoint_detection

* perf: deleted loop to perform keypoint conversion in one statement

* fix: moved keypoint conversion at the end of model forward

* docs: changed SuperPointInterestPointDecoder to SuperPointKeypointDecoder class name and added relative (x, y) coordinates information to its method

* fix: changed type hint

* refactor: removed unnecessary brackets

* revert: SuperPointKeypointDecoder to SuperPointInterestPointDecoder

* Update docs/source/en/model_doc/superpoint.md

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2024-10-29 09:36:03 +00:00
655bec2da7 use a tiny model to test generation config, which avoids timeout (#34482)
* use a tiny model to test generation config, which avoids timeout

* remove trailing whitespace
2024-10-29 09:39:06 +01:00
63ca6d9771 Fix CI (#34458)
* fix

* fix mistral
2024-10-29 08:26:04 +01:00
808d6c50f8 Generation: fix test (#34369)
* fix test

* fix copies
2024-10-29 07:57:10 +01:00
fe76b60370 LLaVA: latency issues (#34460)
* fix llavas

* code style

* green ci
2024-10-29 07:54:51 +01:00
a769ed45e1 Add post_process_depth_estimation for GLPN (#34413)
* add depth postprocessing for GLPN

* remove previous temp fix for glpn tests

* Style changes for GLPN's `post_process_depth_estimation`

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* additional style fix

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-10-28 19:44:20 +01:00
6cc4a67b3d feat: run benchmarks on A100 (#34287) 2024-10-28 19:33:17 +01:00
d21dbd1520 enable average tokens across devices (#34373)
* enable average tokens across devices

* reduce earlier in case model needs it

* simplify if statement

* reformat code to make ruff happy

* add doc for argument: average_tokens_across_devices

* cannot find world size when pytorch is unavailable

* format code

---------

Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-10-28 18:59:38 +01:00
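A sketch of the new flag, assuming it is exposed as a `TrainingArguments` field as the doc bullet above suggests:

```python
# Hedged sketch: average token counts across devices for loss normalization.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    average_tokens_across_devices=True,
)
```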
a17f287ac0 [i18n-ar] Translated file : docs/source/ar/fast_tokenizers.md into Arabic (#33034)
* Add docs/source/ar/fast_tokenizers.md to Add_docs_source_ar_fast_tokenizers.md

* Update _toctree.yml

* Update _toctree.yml

* Update docs/source/ar/_toctree.yml

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

* Update docs/source/ar/fast_tokenizers.md (the same review suggestion, applied 10 times)

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>

---------

Co-authored-by: Abdullah Mohammed <554032+abodacs@users.noreply.github.com>
2024-10-28 10:54:37 -07:00
084e946cfd Apply linting to the important code blocks to make it readable (#34449)
Enhance user experience using py-linting
2024-10-28 10:48:18 -07:00
1f7539c829 🌐 [i18n-KO] Translated model_doc/barthez.md to Korean (#33980)
* docs: ko: model_doc/barthez.md

* feat: nmt draft

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-28 10:46:49 -07:00
fc1ae7f30f [docs] update input documentation for MAMBA2 and MISTRAL models to include cache_position and attention_mask details (#34322)
* [docs] update input documentation for MAMBA2 and MISTRAL models to include cache_position and attention_mask details

* [docs] correct input documentation for MISTRAL model to reference `input_ids` instead of `decoder_input_ids`

* [docs] clarify cache_position description in MISTRAL model documentation
2024-10-28 09:14:07 -07:00
c1753436db New option called "best" for args.save_strategy. (#31817)
* Add _determine_best_metric and new saving logic.

1. Logic to determine the best metric was separated out from
`_save_checkpoint`.
2. In `_maybe_log_save_evaluate`, whether or not a new best metric was
achieved is determined after each evaluation, and if the save strategy
is "best' then the TrainerControl is updated accordingly.

* Added SaveStrategy.

Same as IntervalStrategy, but with a new attribute called BEST.

* IntervalStrategy -> SaveStrategy

* IntervalStrategy -> SaveStrategy for save_strat.

* Interval -> Save in docstring.

* Updated docstring for save_strategy.

* Added SaveStrategy and made according changes.

`save_strategy` previously followed `IntervalStrategy` but now follows
`SaveStrategy`.

Changes were made accordingly to the code and the docstring.

* Changes from `make fixup`.

* Removed redundant metrics argument.

* Added new test_save_best_checkpoint test.

1. Checks for both cases where `metric_for_best_model` is explicitly
provided and when it's not provided.
2. The first case should have two checkpoints saved, whereas the second
should have three saved.

* Changed should_training_end saving logic.

The Trainer saves a checkpoint at the end of training by default as
long as `save_strategy != SaveStrategy.NO`. This condition was modified
to also cover `SaveStrategy.BEST`, because it would be counterintuitive
to ask for only the best checkpoint to be saved and still have the last
one saved as well.

* `args.metric_for_best_model` default to loss.

* Undo metric_for_best_model update.

* Remove checking metric_for_best_model.

* Added test cases for loss and no metric.

* Added error for metric and changed default best_metric.

* Removed unused import.

* `new_best_metric` -> `is_new_best_metric`

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Applied `is_new_best_metric` to all.

Changes were made for consistency and also to fix a potential bug.

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-10-28 16:02:22 +01:00
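A sketch of the new strategy, assuming it plugs into the usual evaluation and saving arguments:

```python
# Hedged sketch: save a checkpoint only when a new best metric is achieved.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    eval_strategy="epoch",
    save_strategy="best",
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```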
8b3b9b48fc exclude fsdp from delay_optimizer_creation (#34140)
* exclude fsdp from delay_optimizer_creation

* add test case for trainer: FSDP mode and fp8 as mixed precision

* rearrange imports

* ruff formatted

* adapt _init_fsdp to fp8

* use _init_fsdp only when resume_from_checkpoint

* In case of FSDP, self.layer will be CheckpointWrapper, which has no len() method

* delete _init_fsdp

* solve conflict

* fix conflict

* make fixup
2024-10-28 13:50:16 +01:00
92bcdff2ef Fix batch size handling in prediction_loop for DataLoaderShard (#34343)
* Fix batch size handling in prediction_loop for DataLoaderShard

Updated the prediction_loop method in the Trainer class to correctly handle batch size when using DataLoaderShard. This ensures that the batch size is retrieved from total_batch_size for distributed training scenarios, preventing TypeError related to NoneType during evaluation.

* Update src/transformers/trainer.py

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* Applied the fix to remove unused imports

---------

Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-10-28 13:23:52 +01:00
9360f1827d Tiny update after #34383 (#34404)
* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-28 12:01:05 +01:00
fc465bb196 pin tensorflow_probability<0.22 in docker files (#34381)
0.21

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-28 11:59:46 +01:00
fddbd3c13c Fix pix2struct (#34374)
* fix

* fix and test use_cache test

* style

* remove atol
2024-10-28 11:24:56 +01:00
1d06379331 [docs] Cache implementations (#34325)
cache
2024-10-25 08:52:45 -07:00
6a62a6d1b5 Fix typos in agents_advanced.md (#34405) 2024-10-25 08:52:29 -07:00
f73f5e62e2 Avoid check expected exception when it is on CUDA (#34408)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-25 17:14:07 +02:00
e447185b1f Fix bnb training test failure (#34414)
* Fix bnb training test: compatibility with OPTSdpaAttention
2024-10-25 10:23:20 -04:00
186b8dc190 Tests: upgrade test_eager_matches_sdpa_generate (#34386) 2024-10-25 11:55:07 +01:00
8814043c8c SynthID: better example (#34372)
* better example

* Update src/transformers/generation/configuration_utils.py

* Update src/transformers/generation/logits_process.py

* nits
2024-10-25 11:46:46 +01:00
223855314f no filter (#34391)
* no filter

* no filter

* no filter

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-25 12:32:39 +02:00
9f365fe0ac Fix right padding in LLaVA models (#34305)
* fix right pad llavas

* device mismatch
2024-10-25 11:02:07 +02:00
5779bac4c4 Fix onnx non-exportable inplace aten op (#34376)
* fix onnx non-exportable inplace op

* mistral, qwen2, qwen2_vl, starcoder2

* fixup copies
2024-10-25 09:44:09 +02:00
940a6bd343 Use non nested images and batched text Idefics2/3 (#34222)
* add support for non nested images and add tests

* add tests error scenario

* fix style

* added single and no image to error tests
2024-10-24 20:00:13 -04:00
3d99f1746e Fix glm (#34388)
* Fix duplicated

* fix import
2024-10-24 19:17:52 +02:00
a308d28d39 [auto. ping] Avoid sending empty info + add more team members (#34383)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-24 19:07:23 +02:00
4c6e0c9252 Correct the new defaults (#34377)
* Correct the new defaults

* CIs

* add check

* Update utils.py

* Update utils.py

* Add the max_length in generate test checking shape without passing length

* style

* CIs

* fix fx CI issue
2024-10-24 18:42:03 +02:00
1c5918d910 Fix torch.fx issue related to the new loss_kwargs keyword argument (#34380)
* Fix FX

* Unskip tests
2024-10-24 18:34:28 +02:00
d9989e0b9a [PEFT] Add warning for missing key in LoRA adapter (#34068)
When loading a LoRA adapter, so far, there was only a warning when there
were unexpected keys in the checkpoint. Now, there is also a warning
when there are missing keys.

This change is consistent with
https://github.com/huggingface/peft/pull/2118 in PEFT and the planned PR
https://github.com/huggingface/diffusers/pull/9622 in diffusers.

Apart from this change, the error message for unexpected keys was
slightly altered for consistency (it should be more readable now). Also,
besides adding a test for the missing keys warning, a test for
unexpected keys warning was also added, as it was missing so far.
2024-10-24 17:56:40 +02:00
fe35073319 Ignore unsupported kwarg in ProcessorMixin call (#34285)
Fix accept any common kwargs
2024-10-24 11:46:39 -04:00
e288616606 refactor: remove redundant if-condition and improve type correctness for convert_tokens_to_ids (#34030)
* chore: remove redundant if-condition

* fix: import `Iterable`
2024-10-24 17:40:26 +02:00
450b9cbfac Add code sample docstrings and checkpoint reference for GLM models (#34360)
* Add code sample docstrings and checkpoint reference for GLM models

* Update modular_glm.py

* Update modeling_glm.py
2024-10-24 17:28:51 +02:00
6432ad8bb5 Fix pil_torch_interpolation_mapping import in image_processing_detr_fast (#34375)
fix pil_torch_interpolation_mapping import
2024-10-24 09:22:50 -04:00
dd267fca72 Add T5 GGUF loading support (#33389)
* add: GGUFT5Converter

* add: tensormapping for t5

* add: test code for t5

* fix: Remove whitespace from blank line

* add: t5 fp16 tests

* fix: whitespace formatting

* fix: minor formatting

* fix: testing every weight
2024-10-24 15:10:59 +02:00
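A hedged sketch of loading such a checkpoint via the existing `gguf_file` path; the repo and file names are illustrative assumptions:

```python
# Sketch: load a GGUF-quantized T5 checkpoint (dequantized on load).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

repo = "some-org/t5-small-gguf"  # hypothetical repo id
fname = "t5-small-Q8_0.gguf"     # hypothetical file name

tokenizer = AutoTokenizer.from_pretrained(repo, gguf_file=fname)
model = AutoModelForSeq2SeqLM.from_pretrained(repo, gguf_file=fname)
```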
30c76d5b28 add code generation to natural language processing section (#34333) 2024-10-24 14:42:47 +02:00
2112027d0c Zamba is an LM (#34342)
* Zamba is an LM

* Addition
2024-10-24 14:29:33 +02:00
b29c24ff1e CI: fix failures (#34371)
fix
2024-10-24 13:44:53 +02:00
f0b3ef9e2e translated gguf.md into Chinese (#34163)
* translated gguf.md into Chinese

* Apply suggestions from code review

I have updated the PR accordingly. Thank you very much for the detailed guidance, and I'll pay more attention to the details next time.

Co-authored-by: Isotr0py <2037008807@qq.com>

* Apply suggestions from code review

Co-authored-by: Isotr0py <2037008807@qq.com>

---------

Co-authored-by: Isotr0py <2037008807@qq.com>
2024-10-24 11:47:58 +02:00
9643069465 v4.47.0.dev0 2024-10-24 11:23:29 +02:00
f0e640adfa Drop support for Python 3.8 (#34314)
* drop python 3.8

* update docker files

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-24 11:16:55 +02:00
05863817d6 Better defaults (#34026)
* be nice to our users

* nit

* fixup

* default to -1

* oups

* turbo nit

* auto infer framework
2024-10-24 11:11:55 +02:00
65753d6065 Remove graph breaks for torch.compile() in flash_attention_forward when Llama Model is padding-free tuned (#33932)
* fix: fixes for graph breaks

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: formatting

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: import error

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: Add Fa2Kwargs

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* Revert "PR changes"

This reverts commit 39d2868e5c93cc5f3f3c7c6ff981b66614c0e0e4.

* PR changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: FlashAttentionKwarg

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: FlashAttentionKwarg

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* PR Changes

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* addition of documentation

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* change in _flash_attention_forward

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* make fix-copies

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* revert make fix-copies

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix copies

* style

* loss kwargs typing

* style and pull latest changes

---------

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-10-24 11:02:54 +02:00
b0f0c61899 Add SynthID (watermarking by Google DeepMind) (#34350)
* Add SynthIDTextWatermarkLogitsProcessor

* Resolving comments.

* Resolving comments.

* Resolving commits,

* Improving SynthIDWatermark tests.

* switch to PT version

* detector as pretrained model + style

* update training + style

* rebase

* Update logits_process.py

* Improving SynthIDWatermark tests.

* Shift detector training to wikitext negatives and stabilize with lower learning rate.

* Clean up.

* in for 7B

* cleanup

* Support Python 3.8.

* README and final cleanup.

* HF Hub upload and initialize.

* Update requirements for synthid_text.

* Adding SynthIDTextWatermarkDetector.

* Detector testing.

* Documentation changes.

* Copyrights fix.

* Fix detector api.

* ironing out errors

* ironing out errors

* training checks

* make fixup and make fix-copies

* docstrings and add to docs

* copyright

* BC

* test docstrings

* move import

* protect type hints

* top level imports

* watermarking example

* direct imports

* tpr fpr meaning

* process_kwargs

* SynthIDTextWatermarkingConfig docstring

* assert -> exception

* example updates

* no immutable dict (can't be serialized)

* pack fn

* einsum equivalent

* import order

* fix test on gpu

* add detector example

---------

Co-authored-by: Sumedh Ghaisas <sumedhg@google.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: sumedhghaisas2 <138781311+sumedhghaisas2@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
2024-10-23 21:18:52 +01:00
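A sketch of watermarked generation with the config class this PR adds; the model id and key values are placeholders:

```python
# Hedged sketch: generate with a SynthID text watermark applied.
from transformers import AutoModelForCausalLM, AutoTokenizer, SynthIDTextWatermarkingConfig

model_id = "google/gemma-2b"  # placeholder model id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

wm_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],  # placeholder keys
    ngram_len=5,
)
inputs = tok("Tell me a story:", return_tensors="pt")
out = model.generate(**inputs, do_sample=True, max_new_tokens=20, watermarking_config=wm_config)
print(tok.batch_decode(out, skip_special_tokens=True)[0])
```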
e50bf61dec Fix red CI: benchmark script (#34351)
* don't trigger always

* fix

* oups

* update

* ??

* ?

* aie
2024-10-23 18:33:52 +02:00
c42b3223db skip test_pipeline_depth_estimation temporarily (#34316)
skip

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-23 17:27:51 +02:00
d9f733625c Enable Gradient Accumulation fix across all models + trainer fully in forward() (#34283)
* Enable grad accum fix across all models + trainer fully in forward()

* handle peft case

* Account for DDP: need to run scale tests

* Use accelerator state

* Quality

* Guard

* Experiment w/ only fairseq fix

* Fairseq only

* Revert multiply_grads fix

* Mult by grad accum to fully bring back solution

* Style

* Good to go now

* Skip fx tests for now

* Bookmark

* Working now
2024-10-23 11:24:57 -04:00
1fb575fcf0 Support boolean tool args (#34208)
Support boolean tool arguments
2024-10-23 16:48:21 +02:00
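For illustration, a tool with a boolean argument, assuming the agents `tool` decorator and its docstring convention; the tool itself is a toy:

```python
# Hedged sketch: boolean tool arguments are now parsed correctly.
from transformers.agents import tool

@tool
def search_corpus(query: str, case_sensitive: bool = False) -> str:
    """Searches a toy corpus for a query string.

    Args:
        query: The text to look for.
        case_sensitive: Whether matching should respect letter case.
    """
    corpus = "Transformers provides thousands of pretrained models."
    haystack = corpus if case_sensitive else corpus.lower()
    needle = query if case_sensitive else query.lower()
    return "found" if needle in haystack else "not found"
```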
343c8cb86f Added Deberta model type support (#34308)
* Added Deberta model type for 'add_prefix_space' functionality

* housekeeping

---------

Co-authored-by: Filippos Ventirozos <filippos.ventirozos@autotrader.co.uk>
2024-10-23 11:15:36 +02:00
5ba85de7a4 [docs] Fix Korean toctree (#34324)
fix
2024-10-23 10:52:51 +02:00
049682a5a6 Example doc for token classification of Llama and Dependent/Copied Models (#34139)
* Added Example Doc for token classification on all tokenClassificationModels copied from llama

* Refactor code to add code sample docstrings for Gemma and Gemma2 models (including modular Gemma)

* Refactor code to update model checkpoint names for Qwen2 models
2024-10-22 10:26:16 -07:00
644d5287b2 🌐 [i18n-KO] Translated model_doc/bartpho.md to Korean (#33981)
* docs: ko: model_doc/bartpho.md

* feat: nmt draft

* Update docs/source/ko/model_doc/bartpho.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-22 09:46:52 -07:00
b03dc0a87e 🌐 [i18n-KO] Translated bert japanese.md to Korean (#33890)
* docs: ko: bert-japanese.md

* Update _toctree.yml

* fix: manual edits

* Update docs/source/ko/_toctree.yml

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* Update docs/source/ko/_toctree.yml

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

---------

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-22 09:46:31 -07:00
4b14aa1bcd 🌐 [i18n-KO] Translated executorch.md to Korean (#33888)
* docs: ko: executorch.md

* Update _toctree.yml

* fix: manual edits

* Update docs/source/ko/main_classes/executorch.md

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* Update docs/source/ko/_toctree.yml

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* Update docs/source/ko/_toctree.yml

* Update docs/source/ko/_toctree.yml

* Update docs/source/ko/_toctree.yml

---------

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-22 09:46:20 -07:00
688eeac81e [docs] fix typo (#34235)
fix typo
2024-10-22 09:46:07 -07:00
a65a6ce7fe fix error in _get_eval_sampler when group_by_length enabled (#34237)
* remove self in _get_eval_sampler

* remove self in front of _get_eval_sampler
2024-10-22 18:02:42 +02:00
e7c3fa7f57 Fix continue_final_message for image-text-to-text chat templates (#34236)
* fix continue_final_message for vlms

* Add one test for vlms continue_final_message chat template
2024-10-22 11:57:44 -04:00
96f67c068b Feature: Add MLFLOW_MAX_LOG_PARAMS to MLflowCallback (#34279) 2024-10-22 16:34:17 +02:00
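A sketch of the new knob, assuming it is read from the environment when the callback sets up its run:

```python
# Hedged sketch: cap how many parameters MLflowCallback logs.
import os

os.environ["MLFLOW_MAX_LOG_PARAMS"] = "100"  # log at most 100 params per run
```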
eef6b0ba42 Add option for running ffmpeg_microphone_live as a background process (#32838)
* Add option for running ffmpeg_microphone_live as a background process

* Code quality checks for audio_utils

* Code clean up for audio_utils

* Fixing logic in ffmpeg_microphone calls in audio_utils

* Allowing any arbitrary arguments to be passed to ffmpeg_microphone_live

* Formatting

* Fixing last problems with adding ffmpeg_additional_args

* Fixing default arguments and formatting issues

* Fixing comments for ffmpeg_additional_args

* Adding two shorts tests for ffmpeg_microphone_live

* Fixing test bug
2024-10-22 15:56:41 +02:00
c14ccbcd64 Olmo is ExecuTorch Compatible (#34181)
Co-authored-by: Guang Yang <guangyang@fb.com>
2024-10-22 15:53:01 +02:00
7a08a772cc Qwen2.5 is ExecuTorch Compatible (#34102)
Qwen2 is ExecuTorch Compatible

Co-authored-by: Guang Yang <guangyang@fb.com>
2024-10-22 15:52:23 +02:00
c31a6ff474 Add post_process_depth_estimation to image processors and support ZoeDepth's inference intricacies (#32550)
* add colorize_depth and matplotlib availability check

* add post_process_depth_estimation for zoedepth + tests

* add post_process_depth_estimation for DPT + tests

* add post_process_depth_estimation in DepthEstimationPipeline & special case for zoedepth

* run `make fixup`

* fix import related error on tests

* fix more import related errors on test

* forgot some `torch` calls in declarations

* remove `torch` call in zoedepth tests that caused error

* updated docs for depth estimation

* small fix for `colorize` input/output types

* remove `colorize_depth`, fix various names, remove matplotlib dependency

* fix formatting

* run fixup

* different images for test

* update examples in `forward` functions

* fixed broken links

* fix output types for docs

* possible format fix inside `<Tip>`

* Readability related updates

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Readability related update

* cleanup after merge

* refactor `post_process_depth_estimation` to return dict; simplify ZoeDepth's `post_process_depth_estimation`

* rewrite dict merging to support python 3.8

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2024-10-22 15:50:54 +02:00
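A hedged sketch of the pipeline path after this change; the checkpoint id is an assumption:

```python
# Sketch: depth estimation now routes through post_process_depth_estimation.
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="Intel/zoedepth-nyu-kitti")
result = depth_estimator("http://images.cocodataset.org/val2017/000000039769.jpg")
print(result["depth"])            # PIL image resized to the input
print(result["predicted_depth"])  # raw depth tensor
```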
104599d7a8 Fix: tensor of examples of the same length triggers invalid stacking (#34166)
* Fix issue where tensor of examples of the same length triggers invalid stacking

* Update data_collator.py
2024-10-22 15:49:21 +02:00
51e395d13e Fix FA2 attention for models supporting sliding window (#34093)
Fix FA2
2024-10-22 15:37:21 +02:00
eb6a734995 [RT-DETR] Fix onnx inference bug for Optype (Where) (#33877)
* feat: [RT-DETR] Add onnx runtime config and fix onnx inference bug Optype (Where)

* fix lint

* use dtype istead of torch.float32

* add doc

* remove onnx config

* use dtype info

* use tensor to fix lint
2024-10-22 15:14:07 +02:00
84b17e03f1 Update PR templates (#34065)
update PR template
2024-10-22 15:11:54 +02:00
681fc43713 Sync video classification pipeline with huggingface_hub spec (#34288)
* Sync video classification pipeline

* Add disclaimer
2024-10-22 13:33:49 +01:00
93352e81f5 Fix Korean doc _toctree.yml (#34293)
Fix korean doc _toctree.yml
2024-10-22 11:05:56 +02:00
b644178ed4 [docs] Fix GenerationConfig params (#34299)
fix generationconfigs
2024-10-22 11:03:25 +02:00
73d65e637b T5 compile compatibilty (#34089)
* this worked in normal generation, needs more tests

* fix almost all tests in t5

* nit

* longt5, umt5, mt5

* style

* udop, pix2struct

* more models

* fix some tests

* fix onnx tests

* tracing tests fixed

* compile enabled and tested for t5 models

* fix small bug in slow tests

* [run-slow] t5

* uncomment

* style

* update with new generation refactoring

* nit

* fix copies

* this is the fix, had to change t5 to fix copies

* update

* [run-slow] t5

* [run-slow] t5

* update

* add test for encoder only T5

* clean up after rebase

* fix pop2piano

* add comment

* style

* fix copies after rebase

* fix copies, missed this one
2024-10-22 08:23:53 +02:00
5077bc034f VLM: add more modularity (#34175)
* update

* fix tests + fix copies

* fix tests once more
2024-10-22 07:56:35 +02:00
21d5025826 Attn implementation for composite models (#32238)
* first try

* codestyle

* idefics2 is happy

* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo, paligemma

* fix-copies

* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo

* blip-2 needs to init vision from config

* when was this removed O_o

* minor fix

* tests

* this way?

* tests

* model-agnostic code

* codestyle

* add tests for idefics

* modify general test for VLMs

* no generation test for vlm yet!

* no generation test here also

* warn in ViT-SDPA if output attn

* add more tests

* user can pass dict as attn impl

* repo consistency

* update

* musicgen

* no prints

* forgot speech enc-dec and clip

* how many composite models we have?

* musicgen melody is same as musicgen

* +siglip

* fix tests + add some more

* remove idefics custom overriden code

* make idefics2 automappable

* nits

* skip tests

* doctests

* Update src/transformers/models/idefics2/configuration_idefics2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/clip/test_modeling_clip.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/idefics2/test_modeling_idefics2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/idefics2/test_modeling_idefics2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/configuration_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* major update, no need for automap

* clean up

* add FA2 test

* more tests

* style

* skip tests

* why did these start failing now?

* no attributes for FA2 needed

* one tiny test

* address comment about FA2 false warning

* style

* add new models and resolve conflicts

* fix copies

* let it be this way for now, come back tomorrow to review

* some more fixes

* update

* more updates

* update

* fix copies

* style and tests

* another big update

* fix tests

* fix tests

* update

* another update

* fix tests

* fix copies

* fix tests

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-10-22 06:54:44 +02:00
32590b5ecb Fix method name which changes in tutorial (#34252)
The method `model_download_tool` was called `model_download_counter` earlier in the tutorial, which raises an error when following the code.
2024-10-21 14:21:52 -03:00
f701b98e4a Add a doc section on writing generation prompts (#34248)
Add a section on writing generation prompts
2024-10-21 14:35:57 +01:00
a4122813d1 Add DetrImageProcessorFast (#34063)
* add fully functionning image_processing_detr_fast

* Create tensors on the correct device

* fix copies

* fix doc

* add tests equivalence cpu gpu

* fix doc en

* add relative imports and copied from

* Fix copies and nit
2024-10-21 09:05:05 -04:00
24bdc94da5 Change Paligemma import logging to work with modular (#34211)
* change import logging

* fix CI
2024-10-21 08:55:27 -04:00
ca541bd4f4 Generation tests: don't rely on main input name (#34228)
* don't rely on main input name

* update
2024-10-21 10:00:14 +02:00
816f442496 Only cast logits to float when computing loss (#34147)
* Only cast logits to float when computing loss

Some misses from #31292 and #33902

* Move logits.float() into existing if labels is not None branch
2024-10-18 18:15:26 +02:00
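A minimal sketch of the pattern this change describes: keep logits in their compute dtype and upcast to float32 only inside the loss branch. The helper name `lm_loss` is illustrative, not from the PR:

```python
import torch
import torch.nn.functional as F

def lm_loss(logits, labels=None):
    # Upcast to float32 only when a loss is actually computed, per the
    # commit above; inference-only calls keep the cheaper dtype.
    loss = None
    if labels is not None:
        logits = logits.float()  # the upcast now lives inside the labels branch
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
    return loss

print(lm_loss(torch.randn(2, 4, 10, dtype=torch.bfloat16), torch.zeros(2, 4, dtype=torch.long)))
```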
e46e3bc173 Fix UDOP dtype issue (#34180)
* Trigger UDOP tests

* Try forcing dtype in LayoutLMV3

* Do checks to see where uint8 is getting in

* Do checks to see where uint8 is getting in

* Found it!

* Add .astype(np.float32)

* Remove forced check, make fixup

* Checking where exactly the uint8 creeps in

* More checking on the uint8 issues

* Manually upcast in rescale()

* Remove UDOP trigger
2024-10-18 16:54:58 +01:00
6604764007 add Glm (#33823)
* Create modular_glm.py

* Update modular_glm.py

* Finalize architecture without all attentions

* Add all attentions modules

* Finalize modular

* Update given last version

* Last update

* Finalize model

* Finalize converter

* Update convert_glm_weights_to_hf.py

* style

* style

* Create __init__.py

* Aff all inits

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Correct the rotary embeddings

* Remove apply_residual_connection_post_layernorm (always false)

* remove use_rms_norm (always true)

* remove past_layer_norm (always true)

* Update __init__.py

* Update config and license

* start adding tests and doc

* Add doc + style

* Update test_modeling_glm.py

* Add dummies

* Apply correct modeling

* Refactor attention to follow llama

* Update __init__.py

* Update convert_glm_weights_to_hf.py

* Correct bias

* remove linear_bias and pdrop (never used)

* apply modular

* Simplify converter

* remove dummies + style

* add model_input_names

* Add pretraining_tp to config for when eager attention is used

* Update modular to remove all pretraining_tp

* Update test_modeling_glm.py

* Update the __all__

* Update __all__

* Update __init__.py

* Update test_modeling_glm.py

* add revisions

* Add the correct repos and revisions

* style

* Update __init__.py

* update exports

* remove import of modular files

* style

* Apply Llama changes + refine converter

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* Update convert_glm_weights_to_hf.py

* style

* Use new modular converter

* add pretrainedmodel to init

* style

* Update test_modeling_glm.py

* Move config outside modular to please CI about docstrings

* Add dummies to please CI

* Update glm.md

* Update glm.md
2024-10-18 17:41:12 +02:00
e95ea479ee Informative 2 (#34154)
* Informative

* style

* Informative 2

* Apply suggestions from code review

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-10-18 14:12:15 +02:00
0437d6cd03 Fix broken test decorator require_torch_up_to_2_accelerators (#34201)
* fix broken require_torch_up_to_2_accelerators

* make style
2024-10-18 13:54:55 +02:00
5a5b590d06 BLIP: fix input expansion logic (#34225)
fix
2024-10-18 12:17:30 +02:00
b54109c746 Fix-red-ci (#34230)
* fix copies, skip fx for llama

* styke

* re-fix copies

* last?

* style
2024-10-17 23:38:35 +02:00
6ba31a8a94 Enable users to use their own loss functions + deal with prefetching for grad accum (#34198)
* bookmark

* Bookmark

* Bookmark

* Actually implement

* Pass in kwarg explicitly

* Adjust for if we do or don't have labels

* Bookmark fix for od

* bookmark

* Fin

* closer

* Negate accelerate grad accum div

* Fixup not training long enough

* Add in compute_loss to take full model output

* Document

* compute_loss -> compute_loss_fn

* Add a test

* Refactor

* Refactor

* Uncomment tests

* Update tests/trainer/test_trainer.py

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2024-10-17 17:01:56 -04:00
7a06d07e14 Support Llama 3.2 conversion (text models) (#33778)
* Support Llama 3.2 conversion (text models)

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

* Fix rope factor

* Update chat template

Initialize from a well-known template.
The guidance is that the changes should be applied to 3.1 models as
well.

* Remove import

* Support Llama Guard 3 conversion

* Tokenizer details

* Fix eos added token in base models

* Fix generation config for base models

* Specify revision for known tokenizers

* Style

* Reuse chat templates for older models

* Improve error when converting tokenizer < Llama 3

---------

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
2024-10-17 22:37:37 +02:00
c1c7e89620 Fix Gradient Accumulation issue (#34191)
* quick fix

* 3 losses

* oups

* fix

* nits

* check how it scales for special models

* propagate for conditiona detr

* propagate

* propagate

* propagate

* fixes

* propagate changes

* update

* fixup

* nits

* f string

* fixes

* more fixes

* ?

* nit

* arg annoying f string

* nits

* grumble

* update

* nit

* refactor

* fix fetch tests

* nit

* nit

* Update src/transformers/loss/loss_utils.py

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

* update

* nit

* fixup

* make pass

* nits

* port code to more models

* fixup

* nits

* arf

* update

* update

* nits

* update

* fix

* update

* nits

* fine

* agjkfslga.jsdlkgjklas

* nits

* fix fx?

* update

* update

* style

* fix imports

* update

* update

* fixup to fix the torch fx?

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2024-10-17 22:34:40 +02:00
f51ac9e059 Generate: visit non-llm prepare_inputs_for_generation (#34199)
* tmp

* all visited

* test all

* Update src/transformers/models/moshi/modeling_moshi.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* delete another one :D

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-10-17 16:53:48 +01:00
1d2c29f0b3 Fix bus error when using GPT2 on M1 macs (#34031)
There's a bug on M1 macs with transformers >= 4.43.0 and torch >= 2.1.0, where if a model has tied embeddings, the fast loading from #31771 causes a bus error when the model is actually run. This can be solved by disabling `_supports_param_buffer_assignment` for these models.

More info in comments in #33357
2024-10-17 17:39:04 +02:00
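A sketch of what the workaround amounts to; the real fix sets the attribute on the affected model classes inside the library rather than from user code:

```python
# _supports_param_buffer_assignment is a transformers model class attribute;
# setting it to False makes loading fall back to the safe copy-based path,
# avoiding the bus error with tied embeddings on M1.
from transformers import GPT2LMHeadModel

GPT2LMHeadModel._supports_param_buffer_assignment = False
model = GPT2LMHeadModel.from_pretrained("gpt2")
```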
9470c00042 Llama3 and Llama2 are ExecuTorch compatible (#34101)
Llama3_1b and Llama2_7b are ExecuTorch compatible

Co-authored-by: Guang Yang <guangyang@fb.com>
2024-10-17 17:33:19 +02:00
7f5088503f removes decord (#33987)
* removes decord dependency

optimize

np

Revert "optimize"

This reverts commit faa136b51ec4ec5858e5b0ae40eb7ef89a88b475.

helpers as documentation

pydoc

missing keys

* make fixup

* require_av

---------

Co-authored-by: ad <hi@arnaudiaz.com>
2024-10-17 17:27:34 +02:00
f2846ad2b7 Fix for tokenizer.apply_chat_template with continue_final_message=True (#34214)
* Strip final message

* Do full strip instead of rstrip

* Retrigger CI

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2024-10-17 15:45:07 +01:00
b57c7bce21 fix(Wav2Vec2ForCTC): torch export (#34023)
* fix(Wav2Vec2ForCTC): torch export

Resolves the issue described in #34022 by implementing the
masking of the hidden states using an elementwise multiplication
rather than indexing with assignment.

The torch.export functionality seems to mark the tensor as frozen
even though the update is legal.

This change is a workaround for now to allow the export of the
model as a FxGraph. Further investigation is required to find
the real solution in pytorch.

* [run-slow] hubert, unispeech, unispeech_sat, wav2vec2
2024-10-17 15:41:55 +01:00
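A minimal sketch of the masking swap described above, with toy tensors:

```python
# Indexed in-place assignment trips torch.export, so the same zeroing is
# expressed as an elementwise multiply instead.
import torch

hidden_states = torch.randn(2, 10, 8)
mask = torch.ones(2, 10, dtype=torch.bool)
mask[:, 5:] = False  # positions to zero out

# export-unfriendly version: hidden_states[~mask] = 0.0
hidden_states = hidden_states * mask.unsqueeze(-1).to(hidden_states.dtype)
```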
fce1fcfe71 Ping team members for new failed tests in daily CI (#34171)
* ping

* fix

* fix

* fix

* remove runner

* update members

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-17 16:11:52 +02:00
aa3e35ac67 Fix warning message for fp32_cpu_offloading in bitsandbytes configs (#34079)
* change cpu offload warning for fp8 quantization

* change cpu offload warning for fp4 quantization

* change cpu offload variable name for fp8 and fp4 quantization
2024-10-17 15:11:33 +02:00
6d2b203339 Update trainer._get_eval_sampler() to support group_by_length arg (#33514)
Update 'trainer._get_eval_sampler()' to support 'group_by_length' argument

Trainer didn't support grouping by length for evaluation, which made evaluation slow with 'eval_batch_size'>1.

The updated 'trainer._get_eval_sampler()' method was based on 'trainer._get_train_sampler()'.
2024-10-17 14:43:29 +02:00
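A sketch of what length-grouped evaluation amounts to, assuming the same LengthGroupedSampler the train-side sampler uses; the exact wiring inside 'trainer._get_eval_sampler()' may differ:

```python
from transformers.trainer_pt_utils import LengthGroupedSampler

lengths = [3, 120, 5, 118, 4, 119]  # toy precomputed sequence lengths
sampler = LengthGroupedSampler(batch_size=2, lengths=lengths)
print(list(sampler))  # indices ordered so each batch sees similar lengths
```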
3f06f95ebe Revert "Fix FSDP resume Initialization issue" (#34193)
Revert "Fix FSDP resume Initialization issue (#34032)"

This reverts commit 4de1bdbf637fe6411c104c62ab385f660bfb1064.
2024-10-16 15:25:18 -04:00
3a10c6192b Avoid using torch's Tensor or PIL's Image in chat template utils if not available (#34165)
* fix(utils): Avoid using torch Tensor or PIL Image if not available

* Trigger CI

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2024-10-16 16:01:18 +01:00
bd5dc10fd2 Fix wrong name for llava onevision and qwen2_vl in tokenization auto (#34177)
* nit fix wrong llava onevision name in tokenization auto

* add qwen2_vl and fix style
2024-10-16 16:48:52 +02:00
cc7d8b87e1 Revert accelerate error caused by 46d09af (#34197)
Revert `accelerate` bug
2024-10-16 16:13:41 +02:00
98bad9c6d6 [fix] fix token healing tests and usage errors (#33931)
* auto-gptq requirement is removed & model is changed & tokenizer pad token is assigned

* values func is changed with extensions & sequence key value bug is fixed

* map key value check is added in ExtensionsTree

* empty trimmed_ids bug is fixed

* tail_id IndexError is fixed

* empty trimmed_ids bug fix is updated for failed test

* too much specific case for specific tokenizer is removed

* input_ids check is updated

* require auto-gptq import is removed

* key error check is changed with empty list check

* empty input_ids check is added

* empty trimmed_ids fix is checked with numel function

* usage change comments are added

* test changes are commented

* comment style and quality bugs are fixed

* test comment style and quality bug is fixed
2024-10-16 14:22:55 +02:00
9ba021ea75 Moshi integration (#33624)
* clean mimi commit

* some nits suggestions from Arthur

* make fixup

* first moshi WIP

* converting weights working + configuration + generation configuration

* finalize converting script - still missing tokenizer and FE and processor

* fix saving model w/o default config

* working generation

* use GenerationMixin instead of inheriting

* add delay pattern mask

* fix right order: moshi codes then user codes

* unconditional inputs + generation config

* get rid of MoshiGenerationConfig

* blank user inputs

* update convert script: fix conversion, add tokenizer, feature extractor and bf16

* add and correct Auto classes

* update modeling code, configuration and tests

* make fixup

* fix some copies

* WIP: add integration tests

* add dummy objects

* propose better readability and code organisation

* update tokenization tests

* update docstrings, eval and modeling

* add .md

* make fixup

* add MoshiForConditionalGeneration to ignore Auto

* revert mimi changes

* re

* further fix

* Update moshi.md

* correct md formatting

* move prepare causal mask to class

* fix copies

* fix depth decoder causal

* fix and correct some tests

* make style and update .md

* correct config checkpoint

* Update tests/models/moshi/test_tokenization_moshi.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/moshi/test_tokenization_moshi.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* make style

* Update src/transformers/models/moshi/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

* change firm in copyrights

* update config with nested dict

* replace einsum

* make style

* change split to True

* add back split=False

* remove tests in convert

* Update tests/models/moshi/test_modeling_moshi.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add default config repo + add model to FA2 docstrings

* remove logits float

* fix some tokenization tests and ignore some others

* make style tokenization tests

* update modeling with sliding window + update modeling tests

* [run-slow] moshi

* remove prepare for generation from CausalLM

* isort

* remove copied from

* ignore offload tests

* update causal mask and prepare 4D mask aligned with recent changes

* further test refine + add back prepare_inputs_for_generation for depth decoder

* correct conditional use of prepare mask

* update slow integration tests

* fix multi-device forward

* remove previous solution to device_map

* save_load is flaky

* fix generate multi-devices

* fix device

* move tensor to int

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
2024-10-16 11:21:49 +02:00
d087165db0 IDEFICS: support inputs embeds (#34043)
* support embeds

* use cache from config

* style...

* fix tests after rebase
2024-10-16 09:25:26 +02:00
9d6998c759 🌐 [i18n-KO] Translated blip-2.md to Korean (#33516)
* docs: ko: model_doc/blip-2

* feat: nmt draft

* Apply suggestions from code review

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>

* Update docs/source/ko/model_doc/blip-2.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

---------

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2024-10-15 11:21:22 -07:00
554ed5d1e0 🌐 [i18n-KO] Translated trainer_utils.md to Korean (#33817)
* docs: ko: trainer_utils.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2024-10-15 11:21:05 -07:00
8c33cf4eec 🌐 [i18n-KO] Translated gemma2.md to Korean (#33937)
* docs: ko: gemma2.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions
2024-10-15 11:20:46 -07:00
67acb0b123 🌐 [i18n-KO] Translated vivit.md to Korean (#33935)
* docs: ko: model_doc/vivit.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits
2024-10-15 10:31:44 -07:00
0f49deacbf [feat] LlavaNext add feature size check to avoid CUDA Runtime Error (#33608)
* [feat] add feature size check to avoid CUDA Runtime Error

* [minor] add error handling to all llava models

* [minor] avoid nested if else

* [minor] add error message to Qwen2-vl and chameleon

* [fix] token dimension for check

* [minor] add feature dim check for videos too

* [fix] dimension check

* [fix] test reference values

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2024-10-15 16:19:18 +02:00
d00f1ca860 Fix optuna ddp hp search (#34073) 2024-10-15 15:42:07 +02:00
65442718c4 Add support for inheritance from class with different suffix in modular (#34077)
* add support for different suffix in modular

* add dummy example, pull new changes for modular

* nide lines order change
2024-10-15 14:55:09 +02:00
d314ce70bf Generate: move logits to same device as input_ids (#34076)
tmp commit
2024-10-15 14:32:09 +02:00
5ee9e786d1 Fix default behaviour in TextClassificationPipeline for regression problem type (#34066)
* update code

* update docstrings

* update tests
2024-10-15 13:06:20 +01:00
4de1bdbf63 Fix FSDP resume Initialization issue (#34032)
* Fix FSDP Initialization for resume training

* Added init_fsdp function to work with dummy values

* Fix FSDP initialization for resuming training

* Added CUDA decorator for tests

* Added torch_gpu decorator to FSDP tests

* Fixup for failing code quality tests
2024-10-15 13:48:10 +02:00
293e6271c6 Add sdpa for Vivit (#33757)
* chore:add sdpa to vivit

* fix:failing slow test_inference_interpolate_pos_encoding(failing on main branch too)

* chore:fix nits

* ci:fix repo consistency failure

* chore:add info and benchmark to model doc

* [run_slow] vivit

* chore:revert interpolation test fix for new issue

* [run_slow] vivit

* [run_slow] vivit

* [run_slow] vivit

* chore:add fallback for output_attentions being True

* [run_slow] vivit

* style:make fixup

* [run_slow] vivit
2024-10-15 11:27:54 +02:00
23874f5948 Idefics: enable generation tests (#34062)
* add idefics

* conflicts after merging main

* enable tests but need to fix some

* fix tests

* no print

* fix/skip some slow tests

* continue not skip

* rebasing broken smth, this is the fix
2024-10-15 11:17:14 +02:00
dd4216b766 Update README.md with Enterprise Hub (#34150) 2024-10-15 10:45:22 +02:00
fa3f2db5c7 Add documentation for docker (#33156)
* initial commit

* nit
2024-10-14 11:58:45 +02:00
5114c9b9e9 Specify that users should be careful with their own files (#34153)
* Informative

* style
2024-10-14 11:40:39 +02:00
013d3ac2b5 Fixed error message in mllama (#34106) 2024-10-14 10:30:35 +02:00
cb5ca3265f Add GGUF for starcoder2 (#34094)
* add starcoder2 arch support for gguf

* fix q6 test
2024-10-14 10:22:49 +02:00
4c439173df Fix a typo (#34148)
Correct a typo

"If you want you tokenizer..."->"If you want your tokenizer...."
2024-10-14 10:15:25 +02:00
7434c0ed21 Mistral-related models for QnA (#34045)
* mistral qna start

* mixtral qna

* oops

* qwen2 qna

* qwen2moe qna

* add missing input embed methods

* add copied to all methods, can't directly from llama due to the prefix

* make top level copied from
2024-10-14 08:53:32 +02:00
37ea04013b Generate: Fix modern llm generate calls with synced_gpus (#34095) 2024-10-12 16:45:52 +01:00
617b21273a fix(ci): benchmarks dashboard was failing due to missing quotations (#34100) 2024-10-11 19:52:06 +02:00
144852fb6b refactor: benchmarks (#33896)
* refactor: benchmarks

Based on a discussion with @LysandreJik & @ArthurZucker, the goal of
this PR is to improve transformers' benchmark system.

This is a WIP, for the moment the infrastructure required to make things
work is not ready. Will update the PR description when it is the case.

* feat: add db init in benchmarks CI

* fix: pg_config is missing in runner

* fix: add psql to the runner

* fix: connect info from env vars + PR comments

* refactor: set database as env var

* fix: invalid working directory

* fix: `commit_msg` -> `commit_message`

* fix: git marking checked out repo as unsafe

* feat: add logging

* fix: invalid device

* feat: update grafana dashboard for prod grafana

* feat: add `commit_id` to header table

* feat: commit latest version of dashboard

* feat: move measurements into json field

* feat: remove drop table migration queries

* fix: `torch.arrange` -> `torch.arange`

* fix: add missing `s` to `cache_position` positional argument

* fix: change model

* revert: `cache_positions` -> `cache_position`

* fix: set device for `StaticCache`

* fix: set `StaticCache` dtype

* feat: limit max cache len

* fix script

* raise error on failure!

* not try catch

* try to skip generate compilation

* update

* update docker image!

* update

* update again!@

* update

* updates

* ???

* ??

* use `torch.cuda.synchronize()`

* fix json

* nits

* fix

* fixed!

* f**k

* feat: add TTNT panels

* feat: add try except

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-10-11 18:03:29 +02:00
80bee7b114 Avoid many test failures for LlavaNextVideoForConditionalGeneration (#34070)
* skip

* [run-slow] llava_next_video

* skip

* [run-slow] video_llava, llava_next_video

* skip

* [run-slow] llava_next_video

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-11 17:41:50 +02:00
37ac078535 Generate: move prepare_inputs_for_generation in encoder-decoder llms (#34048) 2024-10-11 16:11:18 +01:00
fd70464fa7 Fix flaky tests (#34069)
* fix mllama only

* allow image token index
2024-10-11 14:41:46 +01:00
3a24ba82ad Fix NaNs in cost_matrix for mask2former (#34074)
Fix NaNs in cost_matrix

Sometimes that happens :(
2024-10-11 15:35:55 +02:00
7b06473b8f avoid many failures for ImageGPT (#34071)
* skip

* [run-slow] imagegpt

* skip

* [run-slow] imagegpt

* [run-slow] imagegpt,video_llava

* skip

* [run-slow] imagegpt,video_llava

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-11 15:24:01 +02:00
1c66be8062 Fix PushToHubMixin when pusing to a PR revision (#34090) 2024-10-11 15:06:15 +02:00
409dd2d19c Fix failing conversion (#34010)
* Fix

* Tests

* Typo

* Typo
2024-10-11 14:59:23 +02:00
9dca0c9116 Fix DAC slow tests (#34088)
* Fix DAC slow tests and fix decode

* [run-slow] dac
2024-10-11 14:43:03 +02:00
f052e94bcc Fix flax failures (#33912)
* Few fixes here and there

* Remove typos

* Remove typos
2024-10-11 14:38:35 +02:00
e878eaa9fc Tests: upcast logits to float() (#34042)
upcast
2024-10-11 11:51:49 +01:00
4b9bfd32f0 Update SSH workflow file (#34084)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-11 10:53:12 +02:00
be9aeba581 Idefics: fix position ids (#33907)
* fix position ids

* fix labels also

* fix copies

* oops, not that one

* dont deprecate
2024-10-11 10:28:34 +02:00
7d97cca8dd Generate using exported model and enable gemma2-2b in ExecuTorch (#33707)
* Generate using exported model and enable gemma2-2b in ExecuTorch

* [run_slow] gemma, gemma2

* truncate expected output message

* Bump required torch version to support gemma2 export

* [run_slow] gemma, gemma2

---------

Co-authored-by: Guang Yang <guangyang@fb.com>
2024-10-11 10:16:31 +02:00
70b07d97cf Default synced_gpus to True when using FullyShardedDataParallel (#33483)
* Default synced_gpus to True when using FullyShardedDataParallel

Fixes #30228

Related:

* https://github.com/pytorch/pytorch/issues/100069
* https://github.com/pytorch/pytorch/issues/123962

Similar to DeepSpeed ZeRO Stage 3, when using FSDP with multiple GPUs and differently sized data per rank, the ranks reach different synchronization points at the same time, leading to deadlock.

To avoid this, we can automatically set synced_gpus to True if we detect that a PreTrainedModel is being managed by FSDP using _is_fsdp_managed_module, which was added in 2.0.0 for torch.compile: https://github.com/pytorch/pytorch/blob/v2.0.0/torch/distributed/fsdp/_dynamo_utils.py (see the sketch after this log entry)

* Remove test file

* ruff formatting

* ruff format

* Update copyright year

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add test for FSDP-wrapped model generation

Before #33483, these tests would have hung for 10 minutes before crashing due to a timeout error

* Ruff format

* Move argparse import

* Remove barrier

I think this might cause more problems if one of the workers was killed

* Move import into function to decrease load time

https://github.com/huggingface/transformers/pull/33483#discussion_r1787972735

* Add test for accelerate and Trainer

https://github.com/huggingface/transformers/pull/33483#discussion_r1790309675

* Refactor imports

* Ruff format

* Use nullcontext

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-10-10 14:09:04 -04:00
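A sketch of the detection described in the entry above; `resolve_synced_gpus` is an illustrative name, not the PR's function:

```python
# If any submodule is FSDP-managed, default synced_gpus to True so every
# rank keeps calling forward until all ranks finish generating.
# _is_fsdp_managed_module is the private torch helper the commit names.
import torch.nn as nn
from torch.distributed.fsdp._dynamo_utils import _is_fsdp_managed_module

def resolve_synced_gpus(model: nn.Module, synced_gpus=None) -> bool:
    if synced_gpus is not None:
        return synced_gpus  # an explicit user setting wins
    return any(_is_fsdp_managed_module(m) for m in model.modules())
```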
24b82f3cd5 Small Fix to modular converter (#34051)
* small_fix

* supporting both src/tranformers and examples/

* make style
2024-10-10 18:43:27 +02:00
211f1d93db provide trust_remote_code for search feat extractor in model config (#34036) 2024-10-10 16:33:46 +01:00
8363fd8346 Update Blip2 is_pipeline_test_to_skip method signature (#34067)
Update method signature
2024-10-10 16:32:08 +01:00
e7dfb917f8 [TESTS] ASR pipeline (#33925)
* fix whisper translation

* correct slow_unfinished_sequence test

* make fixup
2024-10-10 17:31:22 +02:00
a37a06a20b Fix data_seed unused (#33731)
* fixing data_seed unused

* fix accelerate version needed

* fix style

* update the fix following accelerate fix
2024-10-10 15:28:00 +02:00
b2f09fb90f [Docs] Update compressed_tensors.md (#33961)
* Update compressed_tensors.md

Fix some unfinished sections

* Update docs/source/en/quantization/compressed_tensors.md

Co-authored-by: Xiao Yuan <yuanx749@gmail.com>

---------

Co-authored-by: Xiao Yuan <yuanx749@gmail.com>
2024-10-10 15:22:41 +02:00
4a3f1a686f check if eigenvalues of covariance matrix are complex. (#34037)
check if eigenvalues of covariance matrix are complex for PSD checking
2024-10-10 14:44:05 +02:00
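A hedged sketch of what such a check can look like; the threshold and exact placement are assumptions:

```python
# torch.linalg.eigvals returns a complex tensor even for symmetric input,
# so a PSD test inspects imaginary and real parts separately instead of
# comparing complex values directly.
import torch

cov = torch.tensor([[2.0, 0.5], [0.5, 1.0]])
eig = torch.linalg.eigvals(cov)
is_psd = bool((eig.imag.abs() < 1e-8).all() and (eig.real >= -1e-8).all())
print(is_psd)  # True
```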
fb0c6b521d Universal Assisted Generation: Assisted generation with any assistant model (by Intel Labs) (#33383)
* Update candidate_generator.py

* Update utils.py

* add lookbehind params to _get_candidate_generator

* make fixup

* add unit tests

* fix failing tests

* add docstrings

* fix docstrings; remove non-optimized AnyTokenizer

* added any tokenizer generation correctness test

* make fixup

* fix assertion syntax

* PR review fixes

* address additional PR comments

* fix tests

* remove stopping criteria arg

* make fixup

* add AssistantConfig

* fix prev_tokens branching

* pass tokenizers through `generate()` kwargs

* fix lookbehind values; tokenizer params WIP

* fixup

* AssistantConfig

* remove AssistantConfig; apply PR suggestions

* restructure tests

* fixup

* fix assistant_tokenizer arg validation

* fixup

* fix tests in TestAssistedCandidateGeneratorDifferentTokenizers

* fix class docstring

* PR suggestions

* doc

* doc update and improvements to `_validate_assistant()`

---------

Co-authored-by: mosheber <moshe.berchansky@intel.com>
2024-10-10 14:41:53 +02:00
dda3f91d06 Specifying torch dtype in Qwen2VLForConditionalGeneration (#33953)
* Specifying torch dtype

* Reverting change & changing fallback _from_config() dtype
2024-10-10 14:39:33 +02:00
f8a260e2a4 Sync QuestionAnsweringPipeline (#34039)
* Sync QuestionAnsweringPipeline

* typo fixes

* Update deprecation warnings
2024-10-10 13:38:14 +01:00
c9afee5392 Add gguf support for gpt2 (#34044)
* add gpt2 gguf support

* add doc change

* small refactoring
2024-10-10 13:42:18 +02:00
66e08dba71 Fix pipelines tests (#34049)
* Fix wrong skip annotation

* Remove error raise
2024-10-10 12:04:06 +01:00
a84c413773 HfArgumentParser: allow for hyhenated field names in long-options (#33990)
Allow for hyphenated field names in long-options

argparse converts hyphens into underscores before assignment (e.g., an
option passed as `--long-option` will be stored under `long_option`), so
there is no need to pass options as literal attributes, as in
`--long_option` (with an underscore instead of a hyphen). This commit
ensures that this behavior is respected by `parse_args_into_dataclasses`
as well.

Issue: #33933

Co-authored-by: Daniel Marti <mrtidm@amazon.com>
2024-10-10 11:58:26 +02:00
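The argparse behavior the entry above relies on, in miniature:

```python
# Hyphens in a long option become underscores on the parsed namespace, so
# --long-option and --long_option land on the same attribute.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--long-option")
args = parser.parse_args(["--long-option", "value"])
print(args.long_option)  # "value"
```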
adea67541a Phi3: fix attn for sliding window (#33586)
* fix phi3 attn for sliding window

* fix tests

* address most comment

* style

* update after rebase

* add more models

* fix tests
2024-10-10 11:50:39 +02:00
a265600c60 add sdpa to OPT (#33298)
* add sdpa to OPT

* chore: remove redundant whitespace in OPTDecoder class

* fixup

* bug fix

* add sdpa and attention generate test

* fixup

* Refactor OPTAttention forward method for improved readability and maintainability

* undo refactor for _shape and key,val states

* add OPT to doc, fixup didn't find it for some reason

* change order

* change default attn_implementation in testing to eager

* [run-slow] opt

* change test_eager_matches_sdpa_generate to the llama one

* Update default attention implementation in testing common

* [run-slow] opt

* remove unneeded print

* [run-slow] opt

* refactor model testers to have attn_implementation="eager"

* [run-slow] opt

* convert test_eager_matches_sdpa_generate to opt-350M

* bug fix when creating mask for opt

* [run-slow] opt

* if layer head mask default to eager

* if head mask is not none fall to eager

* [run-slow] opt

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Clean up Unpack imports (#33631)

clean up Unpack imports

* Fix DPT /Dinov2 sdpa regression on main (#33660)

* fallback to eager if output attentions.

* fix copies

* handle dependency errors in check_imports (#33622)

* handle dependency errors in check_imports

* change log level to warning

* add back self.max_position_embeddings = config.max_position_embeddings (#33550)

* add back self.max_position_embeddings = config.max_position_embeddings

* fix-copies

* Fix Llava conversion for LlavaQwen2ForCausalLM with Clip vision tower (#33613)

fix llavaqwen2 model conversion

* Uniformize kwargs for Udop processor and update docs (#33628)

* Add optional kwargs and uniformize udop

* cleanup Unpack

* nit Udop

* Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin`  (#33203)

* Enable BNB multi-backend support (#31098)

* enable cpu bnb path

* fix style

* fix code style

* fix 4 bit path

* Update src/transformers/utils/import_utils.py

Co-authored-by: Aarni Koskela <akx@iki.fi>

* add multi backend refactor tests

* fix style

* tweak 4bit quantizer + fix corresponding tests

* tweak 8bit quantizer + *try* fixing corresponding tests

* fix dequant bnb 8bit

* account for Intel CPU in variability of expected outputs

* enable cpu and xpu device map

* further tweaks to account for Intel CPU

* fix autocast to work with both cpu + cuda

* fix comments

* fix comments

* switch to testing_utils.torch_device

* allow for xpu in multi-gpu tests

* fix tests 4bit for CPU NF4

* fix bug with is_torch_xpu_available needing to be called as func

* avoid issue where test reports attr err due to other failure

* fix formatting

* fix typo from resolving of merge conflict

* polish based on last PR review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix CI

* Update src/transformers/integrations/integration_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/integrations/integration_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix error log

* fix error msg

* add \n in error log

* make quality

* rm bnb cuda restriction in doc

* cpu model don't need dispatch

* fix doc

* fix style

* check cuda available in testing

* fix tests

* Update docs/source/en/model_doc/chameleon.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update tests/quantization/bnb/test_4bit.py

Co-authored-by: Aarni Koskela <akx@iki.fi>

* Update tests/quantization/bnb/test_4bit.py

Co-authored-by: Aarni Koskela <akx@iki.fi>

* fix doc

* fix check multibackends

* fix import sort

* remove check torch in bnb

* docs: update bitsandbytes references with multi-backend info

* docs: fix small mistakes in bnb paragraph

* run formatting

* revert bnb check

* move bnb multi-backend check to import_utils

* Update src/transformers/utils/import_utils.py

Co-authored-by: Aarni Koskela <akx@iki.fi>

* fix bnb check

* minor fix for bnb

* check lib first

* fix code style

* Revert "run formatting"

This reverts commit ac108c6d6b34f45a5745a736ba57282405cfaa61.

* fix format

* give warning when bnb version is low and no cuda found

* fix device assignment check to be multi-device capable

* address akx feedback on get_avlbl_dev fn

* revert partially, as we don't want the function to be that public, as docs would be too much (enforced)

---------

Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fix error string after refactoring into get_chat_template (#33652)

* Fix error string after refactoring into get_chat_template

* Take suggestion from CR

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* uniformize git processor (#33668)

* uniformize git processor

* update doctring

* Modular `transformers`: modularity and inheritance for new model additions (#33248)

* update exampel

* update

* push the converted diff files for testing and ci

* correct one example

* fix class attributes and docstring

* nits

* oups

* fixed config!

* update

* nits

* class attributes are not matched against the other, this is missing

* fixed overwriting self.xxx now onto the attributes I think

* partial fix, now order with docstring

* fix docstring order?

* more fixes

* update

* fix missing docstrings!

* examples don't all work yet

* fixup

* nit

* updated

* hick

* update

* delete

* update

* update

* update

* fix

* all default

* no local import

* fix more diff

* some fix related to "safe imports"

* push fixed

* add helper!

* style

* add a check

* all by default

* add the

* update

* FINALLY!

* nit

* fix config dependencies

* man that is it

* fix fix

* update diffs

* fix the last issue

* re-default to all

* alll the fixes

* nice

* fix properties vs setter

* fixup

* updates

* update dependencies

* make sure to install what needs to be installed

* fixup

* quick fix for now

* fix!

* fixup

* update

* update

* updates

* whitespaces

* nit

* fix

* simplify everything, and make it file agnostic (should work for image processors)

* style

* finish fixing all import issues

* fixup

* empty modeling should not be written!

* Add logic to find who depends on what

* update

* cleanup

* update

* update gemma to support positions

* some small nits

* this is the correct docstring for gemma2

* fix merging of docstrings

* update

* fixup

* update

* take doc into account

* styling

* update

* fix hidden activation

* more fixes

* final fixes!

* fixup

* fixup instruct  blip video

* update

* fix bugs

* align gemma2 with the rest as well

* updats

* revert

* update

* more reversion

* grind

* more

* arf

* update

* order will matter

* finish del stuff

* update

* rename to modular

* fixup

* nits

* update makefile

* fixup

* update order of the checks!

* fix

* fix docstring that has a call inside

* fix conversion check

* style

* add some initial documentation

* update

* update doc

* some fixup

* updates

* yups

* Mostly todo, gimme a minute

* update

* fixup

* revert some stuff

* Review docs for the modular transformers (#33472)

Docs

* good update

* fixup

* mmm current updates lead to this code

* okay, this fixes it

* cool

* fixes

* update

* nit

* updates

* nits

* fix doc

* update

* revert bad changes

* update

* updates

* proper update

* update

* update?

* up

* update

* cool

* nits

* nits

* bon bon

* fix

* ?

* minimise changes

* update

* update

* update

* updates?

* fixed gemma2

* kind of a hack

* nits

* update

* remove `diffs` in favor of `modular`

* fix make fix copies

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Fix CIs post merging modular transformers (#33681)

update

* Fixed docstring for cohere model regarding unavailability of prune_he… (#33253)

* Fixed docstring for cohere model regarding unavailability of prune_heads() method

The docstring mentions that the cohere model supports the prune_heads() method. I have fixed the docstring by explicitly mentioning that it doesn't support that functionality.

* Update src/transformers/models/cohere/modeling_cohere.py

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Generation tests: update imagegpt input name, remove unused functions (#33663)

* Improve Error Messaging for Flash Attention 2 on CPU (#33655)

Update flash-attn error message on CPU

Rebased to latest branch

* Gemma2: fix config initialization (`cache_implementation`) (#33684)

* Fix ByteLevel alphabet missing when Sequence pretokenizer is used (#33556)

* Fix ByteLevel alphabet missing when Sequence pretokenizer is used

* Fixed formatting with `ruff`.

* Uniformize kwargs for image-text-to-text processors (#32544)

* uniformize FUYU processor kwargs

* Uniformize instructblip processor kwargs

* Fix processor kwargs and tests Fuyu, InstructBlip, Kosmos2

* Uniformize llava_next processor

* Fix save_load test for processor with chat_template only as extra init args

* Fix import Unpack

* Fix Fuyu Processor import

* Fix FuyuProcessor import

* Fix FuyuProcessor

* Add defaults for specific kwargs kosmos2

* Fix Udop to return BatchFeature instead of BatchEncoding and uniformize kwargs

* Add tests processor Udop

* remove Copied from in processing Udop as change of input orders caused by BatchEncoding -> BatchFeature

* Fix overwrite tests kwargs processors

* Add warnings and BC for changes in processor inputs order, change docs, add BC for text_pair as arg for Udop

* Fix processing test fuyu

* remove unnecessary pad_token check in instructblip ProcessorTest

* Fix BC tests and cleanup

* FIx imports fuyu

* Uniformize Pix2Struct

* Fix wrong name for FuyuProcessorKwargs

* Fix slow tests reversed inputs align fuyu llava-next, change udop warning

* Fix wrong logging import udop

* Add check images text input order

* Fix copies

* change text pair handling when positional arg

* rebase on main, fix imports in test_processing_common

* remove optional args and udop uniformization from this PR

* fix failing tests

* remove unnecessary test, fix processing utils and test processing common

* cleanup Unpack

* cleanup

* fix conflict grounding dino

* 🚨🚨 Setting default behavior of assisted decoding (#33657)

* tests: fix pytorch tensor placement errors (#33485)

This commit fixes the following errors:
* Fix "expected all tensors to be on the same device" error
* Fix "can't convert device type tensor to numpy"

According to pytorch documentation, torch.Tensor.numpy(force=False)
performs the conversion only if the tensor is on CPU (plus a few other
restrictions), which is not the case here. For our case we need force=True,
since we just need the data and don't care about tensor coherency (see the
sketch after this log entry).

Fixes: #33517
See: https://pytorch.org/docs/2.4/generated/torch.Tensor.numpy.html

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

* bump tokenizers, fix added tokens fast (#32535)

* update based on tokenizers release

* update

* nits

* update

* revert re addition

* don't break that yet

* fmt

* revert unwanted

* update tokenizers version

* update dep table

* update

* update in conversion script as well

* some fix

* revert

* fully revert

* fix training

* remove set trace

* fixup

* update

* update

* [Pixtral] Improve docs, rename model (#33491)

* Improve docs, rename model

* Fix style

* Update repo id

* fix code quality after merge

* HFQuantizer implementation for compressed-tensors library (#31704)

* Add compressed-tensors HFQuantizer implementation

* flag serializable as False

* run

* revive lines deleted by ruff

* fixes to load+save from sparseml, edit config to quantization_config, and load back

* address satrat comment

* compressed_tensors to compressed-tensors and revert back is_serializable

* rename quant_method from sparseml to compressed-tensors

* tests

* edit tests

* clean up tests

* make style

* cleanup

* cleanup

* add test skip for when compressed tensors is not installed

* remove pydantic import + style

* delay torch import in test

* initial docs

* update main init for compressed tensors config

* make fix-copies

* docstring

* remove fill_docstring

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* review comments

* review comments

* comments - suppress warnings on state dict load, tests, fixes

* bug-fix - remove unnecessary call to apply quant lifecycle

* run_compressed compatibility

* revert changes not needed for compression

* no longer need unexpected keys fn

* unexpected keys not needed either

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* add to_diff_dict

* update docs and expand testing

* Update _toctree.yml with compressed-tensors

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update doc

* add note about saving a loaded model

---------

Co-authored-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Sara Adkins <sara.adkins65@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Dipika <dipikasikka1@gmail.com>

* update model card for opt

* add batch size to inference table

* [slow-run] opt

* [run-slow] opt

---------

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: chengchengpei <5881383+chengchengpei@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Aarni Koskela <akx@iki.fi>
Co-authored-by: Titus von Koeller <9048635+Titus-von-Koeller@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Tibor Reiss <75096465+tibor-reiss@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Muhammad Naufil <m.naufil1@gmail.com>
Co-authored-by: sizhky <yyeshr@gmail.com>
Co-authored-by: Umar Butler <umar@umar.au>
Co-authored-by: Jonathan Mamou <jonathan.mamou@intel.com>
Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Sara Adkins <sara@neuralmagic.com>
Co-authored-by: Sara Adkins <sara.adkins65@gmail.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Dipika <dipikasikka1@gmail.com>
2024-10-10 11:49:34 +02:00
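A minimal sketch of the torch.Tensor.numpy(force=True) fix squashed into the entry above:

```python
# force=True detaches and copies to CPU, so tensors that require grad (or
# live on an accelerator) convert cleanly.
import torch

t = torch.ones(3, requires_grad=True)
# t.numpy() would raise: can't call numpy() on a tensor that requires grad
arr = t.numpy(force=True)  # roughly t.detach().cpu().numpy()
print(arr)
```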
69b5ccb887 Add Translate docs into Arabic - section files CONCEPTUAL GUIDES (#33982)
Add Translate docs into Arabic - section files CONCEPTUAL GUIDES:

* Philosophy [i18n-ar] Translated file: docs/source/ar/philosophy.md into Arabic #33064
* Glossary [i18n-ar] Translated file: docs/source/ar/glossary.md into Arabic #33038
* What 🤗 Transformers can do [i18n-ar] Translated file: docs/source/ar/task_summary.md into Arabic #33073
* How 🤗 Transformers solve tasks [i18n-ar] Translated file: docs/source/ar/tasks_explained.md into Arabic #33074
* The Transformer model family [i18n-ar] Translated file: docs/source/ar/model_summary.md into Arabic #33047
* Summary of the tokenizers [i18n-ar] Translated file: docs/source/ar/tokenizer_summary.md into Arabic #33078
* Attention [i18n-ar] Translated file: docs/source/ar/attention.md into Arabic #33021
* Padding and truncation [i18n-ar] Translated file: docs/source/ar/pad_truncation.md into Arabic #33050
* BERTology [i18n-ar] Translated file: docs/source/ar/bertology.md into Arabic #33024
* Perplexity of fixed-length models [i18n-ar] Translated file: docs/source/ar/perplexity.md into Arabic #33063
* Pipelines for webserver inference [i18n-ar] Translated file: docs/source/ar/pipeline_webserver.md into Arabic #33066
* Model training anatomy [i18n-ar] Translated file: docs/source/ar/model_memory_anatomy.md into Arabic #33045
* Getting the most out of LLMs [i18n-ar] Translated file: docs/source/ar/llm_tutorial_optimization.md into Arabic #33043
2024-10-09 14:51:19 -07:00
88d01d9119 🌐 [i18n-KO] Translated generation_utils.md to Korean (#33818)
* docs: ko: generation_utils.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update generation_utils.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-09 11:55:07 -07:00
c02cf48729 🌐 [i18n-KO] Translated main_classes/callback.md to Korean (#33572)
* docs: ko: callback.md

* feat: nmt draft & manual edits

* fix: resolve suggestions

* Update docs/source/ko/main_classes/callback.md

* Apply suggestions from code review

* Apply suggestions from code review

Confirmed! Thank you so much for the detailed review!

Co-authored-by: boyunJang <gobook1234@naver.com>

* Update _toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: boyunJang <gobook1234@naver.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-09 11:54:38 -07:00
0354d44926 🌐 [i18n-KO] Translated text_generation.md to Korean (#33777)
* docs: ko: text_generation.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

---------

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-09 11:20:01 -07:00
973e6066d4 🌐 [i18n-KO] Translated model_doc/patchtst.md to Korean (#33589)
* docs: ko: model_doc/patchtst.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>

* fix: resolve suggestions

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>

---------

Co-authored-by: Jihun Lim <31366038+heuristicwave@users.noreply.github.com>
Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-09 11:15:24 -07:00
61a6dce7e4 🌐 [i18n-KO] Translated main_classes/data_collator.md to Korean (#33954)
* docs: ko: main_classes/data_collator.md

* feat: nmt draft

* fix: resolve suggestions

Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>

* fix: resolve suggestions

---------

Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-09 11:14:43 -07:00
6ac5f25bb6 🌐 [i18n-KO] Translated modeling_utils.md to Korean (#33808)
* docs: ko: modeling_utils.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>

---------

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
2024-10-09 10:50:03 -07:00
8dca259826 🌐 [i18n-KO] Translated model_doc/graphormer.md to Korean (#33569)
* docs: ko: model_doc/graphormer.md

* feat: nmt draft

* fix: resolve suggestions

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* fix: resolve suggestions

* fix: resolve suggestions

---------

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
2024-10-09 10:44:28 -07:00
4ad923344d 🌐 [i18n-KO] Translated model_doc/informer.md to Korean (#33585)
* docs: ko: model_doc/informer.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* fix: resolve suggestions

---------

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
2024-10-09 10:41:06 -07:00
04f51c42c8 🌐 [i18n-KO] Translated model_doc/time_series_transformer.md to Korean (#33596)
* docs: ko: model_doc/time_series_transformer.md

* fix: resolve suggestions

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

* fix: resolve suggestions

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

---------

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-10-09 10:40:48 -07:00
32cc15c6a2 🌐 [i18n-KO] Translated model_doc/trajectory_transformer.md to Korean (#33597)
* docs: ko: model_doc/trajectory_transformer.md

* fix: resolve suggestions

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* fix: resolve suggestions

* fix: resolve suggestions

---------

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
2024-10-09 10:40:36 -07:00
f0fbef1c63 🌐 [i18n-KO] Translated main_classes/model.md to Korean (#33606)
* feat: nmt draft

* fix: manual edits

* docs: ko: main_classes/model.md

* fix: resolve suggestions

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

* fix: resolve suggestions

---------

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-10-09 10:40:06 -07:00
48b54205d0 🌐 [i18n-KO] Translated model_doc/mamba2.md to Korean (#33629)
* docs: ko: model_doc/mamba2.md

* fix: resolve suggestions

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestion

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

---------

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-10-09 10:39:54 -07:00
03e6fa0061 🌐 [i18n-KO] Translated main_classes/keras_callbacks.md to Korean (#33955)
* docs: ko: main_classes/keras_callbacks.md

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

---------

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-10-09 10:34:01 -07:00
13929a0ec6 🌐 [i18n-KO] Translated model_doc/deberta.md to Korean (#33967)
* docs: ko: model_doc/deberta.md

* feat: nmt draft

* fix: resolve suggestions

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>

* fix: resolve suggestions

* fix: resolve suggestions

---------

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
2024-10-09 10:33:34 -07:00
41794e6098 🌐 [i18n-KO] Translated model_doc/bart.md to Korean (#33893)
* docs: ko: model_doc/bart.md

* fix: anchor edits

* feat: nmt draft

* Update docs/source/ko/model_doc/bart.md

* Update docs/source/ko/model_doc/bart.md

* fix: manual edits

* Update docs/source/ko/model_doc/bart.md

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* fix: resolve suggestions

fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

* fix: resolve suggestions

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* fix: resolve suggestions

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-09 10:33:14 -07:00
36d410dab6 FEAT : Adding BitNet quantization method to HFQuantizer (#33410)
* rebasing changes

* fixing style

* adding some doc to functions

* remove bitblas

* change dtype

* fixing check_code_quality

* fixing import order

* adding doc to tree

* Small update on BitLinear

* adding some tests

* sorting imports

* small update

* reformatting

* reformatting

* reformatting with ruff

* adding assert

* changes after review

* update disk offloading

* adapting after review

* Update after review

* add is_serializable back

* fixing style

* adding serialization test

* make style

* small updates after review
2024-10-09 17:51:41 +02:00
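For context on the BitNet quantizer added above, a minimal usage sketch (not taken from the PR): the model id below is a hypothetical placeholder, and it assumes a repo whose config already carries a BitNet quantization_config, since BitNet checkpoints load pre-quantized.

```
# Hedged sketch of loading a pre-quantized BitNet checkpoint.
# "some-org/llama-bitnet" is a hypothetical repo id, not a real one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/llama-bitnet"  # hypothetical pre-quantized BitNet repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The HFQuantizer reads the quantization_config stored in the repo's config,
# so no explicit quantization argument is needed at load time.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```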
48461c0fe2 Make pipeline able to load processor (#32514)
* Refactor get_test_pipeline

* Fixup

* Fixing tests

* Add processor loading in tests

* Restructure processors loading

* Add processor to the pipeline

* Move model loading on top of the test

* Update `get_test_pipeline`

* Fixup

* Add class-based flags for loading processors

* Change `is_pipeline_test_to_skip` signature

* Skip t5 failing test for slow tokenizer

* Fixup

* Fix copies for T5

* Fix typo

* Add try/except for tokenizer loading (kosmos-2 case)

* Fixup

* Llama no longer fails for long generation

* Revert processor pass in text-generation test

* Fix docs

* Switch back to json file for image processors and feature extractors

* Add processor type check

* Remove except for tokenizers

* Fix docstring

* Fix empty lists for tests

* Fixup

* Fix load check

* Ensure we have non-empty test cases

* Update src/transformers/pipelines/__init__.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update src/transformers/pipelines/base.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Rework comment

* Better docs, add note about pipeline components

* Change warning to error raise

* Fixup

* Refine pipeline docs

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-10-09 16:46:11 +01:00
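A short sketch of what this change enables, assuming a multimodal checkpoint such as the example id below: pipeline() can now resolve the checkpoint's processor itself, and a pre-built processor can also be passed in explicitly.

```
# Sketch only; "llava-hf/llava-1.5-7b-hf" is used as an example checkpoint.
from transformers import AutoProcessor, pipeline

model_id = "llava-hf/llava-1.5-7b-hf"

# The pipeline now loads the processor on its own...
pipe = pipeline("image-to-text", model=model_id)

# ...or accepts one explicitly, mirroring the existing tokenizer argument.
processor = AutoProcessor.from_pretrained(model_id)
pipe = pipeline("image-to-text", model=model_id, processor=processor)
```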
4fb28703ad Fix PIL dep for tests (#34028)
Fix PIL dep for tests
2024-10-09 10:45:06 -04:00
5ee52ae0bc Mllama: fix tests (#34000)
* fix tests

* don't need this

* style
2024-10-09 14:02:56 +02:00
295a90cb40 Generate: remove most decoder-only LLMs prepare_inputs_for_generation (#33870) 2024-10-09 12:15:48 +01:00
cdee5285ca Fix Failed tests with mobile bert resize tokens embedding (#33950)
* Fix Failed tests with mobile bert

* Cast to the correct dtype

* Code fixup

* Fix padding_idx larger than embedding_size

* Reduce covariance more. use 1e-7 instead of 1e-5

* Comment fix

* Reduce covariance more. use 1e-9 instead of 1e-7

* Copy new config

* all but MRA fixed

* fix mra

* very flaky

* skip instead

* make fixup

---------

Co-authored-by: Joao Gante <joao@huggingface.co>
2024-10-09 11:23:50 +01:00
faa0f63b93 Add gguf support for StableLM (#33793)
* add stablelm gguf architecture support

* add additional quantization tests

* resolve merge conflict, add weight conversion tests for fp16
2024-10-09 12:16:13 +02:00
e783f12f20 [Patch helper] update to not have to checkout main (#34006)
add more support
2024-10-09 09:21:46 +02:00
698b36da72 🌐 [i18n-KO] Translated modular_transformers.md to Korean (#33772)
* docs: ko: modular_transformers.md

* feat: nmt draft

* fix inline TOC

* fix: manual edits

* fix: resolve suggestions

* fix: resolve suggestions

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* fix: resolve suggestions

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-08 18:30:41 -07:00
6151bc47ba 🌐 [i18n-KO] Translated image_processing_utils.md to Korean (#33804)
* docs: ko: image_processing_utils.md

* feat: nmt draft

* fix: manual edits
2024-10-08 18:19:37 -07:00
d31d076b53 🌐 [i18n-KO] Translated output.md to Korean (#33607)
* nmt draft

* fix toctree

* minor fix

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: boyunJang <gobook1234@naver.com>
Co-authored-by: wony617 <49024958+Jwaminju@users.noreply.github.com>

* Apply suggestions from code review

* Apply suggestions from code review

* Update docs/source/ko/main_classes/output.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: boyunJang <gobook1234@naver.com>
Co-authored-by: wony617 <49024958+Jwaminju@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-10-08 18:19:21 -07:00
109b1e7591 🌐 [i18n-KO] Translated blip.md to Korean (#33515)
* docs: ko: model_doc/blip

* feat: nmt draft

* Apply suggestions from code review

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>

* Update docs/source/ko/model_doc/blip.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2024-10-08 17:59:31 -07:00
5809b43a62 🌐 [i18n-KO] Translated biogpt.md to Korean (#33773)
* docs: ko: biogpt.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestion

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

---------

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
2024-10-08 17:57:51 -07:00
c674f2e313 🌐 [i18n-KO] Translated openai-gpt.md to Korean (#33801)
* docs: ko: openai-gpt.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

* fix: resolve suggestions

* fix: resolve suggestions

---------

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
2024-10-08 17:57:33 -07:00
c15d01fa1d 🌐 [i18n-KO] Translated file_utils.md to Korean (#33803)
* docs: ko: file_utils.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>

---------

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
2024-10-08 17:57:17 -07:00
f0f8077025 🌐 [i18n-KO] Translated swin.md to Korean (#33510)
* ko: doc: model_doc/swin.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* Update docs/source/ko/model_doc/swin.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* resolve conflicts

* resolve conflicts - 2

---------

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2024-10-08 17:57:03 -07:00
0d0ec1dbfb 🌐 [i18n-KO] Translated tokenization_utils.md to Korean (#33813)
* docs: ko: tokenization_utils.md

* feat: nmt draft

* fix: manual edits
2024-10-08 17:56:30 -07:00
386401eca0 🌐 [i18n-KO] Translated main_classes/onnx.md to Korean (#33601)
* docs: ko: main_classes/onnx.md

* feat: nmt draft

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

* fix: resolve suggestions

* fix: resolve suggestions

Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>

* fix: resolve suggestions

Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

---------

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>
2024-10-08 17:15:46 -07:00
db5f117b8a 🌐 [i18n-KO] Translated model_doc/deberta-v2.md to Korean (#33968)
* docs: ko: model_doc/deberta-v2.md

* feat: nmt draft

* fix: resolve suggestions

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>

* fix: resolve suggestions

* fix: resolve suggestions

---------

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
2024-10-08 17:15:33 -07:00
cd9a3c49b8 🌐 [i18n-KO] Translated model_doc/dbrx.md to Korean (#33951)
* docs: ko: model_doc/dbrx.md

* feat: nmt draft

* fix: resolve suggestions

Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>

* fix: resolve suggestions

* fix: resolve suggestions

---------

Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>
2024-10-08 17:14:42 -07:00
d6d07f9c77 🌐 [i18n-KO] Translated model_doc/cohere.md to Korean (#33885)
* docs: ko: model_doc/cohere.md

* feat: nmt draft

* fix: resolve suggestions

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>

* fix: resolve suggestions

---------

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: SeongWooChoi <46990061+nuatmochoi@users.noreply.github.com>
2024-10-08 17:14:25 -07:00
48e80284fa 🌐 [i18n-KO] Translated model_doc/mistral.md to Korean (#33648)
* docs: ko: model_doc/mistral.md

* feat: nmt draft

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* fix: resolve suggestions

* fix: resolve suggestions

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

---------

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
2024-10-08 17:14:12 -07:00
adb14b93f4 🌐 [i18n-KO] Translated model_doc/llama3.md to Korean (#33635)
* docs: ko: model_doc/llama3.md

* fix: resolve suggestions

* fix: resolve suggestions

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>

* fix: resolve suggestions

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* fix: resolve suggestions

* fix: resolve suggestions

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

---------

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-10-08 17:13:57 -07:00
291e707868 🌐 [i18n-KO] Translated model_doc/paligemma.md to Korean (#33612)
* docs: ko: model_doc/paligemma.md

* feat: nmt draft

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

* fix: resolve suggestions

---------

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-10-08 17:13:25 -07:00
dd43dafa39 🌐 [i18n-KO] Translated model_doc/clip.md to Korean (#33610)
* docs: ko: model_doc/clip.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* fix: resolve suggestions

* fix: resolve suggestions

* fix: resolve suggestions

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

---------

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
2024-10-08 17:13:07 -07:00
acde6c7d9d 🌐 [i18n-KO] Translated model_doc/patchtsmixer.md to Korean (#33587)
* docs: ko: model_doc/patchtsmixer.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* fix: resolve suggestions

---------

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
2024-10-08 17:11:48 -07:00
bb825dde73 🌐 [i18n-KO] Translated model_doc/autoformer.md to Korean (#33574)
* docs: ko: model_doc/autoformer.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions
2024-10-08 17:11:19 -07:00
1d458437dd 🌐 [i18n-KO] Translated model_doc/mamba.md to Korean (#33626)
* docs: ko: model_doc/mamba.md

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

* fix: resolve suggestions

---------

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-10-08 17:11:11 -07:00
47da2c528b 🌐 [i18n-KO] Translated main_classes/configuration.md to Korean (#33952)
* docs: ko: main_classes/configuration.md

* feat: nmt draft
2024-10-08 17:11:02 -07:00
2e8de976bd 🌐 [i18n-KO] Translated main_classes/quantization.md to Korean (#33959)
* docs: ko: main_classes/quantization.md

* feat: nmt draft

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* fix: resolve suggestions

---------

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-10-08 17:10:41 -07:00
2fe77783c3 🌐 [i18n-KO] Translated rag.md to Korean (#33989)
* fix: toctree edits

* feat: nmt-draft

* fix: edit Inline TOC
2024-10-08 17:10:26 -07:00
1ed98773e5 🌐 [i18n-KO] Translated gpt_neox_japanese.md to Korean (#33894)
* docs: ko: gpt_neox_japanese.md

* Update _toctree.yml

* fix: manual edits

* Update docs/source/ko/model_doc/gpt_neox_japanese.md

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* Update docs/source/ko/model_doc/gpt_neox_japanese.md

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* Update docs/source/ko/model_doc/gpt_neox_japanese.md

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

---------

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
2024-10-08 17:08:06 -07:00
79af52ad9a 🌐 [i18n-KO] Translated bertweet.md to Korean (#33891)
* docs: ko: bertweet.md

* Update _toctree.yml

* fix: manual edits

* Update docs/source/ko/model_doc/bertweet.md

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

---------

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
2024-10-08 17:07:13 -07:00
d49999ce11 🌐 [i18n-KO] Translated feature_extractor.md to Korean (#33775)
* docs: ko: feature_extractor.md

* feat: nmt draft

* fix: manual edits
2024-10-08 17:06:56 -07:00
573942d96a Fix trainer_seq2seq.py's __init__ type annotations (#34021)
* Fix `trainer_seq2seq.py`'s `__init__` type annotations

* Update src/transformers/trainer_seq2seq.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Fix issue pointed out by `muellerzr`

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-10-08 16:43:30 -04:00
04b4e441dc Remove decoder_config=None (#34014)
* remove unnecessary line

* changed to the right one
2024-10-08 15:57:12 +02:00
1909def2de fix awq tests due to ipex backend (#34011)
fix awq tests
2024-10-08 15:56:05 +02:00
4f2bf135af Fix typing issue (#34012) 2024-10-08 15:15:40 +02:00
f4b741d674 Fixup DeepSpeed things (#34007) 2024-10-08 09:04:24 -04:00
17806d11ba Improve modular converter (#33991)
* improve modular

* style

* Update modular_model_converter.py

* pretty print warning

* style

* Support to remove unused classes as part of added dependencies as well

* nits

* correct bug

* add example

* style

* Add documentation
2024-10-08 14:53:58 +02:00
fb360a6c7a BatchFeature.to() supports non-tensor keys (#33918)
* Fix issue in oneformer preprocessing

* [run slow] oneformer

* [run_slow] oneformer

* Make the same fixes in DQA and object detection pipelines

* Fix BatchFeature.to() instead

* Revert pipeline-specific changes

* Add the same check in Pixtral's methods

* Add the same check in BatchEncoding

* make sure torch is imported
2024-10-08 13:43:32 +01:00
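A minimal sketch of the fixed behavior: BatchFeature.to() now moves only tensor entries and passes non-tensor values (such as OneFormer's list-of-strings task inputs) through untouched instead of raising. The keys below are illustrative.

```
# Sketch of the post-fix behavior; keys are illustrative.
import torch
from transformers import BatchFeature

batch = BatchFeature({
    "pixel_values": torch.randn(1, 3, 224, 224),
    "task_inputs": ["semantic"],  # non-tensor entry that used to break .to()
})
batch = batch.to("cpu")  # tensors are moved; the string list is passed through
assert isinstance(batch["task_inputs"][0], str)
```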
3b44d2f042 Image pipelines spec compliance (#33899)
* Update many similar visual pipelines

* Add input tests

* Add ImageToText as well

* Add output tests

* Add output tests

* Add output tests

* OutputElement -> Output

* Correctly test elements

* make fixup

* fix typo in the task list

* Fix VQA testing

* Add copyright to image_classification.py

* Revert changes to VQA pipeline because outputs have differences - will move to another PR

* make fixup

* Remove deprecation warnings
2024-10-08 13:34:28 +01:00
e2001c3413 Add auto model for image-text-to-text (#32472)
* Add Auto model for image-text-to-text

* Remove donut from processing auto, add chameleon to image-text-to-text models

* add qwen2_vl and llava_onevision

* add pixtral to auto model for image-text-to-text

* add mllama and idefics3

* remove models in IGNORE_NON_AUTO_CONFIGURED

* add AutoModelForImageTextToText to tests and doc
2024-10-08 14:26:43 +02:00
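Sketch of the new auto class, with an example checkpoint id; any of the mapped models (llava, qwen2_vl, pixtral, mllama, idefics3, ...) should resolve the same way.

```
# Hedged sketch; the checkpoint id is an example.
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)
```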
0dbc7090ba Processors: don't default padding side (#33942)
* don't default padding side

* fix
2024-10-08 10:58:49 +02:00
a3add29097 Add support for __all__ and potentially deleting functions (#33859)
* Add support for __all__ and potentially deleting functions

* updates

* update

* nits

* remove dummies

* fix warning

* fixup

* style

* update

* fixup

* skip copied from when # skip

* remove log

* bring dummies back

* fixup

* remove copied from

* fixup

* remove warnings from `make fix-copies`

* fix doc issues

* nits

* Better error message !

* add support for more flexible naming!

* style

* breaking style?

* fix super() renaming issues

* del not needed when you don't call super().__init__()

* style

* no more fmt on :)

* properly remove `self`

* fixup

* fix

* doc nits

* add some doc 🫡
2024-10-08 10:19:17 +02:00
bead0fa8dc Cache: slight change in naming (#32421)
* squash

* codestyle

* Update src/transformers/cache_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* propagate changes to all cache classes

* + whisper

* fix tests

* more fixes

* add deprecation warning

* fix copies

* address comments

* fix mistral also

* these didn't have "copied from"

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-10-08 09:43:40 +02:00
d6ba1ac041 🌐 [i18n-KO] Translated gemma.md to Korean (#33936)
* docs: ko: gemma.md

* feat: nmt draft

* fix: manual edits
2024-10-07 15:59:14 -07:00
46f146a2b5 🌐 [i18n-KO] Translated vit.md to Korean (#33884)
* docs: ko: model_doc/vit.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits

* Update docs/source/ko/model_doc/vit.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/model_doc/vit.md

Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

---------

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
2024-10-07 15:35:11 -07:00
1ecca92f03 🌐 [i18n-KO] Translated swin2sr.md to Korean (#33795)
* ko: doc: model_doc/swin2sr.md

* feat: nmt draft

* Update docs/source/ko/model_doc/swin2sr.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

---------

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2024-10-07 15:34:56 -07:00
8258219c4c 🌐 [i18n-KO] Translated auto.md to Korean (#33590)
* docs: ko: model_doc/auto.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: wony617 <49024958+Jwaminju@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* fix: resolve suggestions

---------

Co-authored-by: wony617 <49024958+Jwaminju@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
2024-10-07 15:34:45 -07:00
253a9a9d6f 🌐 [i18n-KO] Translated logging.md to Korean (#33543)
* docs: ko: main_classes/logging.md

* feat: nmt-draft

* fix: update toctree.yml

* Update docs/source/ko/main_classes/logging.md

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* Update docs/source/ko/main_classes/logging.md

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>

* Apply suggestions from code review

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

---------

Co-authored-by: Sungmin Oh <fabxoe.kor@gmail.com>
Co-authored-by: HyeokJun SHIN <96534680+jun048098@users.noreply.github.com>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2024-10-07 15:34:34 -07:00
178d707b7e 🌐 [i18n-KO] Translated chameleon.md to Korean (#33799)
* docs: ko: chameleon.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

---------

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
2024-10-07 15:06:13 -07:00
13432f8409 🌐 [i18n-KO] Translated trainer.md to Korean (#33797)
* docs: ko: trainer.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

---------

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
2024-10-07 15:05:57 -07:00
e9fbe62965 🌐 [i18n-KO] Translated pipelines_utils.md to Korean (#33809)
* docs: ko: pipelines_utils.md

* feat: nmt draft

* fix: manual edits
2024-10-07 15:05:17 -07:00
9c61ba2f25 🌐 [i18n-KO] Translated time_series_utils.md to Korean (#33806)
* docs: ko: time_series_utils.md

* feat: nmt draft

* fix: manual edits
2024-10-07 15:05:00 -07:00
9c8bd3fc1b 🌐 [i18n-KO] Translated esm.md to Korean (#33796)
* docs: ko: esm.md

* feat: nmt draft

* fix: manual edits
2024-10-07 13:39:22 -07:00
6996f2186a 🌐 [i18n-KO] Translated audio_utils.md to Korean (#33802)
* docs: ko: audio_utils.md

* feat: nmt draft

* fix: manual edits
2024-10-07 13:39:10 -07:00
410c73af1d 🌐 [i18n-KO] Translated swinv2.md to Korean (#33566)
* docs: ko: model_doc/swinv2.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits
2024-10-07 12:50:43 -07:00
6c18cefed0 🌐 [i18n-KO] Translated gguf.md to Korean (#33764)
* docs: ko: gguf.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>

---------

Co-authored-by: Jiwook Han <33192762+mreraser@users.noreply.github.com>
Co-authored-by: Chulhwa (Evan) Han <cjfghk5697@ajou.ac.kr>
2024-10-07 12:49:08 -07:00
c91fe85b78 Fix undefined default_config in configuration_utils.py (#33934) 2024-10-07 18:32:20 +02:00
736c7cde51 [pytest collection] Fix flax test collection (#34004)
a bit weird, but to filter I had to use this
2024-10-07 18:11:13 +02:00
55be7c4c48 Enable customized optimizer for DeepSpeed (#32049)
* transformers: enable custom optimizer for DeepSpeed

* transformers: modify error message

---------

Co-authored-by: datakim1201 <roy.kim@maum.ai>
2024-10-07 15:36:54 +02:00
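Roughly what this change enables, sketched under assumptions ("ds_config.json" and train_dataset are placeholders): a custom optimizer can now be handed to Trainer while training under DeepSpeed, provided the DeepSpeed config does not also define one.

```
# Sketch; "ds_config.json" and train_dataset are assumed to exist.
import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer)

args = TrainingArguments(output_dir="out", deepspeed="ds_config.json")
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # placeholder dataset
    optimizers=(optimizer, scheduler),  # custom optimizer, now allowed with DeepSpeed
)
```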
7bae833728 properly fix and RUN_SLOW (#33965)
* properly fix and RUN_SLOW

* lots of models were affected

* fix-copies

* more fixes
2024-10-07 14:45:57 +02:00
e782e95e34 Fix Tensor + Embedding error in some cases when using SiglipVisionModel (#33994)
Fix Tensor + Embedding error in some cases

Co-authored-by: kaitolucifer <kaito.o@ghelia.com>
2024-10-07 11:17:34 +02:00
9b4b0c07db [Red CIs] Fix hub failures (#34001)
maybe setup should work?
2024-10-07 10:56:24 +02:00
ad1a250719 [Docs] Add Developer Guide: How to Hack Any Transformers Model (#33979)
* docs: add example for separating q, k, v projections in SAM

* docs: How to Hack Any Transformers Model

* docs: remove changes from sam model docs

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-10-07 10:08:20 +02:00
f5aeb7c1a5 [Docs] Improve VLM docs (#33393)
* Improve docs

* Update docs/source/en/model_doc/llava.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/llava.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Address comment

* Address comment

* Improve pixtral docs

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-10-07 09:54:07 +02:00
1f33023cfa Flash-attn performance: remove cuda sync during inference (#33570)
Switch conditions to use short-circuit during inference
2024-10-07 09:52:19 +02:00
4953ddf036 Add position ids in forward pass to opt model (#33121)
* start working on adding position ids

* add docs

* Refactor modeling_biogpt.py and modeling_opt.py for code consistency

* fix 2 PR comments

* move position_ids to end of args

* remove trailing white space

* add comment with TODO

* bug fix gradient checkpointing

* fixup

* missed on position_ids

* remove _attention_to_position_ids and refactor embedding class

* remove redundant code

---------

Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
2024-10-07 09:20:49 +02:00
1bd604d11c [WIP] Add Tokenizer for MyT5 Model (#31286)
* Initial commit for MyT5 model

* custom implementation of MyT5 tokenizer, unused files deleted

* unittest for myt5 tokenizer

* update of import structure and style

* removed remnants of MyT5Config

* fixed docstrings

* Updates after review: filled documentation file, new docstrings and tests added

* Fixed code style issues

* fixed copied from to refer to function

* updated loading myt5 tokenizer in tests, added sample byte map file to fixtures

* changes after review

* removed redundant copied from

* removed redundant copied from

* optimization and loading model from HF

* [run_slow] myt5

* [run-slow] myt5

* Updated en documentation for myt5

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-10-06 10:33:16 +02:00
5ef432e474 [TF] Fix Tensorflow XLA Generation on limited seq_len models (#33903)
* fix tf xla generation on limited seq_len models

* [run-slow] opt

* [run-slow] opt
2024-10-05 16:20:50 +02:00
22e102ad98 Bug fix gguf qwen2moe (#33940)
* fix qwen2moe tensors mapping, add unit tests

* add expert tensor split logic, test refactoring

* small params refactoring

* add comment to tensor reshaping
2024-10-05 16:19:01 +02:00
56be9f1925 add test for Jamba with new model jamba-tiny-dev (#33863)
* add test for jamba with new model

* ruff fix

---------

Co-authored-by: Yehoshua Cohen <yehoshuaco@ai21.com>
2024-10-05 16:03:12 +02:00
a7e4e1a77c Updating char_to_token documentation to note behaviour when trim_offsets is True (#33919)
Updating char_to_token documentation.
2024-10-05 14:13:26 +02:00
612065efeb Paligemma: fix static cache test (#33941)
* fix

* not flaky anymore + style
2024-10-05 09:47:37 +02:00
38f9f10dd9 Cache: revert DynamicCache init for BC (#33861)
* tmp commit

* tmp commit

* make fixup

* missing removal

* fix condition

* fix end-to-end compilation

* if -> elif

* BC

* BC

* use @deprecate_kwarg("num_hidden_layers", version="4.47.0")

* wups the import

* 🥴

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-10-04 22:47:08 +02:00
f92d354823 fix red check-copies (#33964) 2024-10-04 22:45:37 +02:00
f319ba16fa Add Zamba (#30950)
* Update index.md

* Rebase

* Rebase

* Updates from make fixup

* Update zamba.md

* Batched inference

* Update

* Fix tests

* Fix tests

* Fix tests

* Fix tests

* Update docs/source/en/model_doc/zamba.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/zamba.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update configuration_zamba.py

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update modeling_zamba.py

* Update modeling_zamba.py

* Update modeling_zamba.py

* Update configuration_zamba.py

* Update modeling_zamba.py

* Update modeling_zamba.py

* Merge branch 'main' of https://github.com/Zyphra/transformers_zamba

* Update ZambaForCausalLM

* Update ZambaForCausalLM

* Describe diffs with original mamba layer

* Moved mamba init into `_init_weights`

* Update index.md

* Rebase

* Rebase

* Updates from make fixup

* Update zamba.md

* Batched inference

* Update

* Fix tests

* Fix tests

* Fix tests

* Fix tests

* Update docs/source/en/model_doc/zamba.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/model_doc/zamba.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update configuration_zamba.py

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update modeling_zamba.py

* Update modeling_zamba.py

* Update modeling_zamba.py

* Update configuration_zamba.py

* Update modeling_zamba.py

* Update modeling_zamba.py

* Merge branch 'main' of https://github.com/Zyphra/transformers_zamba

* Update ZambaForCausalLM

* Moved mamba init into `_init_weights`

* Update ZambaForCausalLM

* Describe diffs with original mamba layer

* make fixup fixes

* quality test fixes

* Fix Zamba model path

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* circleci fixes

* Update

* circleci fixes

* fix zamba test from merge

* fix ValueError for disabling mamba kernels

* add HF copyright

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* shared_transf --> shared_transformer

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fixes

* Move attention head dim to config

* Fix circle/ci tests

* Update modeling_zamba.py

* apply GenerationMixin inheritance change from upstream

* apply import ordering

* update needed transformers version for zamba

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add contribution author

* add @slow to avoid CI

* Update src/transformers/models/zamba/modeling_zamba.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Define attention_hidden_size

* Added doc for attention_head_size

* trigger CI

* Fix doc of attention_hidden_size

* [run-slow] zamba

* Fixed shared layer logic, swapped up<->gate in mlp

* shared_transformer -> shared_transf

* reformat HybridLayer __init__

* fix docstrings in zamba config

* added definition of _get_input_ids_and_config

* fixed formatting of _get_input_ids_and_config

---------

Co-authored-by: root <root@node-4.us-southcentral1-a.compute.internal>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: root <root@node-1.us-southcentral1-a.compute.internal>
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
2024-10-04 22:28:05 +02:00
e3775539c8 PhiMoE (#33363)
* onboard phimoe model

* removed debug code

* added unit tests

* updated docs

* formatted

* fixed unit tests

* fixed test case

* fixed format

* refactored code

* fixed expected outputs in the integration tests

* Added a warning msg

* Addressed comments

* Addressed comments

* fixed test cases

* added paper link

* Addressed comments

* Refactored PhimoeForCausalLM forward fn

* Refactored PhimoeRotaryEmbedding class

* fixed test cases

* fixed testcase

* fixed test case

* Addressed comments

* fixed test cases

* fixed testcases

* Used cache position instead to get the seq len
2024-10-04 21:39:45 +02:00
46579c0e77 hot fix self.position_embeddings->self.position_embedding (#33958) 2024-10-04 21:35:31 +02:00
0d1692a49b Fix attn mask ignore logic in training-time trace (#32613)
* fix attn mask logic for training-time trace

* add test

* fix

* fix

* fix

* fix

* fix

* format

* [run-slow] llama

* avoid accelerate

* [run-slow] llama
2024-10-04 19:00:45 +02:00
614660fdb9 Removed unnecessary transpose in Switch Transformer Routing (#33582)
removed switch transformer routing transpose
2024-10-04 17:39:03 +02:00
78ef58325c 🔴 🚨 Resizing token embeddings: initialize from old embeddings' normal distribution. (#33325)
* initialize new embeddings from a normal distribution

* Fix typo in comments

* Fix typo in comments

* Fix style

* Fix variables naming

* Add tests

* Fix style

* code consistency nit

* Add deepspeed support

* Add deepspeed support

* Convert embedding weights to float32 before computations

* Add deepspeed tests

* Cover when vocab_size is smaller than embedding_size

* Style fix

* Add tests for vocab_size smaller than hidden_size

* Style fix

* Nits in tests

* Nits in tests

* Check for deepspeed before importing it

* Increase vocab_size for positive definite covariance matrix test

* Add warning

* Add multivariate_resizing flag and implement resizing for lm_heads

* Fix typo

* Fix wrong bias indexing

* Fix bias is zero check

* remove multivariate_resizing flag from tests

* Initialize bias from the old bias' normal distribution

* Fixup

* Code usability

* Use mean_resizing instead of multivariate_resizing

* Fix up

* Fix comments and docs
2024-10-04 16:29:55 +02:00
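The breaking change above, sketched: added embedding rows are now initialized from the mean (and covariance) of the existing embeddings, and mean_resizing=False restores the old config-based normal init.

```
# Sketch of the new resize_token_embeddings behavior; "gpt2" is an example checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.add_tokens(["<special_marker>"])

model.resize_token_embeddings(len(tokenizer))                       # mean-based init (new default)
model.resize_token_embeddings(len(tokenizer), mean_resizing=False)  # previous N(0, std) init
```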
b916efcb3c Enables CPU AWQ model with IPEX version. (#33460)
* enable cpu awq ipex linear

* add doc for cpu awq with ipex kernel

* add tests for cpu awq

* fix code style

* fix doc and tests

* Update docs/source/en/quantization/awq.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update tests/quantization/autoawq/test_awq.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix comments

* fix log

* fix log

* fix style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-10-04 16:25:10 +02:00
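Sketch of the CPU path added above, assuming an existing AWQ checkpoint (the repo id is an example) and an installed intel-extension-for-pytorch:

```
# Hedged sketch; repo id is an example AWQ checkpoint.
from transformers import AutoModelForCausalLM, AwqConfig

quantization_config = AwqConfig(version="ipex")
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/TinyLlama-1.1B-Chat-v1.0-AWQ",  # example AWQ repo
    quantization_config=quantization_config,
    device_map="cpu",
)
```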
de4112e4d2 Add a section on writing tool templates to the chat template docs (#33924)
* Add a section on writing tool templates to the chat template docs

* Small cleanups
2024-10-04 14:40:44 +01:00
2e719e35fd [PR run-slow] (#33939)
* force latest torch

* Update .github/workflows/self-pr-slow-ci.yml

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2024-10-04 14:46:15 +02:00
061c2c4c38 Ignore keys on validate_rope (#33753)
* ignore keys on check rope

* add tests

* fix tests, so maybe better leave at logger lvl
2024-10-04 12:39:37 +02:00
4a173b88b5 [i18n-ru] Fixes typo in the README_ru.md (#33882) 2024-10-04 11:21:38 +02:00
b6a01df6e9 [Doc]: Broken link in Kubernetes doc (#33879)
* add relative path in .md and redirects to conf.py

* add redirects to conf.py and update .md

* modify links in .md
2024-10-04 11:20:56 +02:00
124713c32b Fix distil whisper segment computation (#33920)
* Fix distil whisper segment computation

* [run-slow] whisper
2024-10-04 11:18:01 +02:00
2bd4d5897d Minor error condition bug fix (#33781)
* Error condition bug fix

* Update error message

* Update src/transformers/models/qwen2_vl/modeling_qwen2_vl.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Making change in the rest of the repo

* Formatting

* Formatting with ruff

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2024-10-04 08:25:32 +02:00
550673a70c Remove logits.float() (#33902)
* Remove logits.float() if not computing loss

* Remove warning about 4.46 logits dtype change if not computing loss
2024-10-04 08:21:12 +02:00
074aa3b3fd Uniformize kwargs for Idefics/2 processors (#32568)
* Add uniformize idefics processor kwargs and tests

* Uniformize idefics2 processor kwargs

* add image_processor tests idefics

* add BC args order change idefics2 processor and update doc

* Add support for multiple images per prompt in image-text-to-text mode idefics

* Fix processor input args in idefics tests

* improve test processing common, remove unnecessary tests, update process uniformization

* fix docstrings idefics

* fix tests processors idefics/2
2024-10-03 18:08:24 +02:00
b0c5660e88 Config: lower save_pretrained exception to warning (#33906)
* lower to warning

* msg

* make fixup

* rm extra comma
2024-10-03 16:45:14 +01:00
15a4d24805 Add support for weights_only flag when loading state_dict (#32481)
* Add support for `weights_only` flag when loading state_dict

Summary:
This is to enable loading a state_dict with wrapper tensor subclasses (used in torchao
for quantized weights)

Test Plan:
tested locally with torchao weights, also need https://github.com/huggingface/transformers/pull/32306:
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import TorchAoConfig
from torchao.utils import benchmark_model
import torchao

DEVICE_TYPE = "cuda"

def init_model_and_benchmark(model_id, torch_dtype=torch.bfloat16, quantization_config=None):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if quantization_config is not None:
        model = AutoModelForCausalLM.from_pretrained(model_id, device_map=DEVICE_TYPE, torch_dtype=torch.bfloat16, quantization_config=quantization_config)
    else:
        model = AutoModelForCausalLM.from_pretrained(model_id, device_map=DEVICE_TYPE, torch_dtype=torch.bfloat16, weights_only=False)

    # sanity check: run the model
    input_text = "What are we having for dinner?"
    input_ids = tokenizer(input_text, return_tensors="pt").to(DEVICE_TYPE)
    output = model.generate(**input_ids, max_new_tokens=1000)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

    NUM_WARMUP = 1
    NUM_RUNS = 5

    if quantization_config is not None:
        torchao.quantization.utils.recommended_inductor_config_setter()

    model = torch.compile(model, mode="max-autotune")

    benchmark_model(model.generate, NUM_WARMUP, kwargs=input_ids, device_type=DEVICE_TYPE)
    print("running benchmark")
    results = benchmark_model(model.generate, NUM_RUNS, kwargs=input_ids, device_type=DEVICE_TYPE)
    return model, results

model_id = "jerryzh168/test-model"
torchao.quantization.utils.recommended_inductor_config_setter()
bf16_model, bf16_time = init_model_and_benchmark(model_id)
print(f"bf16: {bf16_time}")
```

Reviewers:

Subscribers:

Tasks:

Tags:

* format
2024-10-03 17:03:42 +02:00
a220c5b99f add setter for trainer processor (#33911)
* add setter for trainer processor

* Update src/transformers/trainer.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2024-10-03 16:34:10 +02:00
6500f78c86 [PEFT] Support low_cpu_mem_usage option for PEFT loading adapters (#33725)
* [PEFT] Support low_cpu_mem_usage for PEFT loading

PEFT added support for low_cpu_mem_usage=True when loading adapters in
https://github.com/huggingface/peft/pull/1961. This feature is now
available when installing PEFT v0.13.0. With this PR, this option is
also supported when loading PEFT adapters directly into transformers
models.

Additionally, with this PR,
https://github.com/huggingface/diffusers/pull/9510 will be unblocked,
which implements this option in diffusers.

* Fix typo
2024-10-03 16:15:36 +02:00
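A sketch of the new option (both repo ids are placeholders; requires peft >= 0.13.0): the flag is forwarded when loading a PEFT adapter directly into a transformers model.

```
# Sketch; both repo ids are placeholders.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("some-org/base-model")
model.load_adapter("some-user/lora-adapter", low_cpu_mem_usage=True)
```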
bf0ffe3d29 [Tests] Diverse Whisper fixes (#33665)
* fix beam indices in token_timestamps

* fix attention_mask in FA2

* correct translation example with the right example

* correct how somes tests are using outputs + correct num_frames

* fix shortform batch prev cond tests

* make fix-copies

* make fix-copies

* take care of shifting beam indices

* [run-slow] whisper

* [run-slow] whisper
2024-10-03 15:59:01 +02:00
ab97a78130 Fix: use unidic-lite instead of ipadic as the tokenizer dictionary for Japanese (#33372)
* Fix: use unidic-lite instead of ipadic as the tokenizer dictionary for Japanese

Signed-off-by: Kan Takahiro <kan@Kans-Mac-mini.local>

* fix the default name

---------

Signed-off-by: Kan Takahiro <kan@Kans-Mac-mini.local>
Co-authored-by: Kan Takahiro <kan@Kans-Mac-mini.local>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-10-03 15:30:03 +02:00
d29738f5b4 Generate tests: modality-agnostic input preparation (#33685) 2024-10-03 14:01:24 +01:00
f2bf4fcf3d Add SplinterTokenizer unit test (#32652)
* add unit tests for splinter_tokenizer

* add unit test for splinter tokenizer; pass in the question_token to be saved when save_pretrained is called

* remove unused import

* remove vocab_splinter.txt, add Copied from, use fmt:on and fmt:off to prevent autoformatting on long lines

* remove all the spaces

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-10-03 14:49:56 +02:00
95a2f5f6c3 Fix module initialization for root module under Zero3 (#33632)
* Use all state dict keys when checking if root module is initialized.

* Apply style corrections

* Add comment explaining change.

* Change comment phrasing.
2024-10-03 14:41:50 +02:00
4df3ccddb7 Migrate the CI runners to the new clusters (#33849)
* try fixing push-ci

* move to new runners

* move benchmark.yml to new runners

* move doctest_job.yml to new runners

* move doctests.yml to new runners

* move push-important-models.yml to new runners

* move self-pr-slow-ci.yml to new runners

* fix typo

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* fix working directory

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* fix working directory

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* improve code

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2024-10-03 14:39:49 +02:00
6f0ce52760 VLM Generate: tag test_static_cache_matches_dynamic as flaky (#33630)
flaky
2024-10-03 12:27:02 +01:00
f1a5f81296 Update a KeyError on _save_checkpoint to prevent confusion about missing … (#33832)
* Update a KeyError on _save_checkpoint to prevent confusion about missing metric keys

* Fix grammar error and case sensitivity.

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* add KeyError update on _evaluate function to align with _save_checkpoint function

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-10-03 10:27:49 +02:00
dc8156fdd8 Fix dt proj bias reassigned (#33314)
* When we set self.dt_proj.bias = None, it removes the bias parameter from the model. When we later tried to assign a tensor to self.dt_proj.bias, it caused a TypeError because PyTorch expects a Parameter object.
2024-10-03 09:51:03 +02:00
d7950bff82 uniformize processor Mllama (#33876)
* uniformize processor Mllama

* nit syntax

* nit
2024-10-02 16:50:15 +02:00
62e8c759c3 rename all test_processing_*.py to test_processor_*.py (#33878)
* rename all test_processing_*.py to test_processor_*.py and fix duplicate test processor paligemma

* fix copies

* fix broken tests

* fix-copies

* fix test processor bridgetower
2024-10-02 16:43:43 +02:00
2f25ab95db Handle Trainer tokenizer kwarg deprecation with decorator (#33887)
* Handle deprecation with decorator

* Fix for seq2seq Trainer
2024-10-02 15:28:20 +01:00
ee71c9853a Optim deformable detr (#33600)
* optimize deformable detr

* fix copies

* remove deformable_detr_basline

* fix hardcoded float16 and .float()

* [run slow] deformable-detr,grounding-dino,mask2former,oneformer,rt-detr

* [run slow] deformable_detr,grounding_dino,mask2former,oneformer,rt_detr
2024-10-02 15:46:27 +02:00
cac4a4876b [Quantization] Switch to optimum-quanto (#31732)
* switch to optimum-quanto rebase squach

* fix import check

* again

* test try-except

* style
2024-10-02 15:14:34 +02:00
b7474f211d Trainer - deprecate tokenizer for processing_class (#32385)
* Trainer - deprecate tokenizer for processing_class

* Extend chage across Seq2Seq trainer and docs

* Add tests

* Update to FutureWarning and add deprecation version
2024-10-02 14:08:46 +01:00
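The deprecation above in a sketch (train_dataset is a placeholder): tokenizer= still works with a FutureWarning, while processing_class= is the new name and also accepts feature extractors and processors.

```
# Sketch; train_dataset is assumed to exist.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,  # placeholder
    processing_class=tokenizer,   # was: tokenizer=tokenizer
)
```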
e7c8af7f33 Add sdpa for DistilBert (#33724)
* Add sdpa for DistilBert

* [run_slow] distilbert

* [run_slow] distilbert

* [run_slow] distilbert

* Try without slow tests

* [run_slow] distilbert

* [run_slow] distilbert
2024-10-02 13:55:19 +01:00
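One-line sketch of opting in: DistilBERT can now be loaded with PyTorch's scaled_dot_product_attention backend.

```
# Sketch of selecting the new attention backend.
from transformers import AutoModel

model = AutoModel.from_pretrained("distilbert-base-uncased", attn_implementation="sdpa")
```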
614c79a9b0 Fix kwargs passed by AutoQuantizationConfig.from_pretrained (#33798)
fix kwargs

Co-authored-by: kylesayrs <kyle@neuralmagic.com>
2024-10-02 14:12:03 +02:00
b09234cfc1 Allow for nightly packages of compressed_tensors (#33828)
* only check spec

* correct typo in nightly package name
2024-10-02 14:11:44 +02:00
fe484726aa Add falcon gguf (#33437)
* feat(gguf): add falcon q2 k

* fix(gguf): remove useless renaming

* feat(gguf): separate falcon 7b and 40b

* feat(gguf): apply fixup

* fix(test): error rebase

* feat(gguf): add fp16 weight comparison for falcon

* feat(gguf): test weight of all layers

* test(gguf): add falcon 40b under skip decorator

* feat(gguf): quick example for extracting model size
2024-10-02 14:10:39 +02:00
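Sketch of loading one of the newly supported Falcon GGUF files; the repo and file names below are hypothetical placeholders, not artifacts from the PR.

```
# Hedged sketch; repo_id and gguf_file are hypothetical names.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "some-org/falcon-7b-gguf"   # hypothetical GGUF repo
gguf_file = "falcon-7b-Q2_K.gguf"     # hypothetical file name

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```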
181c962aab populate quantization_config for kv-cache-scheme only configs (#33874) 2024-10-02 14:06:40 +02:00
e5d14f39ad Don't run reminder bot for now (#33883)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-02 11:51:01 +02:00
50290cf7a0 Uniformize model processors (#31368)
* add initial design for uniform processors + align model

* add uniform processors for altclip + chinese_clip

* add uniform processors for blip + blip2

* fix mutable default 👀

* add configuration test

* handle structured kwargs w defaults + add test

* protect torch-specific test

* fix style

* fix

* rebase

* update processor to generic kwargs + test

* fix style

* add sensible kwargs merge

* update test

* fix assertEqual

* move kwargs merging to processing common

* rework kwargs for type hinting

* just get Unpack from extensions

* run-slow[align]

* handle kwargs passed as nested dict

* add from_pretrained test for nested kwargs handling

* [run-slow]align

* update documentation + imports

* update audio inputs

* protect audio types, silly

* try removing imports

* make things simpler

* simplerer

* move out kwargs test to common mixin

* [run-slow]align

* skip tests for old processors

* [run-slow]align, clip

* !$#@!! protect imports, darn it

* [run-slow]align, clip

* [run-slow]align, clip

* update common processor testing

* add altclip

* add chinese_clip

* add pad_size

* [run-slow]align, clip, chinese_clip, altclip

* remove duplicated tests

* fix

* add blip, blip2, bridgetower

Added tests for bridgetower, which override the common ones. Also modified common
tests to force center cropping when present

* fix

* update doc

* improve documentation for default values

* add model_max_length testing

This parameter depends on tokenizers received.

* Raise if kwargs are specified in two places

* fix

* removed copied from

* match defaults

* force padding

* fix tokenizer test

* clean defaults

* move tests to common

* add missing import

* fix

* adapt bridgetower tests to shortest edge

* uniformize donut processor + tests

* add wav2vec2

* extend common testing to audio processors

* add testing + bert version

* propagate common kwargs to different modalities

* BC order of arguments

* check py version

* revert kwargs merging

* add draft overlap test

* update

* fix blip2 and wav2vec due to updates

* fix copies

* ensure overlapping kwargs do not disappear

* replace .pop by .get to handle duplicated kwargs

* fix copies

* fix missing import

* clearly add wav2vec2_bert to uniformized models

* fix copies

* increase number of features

* fix style

* [run-slow] blip, blip2, bridgetower, donut, wav2vec2, wav2vec2_bert

* [run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert

* fix concatenation

* [run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert

* Update tests/test_processing_common.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* 🧹

* address comments

* clean up + tests

* [run-slow] instructblip, blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-10-02 10:41:08 +02:00
2292be6c1b Fix: typo (#33880)
Update llm_tutorial.md: typo
2024-10-02 09:12:21 +01:00
61ac161a9d Add support for custom inputs and batched inputs in ProcessorTesterMixin (#33711)
* add support for custom inputs and batched inputs in ProcessorTesterMixin

* Fix batch_size behavior ProcessorTesterMixin

* Change format prepare inputs batched

* Remove override test pixtral processor

* Remove unnecessary tests and cleanup after new prepare_inputs functions

* Fix instructBlipVideo image processor
2024-10-01 23:52:03 +02:00
1baa08897d Repo consistency fix after #33339 (#33873)
* Repo consistency fix after #33339

* [run-slow] omdet_turbo
2024-10-01 21:03:15 +01:00
68a2b50069 [Fix] ViViT interpolate_pos_encoding (#33815)
* fix:test_inference_interpolate_pos_encoding

* style:make style;make fixup

* test: add suggestion to test_modeling_vivit

* chore:add suggestions

* style:make style

* [run_slow] vivit

* ci:slow test fix

* [run_slow] vivit
2024-10-01 20:14:35 +01:00
8635802af9 Move weight initialization deformabledetr (#33339)
* fix(copy): fixup copy

* fix(deformable_detr): move weight initialization to the right place

* fix(grounding_dino): move weight initialization to the right place

* fix(rt_detr): move weight initialization to the right place

* [run-slow] deformable_detr, grounding_dino, rt_detr
2024-10-01 20:08:57 +01:00
a43e84cb3b Make ASR pipeline compliant with Hub spec + add tests (#33769)
* Remove max_new_tokens arg

* Add ASR pipeline to testing

* make fixup

* Factor the output test out into a util

* Full error reporting

* Full error reporting

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Small comment

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-10-01 18:15:04 +01:00
0256520794 fix: repair depth estimation multiprocessing (#33759)
* fix: repair depth estimation multiprocessing

* test: add test for multiprocess depth estimation
2024-10-01 17:59:59 +01:00
f205da9660 Avoid using context that is not accessible to external contributors (#33866)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-01 17:42:45 +02:00
0c4c2d7e07 Add include_loss_for_metrics (#33088)
* Add include_loss_for_metrics

* Fix styling

* Initialize inputs and losses to avoid AttributeError

* Ruff styling

* Refactor compute_metrics and update EvalPrediction

* Change Naming

* Added include_for_metrics to group both args

* Fix style

* Change warnings to logger

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-10-01 16:51:41 +02:00
5f9f58fc59 Validate the eval dataset in advance. (#33743)
* Validate the eval dataset in advance.

* format

* format

* format

* Update src/transformers/trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* format

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-10-01 16:45:06 +02:00
f8110a6ddf Raise accelerate dependency error in case of defaulting low_cpu_mem_usage=True (#33830)
Clarify warning, add import check
2024-10-01 16:44:38 +02:00
326b2bad1c This PR contains additional changes for #33143 (#33581)
* fix: Fix optimizer bug in ModelCard

* fix: fix W293

* Fixes in modelcard.py for issue #33143

---------

Co-authored-by: moontidef <53668275+relic-yuexi@users.noreply.github.com>
2024-10-01 16:42:30 +02:00
b1c914e463 Fix device mismatch errors (#33851)
fix device mismatch errors
2024-10-01 15:55:57 +02:00
ac28a23b3d Workaround for bark issue in pipelines (#33824)
* Quick workaround for bark + generation_config issue

* make fixup

* [run slow] bark
2024-10-01 14:40:12 +01:00
acdfdd9387 add attention weight up-cast to float32 in chameleon (#33822)
add attention weight float32 cast  in chameleon
2024-10-01 15:19:16 +02:00
351873a145 fix: skip dropout in eval for flash_attn in various models (#33844)
* fix(m2m_100): skip dropout in eval for flash_attn

* fix(misc): skip dropout in eval for flash attn various models

* chore(m2m_100): copy flash attn from bart

* chore: run make fix-copies

* [run-slow] bart, m2m_100
2024-10-01 14:39:21 +02:00
88d960937c Refactor image features selection in LlaVa (#33696)
* refactor image features selection

* break line

* remove whitespace

* add pr comments: include projection and rename function

* make fix-copies

* fix get_image_feature in vip llava
2024-10-01 14:37:31 +02:00
22266be970 Generate: move llama prepare_inputs_for_generation to GenerationMixin (#33677) 2024-10-01 12:32:54 +01:00
d19ab15421 post reminder comment only once (#33848)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-10-01 12:52:53 +02:00
fbde09c8c9 fix check for hidden size in text model for deepspeed zero3 auto entries (#33829)
* fix check for hidden size in text model for deepspeed zero3 auto entries

* fix typo
2024-10-01 12:28:26 +02:00
808997a634 Fix passing str dtype to static cache (#33741)
Co-authored-by: Guang Yang <guangyang@fb.com>
2024-10-01 09:50:17 +02:00
c269c5c74d Fix Mamba slow path bug with dtype mismatch. (#32691)
* Fix Mamba slow path bug with dtype mismatch.

* Update test_modeling_mamba.py

* Improve style.

* Fix issue with cache position of dtype mismatch test.

* Change test for slow path.

* Revert changes.

* Switch to buggy code and add test to catch it.

* Fix the dtype mismatch bug and add test code to verify it.

* Fix minor bug with test.

* Fix incorrect dtype of model output.

* Fix incorrect dtype of cache.

* Fix incorrect dtype of ssm cache.

* Fix incorrect dtype of conv state.

* Remove assertion for ssm state.

* Add assertion for conv state dtype.

* Fix all issues with dtype mismatch test.
2024-10-01 09:28:40 +02:00
570c89625b Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/lxmert (#33821)
Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1 to 2.2.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v1.13.1...v2.2.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-30 21:57:57 +02:00
90dca5a71b minor typo fix (#33784)
fix typo
2024-09-30 21:42:22 +02:00
b77846a6e6 Fix link in gguf.md (#33768)
Change hyphen to underscore for URL in link to convert_hf_to_gguf.py
2024-09-30 20:17:33 +02:00
baa765f813 Fixes for issue #33763 in idefics2 model (#33766) 2024-09-30 18:08:48 +01:00
18c5b216f1 Fix ViT-MAE decoder interpolate (#33330)
* Fix ViT-MAE decoder interpolate

* Add unit test for `interpolate_pos_encoding` w/ custom sizes

* [run_slow] vit_mae
2024-09-30 18:47:13 +02:00
1dba608df9 [modular] fixes! (#33820)
* fix converter for function definitions

* small changes

* no prints

* style
2024-09-30 16:43:55 +02:00
1d29a75a6a Add Slow CI reminder bot (#33506)
* add workflow

* update

* fix

* Update .github/workflows/slow_ci_remainder.yml

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-09-30 16:26:54 +02:00
f5247aca01 Hqq serialization (#33141)
* HQQ model serialization attempt

* fix hqq dispatch and unexpected keys

* style

* remove check_old_param

* revert to check HQQLinear in quantizer_hqq.py

* revert to check HQQLinear in quantizer_hqq.py

* update HqqConfig default params

* make ci happy

* make ci happy

* revert to HQQLinear check in quantizer_hqq.py

* check hqq_min version 0.2.0

* set axis=1 as default in quantization_config.py

* validate_env with hqq>=0.2.0 version message

* deprecated hqq kwargs message

* make ci happy

* remove run_expected_keys_check hack + bump to 0.2.1 min hqq version

* fix unexpected_keys hqq update

* add pre_quantized check

* add update_expected_keys to base quantizerr

* ci base.py fix?

* ci base.py fix?

* fix "quantization typo" src/transformers/utils/quantization_config.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix post merge

---------

Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-09-30 14:47:18 +02:00
4d5b458704 Fix typo in documentation (#33805)
fix typo
2024-09-30 12:02:23 +02:00
4bb49d4e00 Enable non-safetensor ser/deser for TorchAoConfig quantized model 🔴 (#33456)
* Enable non-safetensor serialization and deserialization for TorchAoConfig quantized model

Summary:
After https://github.com/huggingface/huggingface_hub/pull/2440 we added non-safetensor serialization and deserialization
in huggingface, with this we can now add the support in transformers

Note that we don't plan to add safetensor serialization due to different goals of wrapper tensor subclass and safetensor
see README for more details

Test Plan:
tested locally

Reviewers:

Subscribers:

Tasks:

Tags:

* formatting

* formatting

* minor fix

* formatting

* address comments

* comments

* minor fix

* update doc

* refactor compressed tensor quantizer
2024-09-30 11:30:29 +02:00
2e24ee4dfa Fix typing in load_balancing_loss_func function of modeling_mixtral.py. (#33641)
* fix return type

* update to union

* fix gate_logits typing

* fix num_experts type

* fix typing

* run fix-copies

* add doc for top_k

* run fix-copies

* empty commit to trigger CI
2024-09-27 18:10:07 +02:00
d3821c4aed Make audio classification pipeline spec-compliant and add test (#33730)
* Make audio classification pipeline spec-compliant and add test

* Check that test actually running in CI

* Try a different pipeline for the CI

* Move the test so it gets triggered

* Move it again, this time into task_tests!

* make fixup

* indentation fix

* comment

* Move everything from testing_utils to test_pipeline_mixin

* Add output testing too

* revert small diff with main

* make fixup

* Clarify comment

* Update tests/pipelines/test_pipelines_audio_classification.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Update tests/test_pipeline_mixin.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Rename function and js_args -> hub_args

* Cleanup the spec recursion

* Check keys for all outputs

---------

Co-authored-by: Lucain <lucainp@gmail.com>
2024-09-27 17:01:06 +01:00
4973fc5769 Model addition timeline (#33762)
* Model addition timeline

* Link guide

* Update docs/source/en/add_new_model.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/add_new_model.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Review comments

* Add contact email

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-09-27 17:15:13 +02:00
75cd270e5e Cleanup return_text and return_full_text options in TextGenerationPipeline (#33542)
* Cleanup return_text and return_full_text options in TextGenerationPipeline

* Cleanup return_text and return_full_text options in TextGenerationPipeline

* Cleanup return_text and return_full_text options in TextGenerationPipeline

* Cleanup return_text and return_full_text options in TextGenerationPipeline

* Revert pipeline code, but update docs instead

* Restore pipeline test
2024-09-27 15:01:31 +01:00
0d09c44bd4 remove warning v2 (#33761) 2024-09-27 14:54:28 +02:00
4196590aa0 Bump torch from 1.13.1 to 2.2.0 in /examples/flax/vision (#33748)
Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1 to 2.2.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v1.13.1...v2.2.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-27 13:24:11 +02:00
9d200cfbee Add gguf support for bloom (#33473)
* add bloom arch support for gguf

* apply format

* small refactoring, bug fix in GGUF_TENSOR_MAPPING naming

* optimize bloom GGUF_TENSOR_MAPPING

* implement reverse reshaping for bloom gguf

* add qkv weights test

* add q_8 test for bloom
2024-09-27 12:13:40 +02:00
3e039d3827 Paligemma support for multi-image (#33447)
* update

* Update src/transformers/models/paligemma/processing_paligemma.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* update docs

* better example in tests

* support image tokens

* read token

* Update tests/models/paligemma/test_processing_paligemma.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* nit: naming

* Update docs/source/en/model_doc/paligemma.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* conflicts after rebasing

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2024-09-27 11:23:14 +02:00
55b7a0404e Make siglip examples clearer and error free (#33667)
Update siglip.md

This was already partially fixed relative to the deployed docs. But the partial fix made it inconsistent. Additionally, giving the full text ("This is a photo of...") is likely not the desired output.
2024-09-27 10:33:55 +02:00
7f9a9ca1e0 [MllamaImageProcessing] Update doc (#33747)
* update docstring

* style
2024-09-27 10:27:11 +02:00
5f4420587a [clean_up_tokenization_spaces] Pl bart was failing, updating (#33735)
`clean_up_tokenization_spaces=True` for pl bart
2024-09-27 10:26:51 +02:00
294477aafb Doc and config mismatch for DeBERTa (#33713)
* Update modeling_deberta_v2.py

* Update configuration_deberta.py

* Revert "Update modeling_deberta_v2.py"

* Revert "Update configuration_deberta.py"

* fix the config doc mismatch

---------

Co-authored-by: Fedor Krasnov <fedor.krasnov@gmail.com>
2024-09-27 10:19:46 +02:00
4f29a60bee Update Albumentations Versions (#33704)
update albumentations versions
2024-09-27 10:13:30 +02:00
1ec7a70fef fix trainer tr_loss add error (#33651) 2024-09-27 10:10:03 +02:00
e1b150862e Fix modular model converter unable to generate Processor classes (#33737)
fix: fix wrong file type for processor in `modular_model_converter.py`
2024-09-27 00:00:39 +02:00
e32521bf24 fix: add docstring for image_size in Convnextv2 config (#33734)
add docstring for image_size
2024-09-26 13:56:06 -07:00
6730485b02 clean_up_tokenization_spaces=False if unset (#31938)
* clean_up_tokenization_spaces=False if unset

* deprecate warning

* updating param for old models

* update models

* make fix-copies

* fix-copies and update bert models

* warning msg

* update prophet and clvp

* updating test since space before is arbitrarily removed

* remove warning for 4.45
2024-09-26 19:38:20 +02:00
3557f9a14a Generate: can_generate() recursive check (#33718)
* add recursive check and test warnings

* missing space

* models without can_generate
2024-09-26 18:11:14 +01:00
9f97c39384 Fix position embeddings singular/plural (#33678)
* fix position embeddings

* [run-slow] blip, blip_2, instructblip, instructblipvideo

* fix init

* [run-slow] blip, blip_2, instructblip, instructblipvideo

* fix copies

* [run-slow] blip, blip_2, instructblip, instructblipvideo

* [run-slow] blip, blip_2, instructblip, instructblipvideo

* handle exception where list + tensors are cat'd

* [run-slow] blip, blip_2, instructblip, instructblipvideo

* add missing default

* [run-slow] blip, blip_2, instructblip, instructblipvideo
2024-09-26 19:07:00 +02:00
77b47e6645 Fix docs and docstrings Omdet-Turbo (#33726)
Fix weights path in docs
2024-09-26 12:18:23 -04:00
c716fc0e48 fix: use correct var names for check_tokenizers script (#33702) 2024-09-26 17:24:46 +02:00
46841d3eb2 [MllamaProcessor] Update errors and API with multiple image (#33715)
* update error

* update and add a test

* update

* update
2024-09-26 16:33:25 +02:00
0a21381ba3 Uniformize kwargs for chameleon processor (#32181)
* uniformize kwargs of Chameleon

* fix linter nit

* rm stride default

* add tests for chameleon processor

* fix tests

* add comment on get_component

* rm Chameleon's slow tokenizer

* add check order images text + nit

* update docs and tests

* Fix LlamaTokenizer tests

* fix gated repo access

* fix wrong import

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2024-09-26 10:18:07 -04:00
f2c388e3f9 Add Idefics 3! (#32473)
* Add Idefics 3!

* fixes to make both pipelines identical

* fix for quantized models

* First pass at the review

* remove vocab size from the main config (it's still in the text_config)

* hot fix for merve

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* re-add model_type for text_config

* remove support for old_cache

* remove hidden_size from main config

* rename idefics3 HF repo

* few changes suggested in the PR

* fix to input_data_format computation

* remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion

* improve example

* few improvements from amy's review

* big change to enable processing input images as numpy arrays

* Changes to the code to uniformize processor kwargs

* image processing tests

* image processing tests fixes and some bugs they discovered

* addressed review comments from Yoni

* fix modeling tests

* remove special tokens that are not special

* fixes tests

* skip failing tests - they also fail for idefics2

* added paper and readded the tests with multi gpu, who knows

* Update docs/source/en/model_doc/idefics3.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* review amy until image_processing_idefics3

* last comments from Amy

* review amy

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/idefics3/modeling_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/idefics3.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* doc improvement - amy review

* fix runtime error during fine-tuning

* amy's review

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/idefics3/modeling_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* ruff

* amy's comment on the order

* ruff ruff

* fix copies

* square images when they are not split

* ruff :(

* Update src/transformers/models/idefics3/image_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/idefics3/test_processing_idefics3.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix small bug introduced in refactor

* amy's image processing changes

* fixes peft tests and ruff

* modify to_pil_image from transformers. and review from emanuele.

* add modified to_pil_image

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-09-25 21:28:49 +02:00
f0eabf6c7d Dev release 2024-09-25 20:14:35 +02:00
a55adee890 adding positional encoder changes and tests (#32600)
* adding positional encoder changes and tests

* adding ruff suggestions

* changes added by python utils/check_copies.py --fix_and_overwrite

* removing pos_encoding added by script

* adding interpolation to clipseg

* formatting

* adding further testing to altclip and better documentation to kosmos2

* skipping test_inputs_embeds_matches_input_ids_with_generate in git model

* fixing clipseg comment suggestions

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing bridgetower test

* fixing altclip tensor output POS test

* adding ruff formatting

* fixing several tests

* formatting with ruff

* adding positional encoder changes and tests

* adding ruff suggestions

* changes added by python utils/check_copies.py --fix_and_overwrite

* removing pos_encoding added by script

* adding interpolation to clipseg

* formatting

* adding further testing to altclip and better documentation to kosmos2

* skipping test_inputs_embeds_matches_input_ids_with_generate in git model

* fixing clipseg comment suggestions

* fixing bridgetower test

* fixing altclip tensor output POS test

* adding ruff formatting

* fixing several tests

* formatting with ruff

* adding right pretrained model

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing test_inference_image_segmentation

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding ruff formatting

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding new interpolate_pos_encoding function

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing interpolate_POS function

* adapting output tensor in tests

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* modifying output tensor

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding the correct tensor

* [run_slow]  clipseg

* fixing spaces

* [run_slow]  clipseg

* [run_slow]  clipseg

---------

Co-authored-by: Manuel Sanchez Hernandez <manuel.sanchez.hernandez@schibsted.com>
2024-09-25 19:05:01 +01:00
5581 changed files with 727528 additions and 687377 deletions


@@ -7,12 +7,25 @@ parameters:
nightly:
type: boolean
default: false
GHA_Actor:
type: string
default: ""
GHA_Action:
type: string
default: ""
GHA_Event:
type: string
default: ""
GHA_Meta:
type: string
default: ""
jobs:
# Ensure running with CircleCI/huggingface
check_circleci_user:
docker:
- image: python:3.10-slim
resource_class: small
parallelism: 1
steps:
- run: echo $CIRCLE_PROJECT_USERNAME
@@ -57,15 +70,15 @@ jobs:
- run:
name: "Prepare pipeline parameters"
command: |
python utils/process_test_artifacts.py
python utils/process_test_artifacts.py
# To avoid too long generated_config.yaml on the continuation orb, we pass the links to the artifacts as parameters.
# Otherwise the list of tests was just too big. Explicit is good but for that it was a limitation.
# We used:
# https://circleci.com/docs/api/v2/index.html#operation/getJobArtifacts : to get the job artifacts
# We could not pass a nested dict, which is why we create the test_file_... parameters for every single job
- store_artifacts:
path: test_preparation/transformed_artifacts.json
- store_artifacts:
@@ -99,8 +112,6 @@ jobs:
- run:
name: "Retrieve Artifact Paths"
env:
CIRCLE_TOKEN: ${{ secrets.CI_ARTIFACT_TOKEN }}
command: |
project_slug="gh/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}"
job_number=${CIRCLE_BUILD_NUM}
@@ -109,7 +120,7 @@ jobs:
- run:
name: "Prepare pipeline parameters"
command: |
python utils/process_test_artifacts.py
python utils/process_test_artifacts.py
# To avoid too long generated_config.yaml on the continuation orb, we pass the links to the artifacts as parameters.
# Otherwise the list of tests was just too big. Explicit is good but for that it was a limitation.
@@ -145,7 +156,7 @@ jobs:
path: ~/transformers/installed.txt
- run: python -c "from transformers import *" || (echo '🚨 import failed, this means you introduced unprotected imports! 🚨'; exit 1)
- run: ruff check examples tests src utils
- run: ruff format tests src utils --check
- run: ruff format examples tests src utils --check
- run: python utils/custom_init_isort.py --check_only
- run: python utils/sort_auto_mappings.py --check_only
- run: python utils/check_doc_toc.py
@@ -170,23 +181,34 @@ jobs:
path: ~/transformers/installed.txt
- run: python utils/check_copies.py
- run: python utils/check_modular_conversion.py
- run: python utils/check_table.py
- run: python utils/check_dummies.py
- run: python utils/check_repo.py
- run: python utils/check_inits.py
- run: python utils/check_pipeline_typing.py
- run: python utils/check_config_docstrings.py
- run: python utils/check_config_attributes.py
- run: python utils/check_doctest_list.py
- run: make deps_table_check_updated
- run: python utils/update_metadata.py --check-only
- run: python utils/check_docstrings.py
- run: python utils/check_support_list.py
workflows:
version: 2
setup_and_quality:
when:
not: <<pipeline.parameters.nightly>>
and:
- equal: [<<pipeline.project.git_url>>, https://github.com/huggingface/transformers]
- not: <<pipeline.parameters.nightly>>
jobs:
- check_circleci_user
- check_code_quality
- check_repository_consistency
- fetch_tests
setup_and_quality_2:
when:
not:
equal: [<<pipeline.project.git_url>>, https://github.com/huggingface/transformers]
jobs:
- check_circleci_user
- check_code_quality


@@ -16,10 +16,9 @@
import argparse
import copy
import os
import random
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
import glob
from typing import Any, Optional
import yaml
@@ -28,36 +27,70 @@ COMMON_ENV_VARIABLES = {
"TRANSFORMERS_IS_CI": True,
"PYTEST_TIMEOUT": 120,
"RUN_PIPELINE_TESTS": False,
"RUN_PT_TF_CROSS_TESTS": False,
"RUN_PT_FLAX_CROSS_TESTS": False,
# will be adjusted in `CircleCIJob.to_dict`.
"RUN_FLAKY": True,
"DISABLE_SAFETENSORS_CONVERSION": True,
}
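# In `CircleCIJob.to_dict` (further down in this file), these defaults are copied per
# job and then specialized; a minimal sketch mirroring that code:
#
#     env = COMMON_ENV_VARIABLES.copy()
#     # Do not run tests decorated by @is_flaky on pull requests
#     env["RUN_FLAKY"] = os.environ.get("CIRCLE_PULL_REQUEST", "") == ""
#     env.update(self.additional_env)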
# Disable the use of {"s": None} as the output is way too long, making navigation on CircleCI impractical
COMMON_PYTEST_OPTIONS = {"max-worker-restart": 0, "dist": "loadfile", "vvv": None, "rsf":None}
COMMON_PYTEST_OPTIONS = {"max-worker-restart": 0, "vvv": None, "rsfE":None}
DEFAULT_DOCKER_IMAGE = [{"image": "cimg/python:3.8.12"}]
# Strings that commonly appear in the output of flaky tests when they fail. These are used with `pytest-rerunfailures`
# to rerun the tests that match these patterns.
FLAKY_TEST_FAILURE_PATTERNS = [
"OSError", # Machine/connection transient error
"Timeout", # Machine/connection transient error
"ConnectionError", # Connection transient error
"FileNotFoundError", # Raised by `datasets` on Hub failures
"PIL.UnidentifiedImageError", # Raised by `PIL.Image.open` on connection issues
"HTTPError", # Also catches HfHubHTTPError
"AssertionError: Tensor-likes are not close!", # `torch.testing.assert_close`, we might have unlucky random values
# TODO: error downloading tokenizer's `merged.txt` from hub can cause all the exceptions below. Throw and handle
# them under a single message.
"TypeError: expected str, bytes or os.PathLike object, not NoneType",
"TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType",
"Converting from Tiktoken failed",
"KeyError: <class ",
"TypeError: not a string",
]
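# A minimal sketch of how these patterns are consumed in `CircleCIJob.to_dict` below,
# where they become pytest-rerunfailures flags:
#
#     joined_flaky_patterns = "|".join(FLAKY_TEST_FAILURE_PATTERNS)
#     repeat_on_failure_flags = f"--reruns 5 --reruns-delay 2 --only-rerun '({joined_flaky_patterns})'"
#
# so a failing test is rerun (up to 5 times, 2 seconds apart) only when its output
# matches one of the strings above.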
class EmptyJob:
job_name = "empty"
def to_dict(self):
steps = [{"run": 'ls -la'}]
if self.job_name == "collection_job":
steps.extend(
[
"checkout",
{"run": "pip install requests || true"},
{"run": """while [[ $(curl --location --request GET "https://circleci.com/api/v2/workflow/$CIRCLE_WORKFLOW_ID/job" --header "Circle-Token: $CCI_TOKEN"| jq -r '.items[]|select(.name != "collection_job")|.status' | grep -c "running") -gt 0 ]]; do sleep 5; done || true"""},
{"run": 'python utils/process_circleci_workflow_test_reports.py --workflow_id $CIRCLE_WORKFLOW_ID || true'},
{"store_artifacts": {"path": "outputs"}},
{"run": 'echo "All required jobs have now completed"'},
]
)
return {
"docker": copy.deepcopy(DEFAULT_DOCKER_IMAGE),
"steps":["checkout"],
"resource_class": "small",
"steps": steps,
}
@dataclass
class CircleCIJob:
name: str
additional_env: Dict[str, Any] = None
docker_image: List[Dict[str, str]] = None
install_steps: List[str] = None
additional_env: dict[str, Any] = None
docker_image: list[dict[str, str]] = None
install_steps: list[str] = None
marker: Optional[str] = None
parallelism: Optional[int] = 0
pytest_num_workers: int = 12
pytest_options: Dict[str, Any] = None
resource_class: Optional[str] = "2xlarge"
tests_to_run: Optional[List[str]] = None
pytest_num_workers: int = 8
pytest_options: dict[str, Any] = None
resource_class: Optional[str] = "xlarge"
tests_to_run: Optional[list[str]] = None
num_test_files_per_worker: Optional[int] = 10
# This should be only used for doctest job!
command_timeout: Optional[int] = None
@@ -76,7 +109,9 @@ class CircleCIJob:
self.docker_image[0]["image"] = f"{self.docker_image[0]['image']}:dev"
print(f"Using {self.docker_image} docker image")
if self.install_steps is None:
self.install_steps = ["uv venv && uv pip install ."]
self.install_steps = ["uv pip install ."]
# Use a custom patched pytest to force exit the process at the end, to avoid `Too long with no output (exceeded 10m0s): context deadline exceeded`
self.install_steps.append("uv pip install git+https://github.com/ydshieh/pytest.git@8.4.1-ydshieh")
if self.pytest_options is None:
self.pytest_options = {}
if isinstance(self.tests_to_run, str):
@@ -95,6 +130,14 @@ class CircleCIJob:
def to_dict(self):
env = COMMON_ENV_VARIABLES.copy()
if self.job_name != "tests_hub":
# fmt: off
# not critical
env.update({"HF_TOKEN": "".join(["h", "f", "_", "H", "o", "d", "V", "u", "M", "q", "b", "R", "m", "t", "b", "z", "F", "Q", "O", "Q", "A", "J", "G", "D", "l", "V", "Q", "r", "R", "N", "w", "D", "M", "V", "C", "s", "d"])})
# fmt: on
# Do not run tests decorated by @is_flaky on pull requests
env['RUN_FLAKY'] = os.environ.get("CIRCLE_PULL_REQUEST", "") == ""
env.update(self.additional_env)
job = {
@@ -112,7 +155,9 @@ class CircleCIJob:
# Examples special case: we need to download NLTK files in advance to avoid concurrency issues
timeout_cmd = f"timeout {self.command_timeout} " if self.command_timeout else ""
marker_cmd = f"-m '{self.marker}'" if self.marker is not None else ""
additional_flags = f" -p no:warning -o junit_family=xunit1 --junitxml=test-results/junit.xml"
junit_flags = " -p no:warning -o junit_family=xunit1 --junitxml=test-results/junit.xml"
joined_flaky_patterns = "|".join(FLAKY_TEST_FAILURE_PATTERNS)
repeat_on_failure_flags = f"--reruns 5 --reruns-delay 2 --only-rerun '({joined_flaky_patterns})'"
parallel = f' << pipeline.parameters.{self.job_name}_parallelism >> '
steps = [
"checkout",
@@ -133,18 +178,38 @@ class CircleCIJob:
"command": """dpkg-query --show --showformat='${Installed-Size}\t${Package}\n' | sort -rh | head -25 | sort -h | awk '{ package=$2; sub(".*/", "", package); printf("%.5f GB %s\n", $1/1024/1024, package)}' || true"""}
},
{"run": {"name": "Create `test-results` directory", "command": "mkdir test-results"}},
{"run": {"name": "Get files to test", "command":f'curl -L -o {self.job_name}_test_list.txt <<pipeline.parameters.{self.job_name}_test_list>>' if self.name != "pr_documentation_tests" else 'echo "Skipped"'}},
{"run": {"name": "Get files to test", "command":f'curl -L -o {self.job_name}_test_list.txt <<pipeline.parameters.{self.job_name}_test_list>> --header "Circle-Token: $CIRCLE_TOKEN"' if self.name != "pr_documentation_tests" else 'echo "Skipped"'}},
{"run": {"name": "Split tests across parallel nodes: show current parallel tests",
"command": f"TESTS=$(circleci tests split --split-by=timings {self.job_name}_test_list.txt) && echo $TESTS > splitted_tests.txt && echo $TESTS | tr ' ' '\n'" if self.parallelism else f"awk '{{printf \"%s \", $0}}' {self.job_name}_test_list.txt > splitted_tests.txt"
}
},
# During the CircleCI docker image build, the data may already have been downloaded (or not).
# If it was, the files are inside the directory `/test_data/`.
{"run": {"name": "fetch hub objects before pytest", "command": "cp -r /test_data/* . 2>/dev/null || true; python3 utils/fetch_hub_objects_for_ci.py"}},
{"run": {"name": "download and unzip hub cache", "command": 'curl -L -o huggingface-cache.tar.gz https://huggingface.co/datasets/hf-internal-testing/hf_hub_cache/resolve/main/huggingface-cache.tar.gz && apt-get install pigz && tar --use-compress-program="pigz -d -p 8" -xf huggingface-cache.tar.gz && mv -n hub/* /root/.cache/huggingface/hub/ && ls -la /root/.cache/huggingface/hub/'}},
{"run": {
"name": "Run tests",
"command": f"({timeout_cmd} python3 -m pytest {marker_cmd} -n {self.pytest_num_workers} {additional_flags} {' '.join(pytest_flags)} $(cat splitted_tests.txt) | tee tests_output.txt)"}
"command": f"({timeout_cmd} python3 -m pytest {marker_cmd} -n {self.pytest_num_workers} {junit_flags} {repeat_on_failure_flags} {' '.join(pytest_flags)} $(cat splitted_tests.txt) | tee tests_output.txt)"}
},
{"run": {"name": "Expand to show skipped tests", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --skip"}},
{"run": {"name": "Failed tests: show reasons", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --fail"}},
{"run": {"name": "Errors", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --errors"}},
{"run":
{
"name": "Check for test crashes",
"when": "always",
"command": """if [ ! -f tests_output.txt ]; then
echo "ERROR: tests_output.txt does not exist - tests may not have run properly"
exit 1
elif grep -q "crashed and worker restarting disabled" tests_output.txt; then
echo "ERROR: Worker crash detected in test output"
echo "Found: crashed and worker restarting disabled"
exit 1
else
echo "Tests output file exists and no worker crashes detected"
fi"""
},
},
{"run": {"name": "Expand to show skipped tests", "when": "always", "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --skip"}},
{"run": {"name": "Failed tests: show reasons", "when": "always", "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --fail"}},
{"run": {"name": "Errors", "when": "always", "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --errors"}},
{"store_test_results": {"path": "test-results"}},
{"store_artifacts": {"path": "test-results/junit.xml"}},
{"store_artifacts": {"path": "reports"}},
@@ -163,147 +228,79 @@ class CircleCIJob:
# JOBS
torch_and_tf_job = CircleCIJob(
"torch_and_tf",
docker_image=[{"image":"huggingface/transformers-torch-tf-light"}],
additional_env={"RUN_PT_TF_CROSS_TESTS": True},
marker="is_pt_tf_cross_test",
pytest_options={"rA": None, "durations": 0},
)
torch_and_flax_job = CircleCIJob(
"torch_and_flax",
additional_env={"RUN_PT_FLAX_CROSS_TESTS": True},
docker_image=[{"image":"huggingface/transformers-torch-jax-light"}],
marker="is_pt_flax_cross_test",
pytest_options={"rA": None, "durations": 0},
)
torch_job = CircleCIJob(
"torch",
docker_image=[{"image": "huggingface/transformers-torch-light"}],
marker="not generate",
parallelism=6,
pytest_num_workers=8
)
generate_job = CircleCIJob(
"generate",
docker_image=[{"image": "huggingface/transformers-torch-light"}],
# networkx==3.3 (after #36957) causes some issues
# TODO: remove this once it works directly
install_steps=["uv pip install ."],
marker="generate",
parallelism=6,
pytest_num_workers=8
)
tokenization_job = CircleCIJob(
"tokenization",
docker_image=[{"image": "huggingface/transformers-torch-light"}],
parallelism=8,
pytest_num_workers=16
)
processor_job = CircleCIJob(
"processors",
docker_image=[{"image": "huggingface/transformers-torch-light"}],
parallelism=8,
pytest_num_workers=6
)
tf_job = CircleCIJob(
"tf",
docker_image=[{"image":"huggingface/transformers-tf-light"}],
parallelism=6,
pytest_num_workers=16,
)
flax_job = CircleCIJob(
"flax",
docker_image=[{"image":"huggingface/transformers-jax-light"}],
parallelism=6,
pytest_num_workers=16
)
pipelines_torch_job = CircleCIJob(
"pipelines_torch",
additional_env={"RUN_PIPELINE_TESTS": True},
docker_image=[{"image":"huggingface/transformers-torch-light"}],
marker="is_pipeline_test",
parallelism=4
parallelism=4,
)
pipelines_tf_job = CircleCIJob(
"pipelines_tf",
additional_env={"RUN_PIPELINE_TESTS": True},
docker_image=[{"image":"huggingface/transformers-tf-light"}],
marker="is_pipeline_test",
parallelism=4
)
custom_tokenizers_job = CircleCIJob(
"custom_tokenizers",
additional_env={"RUN_CUSTOM_TOKENIZERS": True},
docker_image=[{"image": "huggingface/transformers-custom-tokenizers"}],
)
examples_torch_job = CircleCIJob(
"examples_torch",
additional_env={"OMP_NUM_THREADS": 8},
docker_image=[{"image":"huggingface/transformers-examples-torch"}],
# TODO @ArthurZucker remove this once docker is easier to build
install_steps=["uv venv && uv pip install . && uv pip install -r examples/pytorch/_tests_requirements.txt"],
pytest_num_workers=8,
install_steps=["uv pip install . && uv pip install -r examples/pytorch/_tests_requirements.txt"],
pytest_num_workers=4,
)
examples_tensorflow_job = CircleCIJob(
"examples_tensorflow",
additional_env={"OMP_NUM_THREADS": 8},
docker_image=[{"image":"huggingface/transformers-examples-tf"}],
pytest_num_workers=16,
)
hub_job = CircleCIJob(
"hub",
additional_env={"HUGGINGFACE_CO_STAGING": True},
docker_image=[{"image":"huggingface/transformers-torch-light"}],
install_steps=[
'uv venv && uv pip install .',
'uv pip install .',
'git config --global user.email "ci@dummy.com"',
'git config --global user.name "ci"',
],
marker="is_staging_test",
pytest_num_workers=2,
resource_class="medium",
)
onnx_job = CircleCIJob(
"onnx",
docker_image=[{"image":"huggingface/transformers-torch-tf-light"}],
install_steps=[
"uv venv",
"uv pip install .[torch,tf,testing,sentencepiece,onnxruntime,vision,rjieba]",
],
pytest_options={"k onnx": None},
pytest_num_workers=1,
)
exotic_models_job = CircleCIJob(
"exotic_models",
docker_image=[{"image":"huggingface/transformers-exotic-models"}],
pytest_num_workers=12,
parallelism=4,
pytest_options={"durations": 100},
)
repo_utils_job = CircleCIJob(
"repo_utils",
docker_image=[{"image":"huggingface/transformers-consistency"}],
@@ -311,13 +308,14 @@ repo_utils_job = CircleCIJob(
resource_class="large",
)
non_model_job = CircleCIJob(
"non_model",
docker_image=[{"image": "huggingface/transformers-torch-light"}],
# networkx==3.3 (after #36957) causes some issues
# TODO: remove this once it works directly
install_steps=["uv pip install .[serving]"],
marker="not generate",
parallelism=6,
pytest_num_workers=8,
)
@@ -333,7 +331,7 @@ doc_test_job = CircleCIJob(
additional_env={"TRANSFORMERS_VERBOSITY": "error", "DATASETS_VERBOSITY": "error", "SKIP_CUDA_DOCTEST": "1"},
install_steps=[
# Add an empty file to keep the test step running correctly even if no file is selected to be tested.
"uv venv && pip install .",
"uv pip install .",
"touch dummy.py",
command,
"cat pr_documentation_tests_temp.txt",
@@ -345,13 +343,14 @@ doc_test_job = CircleCIJob(
pytest_num_workers=1,
)
REGULAR_TESTS = [torch_and_tf_job, torch_and_flax_job, torch_job, tf_job, flax_job, hub_job, onnx_job, tokenization_job, processor_job, generate_job, non_model_job] # fmt: skip
EXAMPLES_TESTS = [examples_torch_job, examples_tensorflow_job]
PIPELINE_TESTS = [pipelines_torch_job, pipelines_tf_job]
REGULAR_TESTS = [torch_job, hub_job, tokenization_job, processor_job, generate_job, non_model_job] # fmt: skip
EXAMPLES_TESTS = [examples_torch_job]
PIPELINE_TESTS = [pipelines_torch_job]
REPO_UTIL_TESTS = [repo_utils_job]
DOC_TESTS = [doc_test_job]
ALL_TESTS = REGULAR_TESTS + EXAMPLES_TESTS + PIPELINE_TESTS + REPO_UTIL_TESTS + DOC_TESTS + [custom_tokenizers_job] + [exotic_models_job] # fmt: skip
def create_circleci_config(folder=None):
if folder is None:
folder = os.getcwd()
@@ -361,19 +360,35 @@ def create_circleci_config(folder=None):
if len(jobs) == 0:
jobs = [EmptyJob()]
print("Full list of job name inputs", {j.job_name + "_test_list":{"type":"string", "default":''} for j in jobs})
else:
print("Full list of job name inputs", {j.job_name + "_test_list":{"type":"string", "default":''} for j in jobs})
# Add a job that waits for all the test jobs and aggregates their test summary files at the end
collection_job = EmptyJob()
collection_job.job_name = "collection_job"
jobs = [collection_job] + jobs
config = {
"version": "2.1",
"parameters": {
# Only used to accept the parameters from the trigger
"nightly": {"type": "boolean", "default": False},
"tests_to_run": {"type": "string", "default": ''},
# Only used to accept the parameters from GitHub Actions trigger
"GHA_Actor": {"type": "string", "default": ""},
"GHA_Action": {"type": "string", "default": ""},
"GHA_Event": {"type": "string", "default": ""},
"GHA_Meta": {"type": "string", "default": ""},
"tests_to_run": {"type": "string", "default": ""},
**{j.job_name + "_test_list":{"type":"string", "default":''} for j in jobs},
**{j.job_name + "_parallelism":{"type":"integer", "default":1} for j in jobs},
},
"jobs" : {j.job_name: j.to_dict() for j in jobs},
"workflows": {"version": 2, "run_tests": {"jobs": [j.job_name for j in jobs]}}
"jobs": {j.job_name: j.to_dict() for j in jobs}
}
if "CIRCLE_TOKEN" in os.environ:
# For private forked repo. (e.g. new model addition)
config["workflows"] = {"version": 2, "run_tests": {"jobs": [{j.job_name: {"context": ["TRANSFORMERS_CONTEXT"]}} for j in jobs]}}
else:
# For public repo. (e.g. `transformers`)
config["workflows"] = {"version": 2, "run_tests": {"jobs": [j.job_name for j in jobs]}}
with open(os.path.join(folder, "generated_config.yml"), "w") as f:
f.write(yaml.dump(config, sort_keys=False, default_flow_style=False).replace("' << pipeline", " << pipeline").replace(">> '", " >>"))
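
As a minimal sketch (with illustrative job names), the dynamic pipeline parameters generated above follow this pattern:

    job_names = ["torch", "generate"]  # illustrative; the real list is built from the jobs above
    parameters = {
        "nightly": {"type": "boolean", "default": False},
        **{f"{name}_test_list": {"type": "string", "default": ""} for name in job_names},
        **{f"{name}_parallelism": {"type": "integer", "default": 1} for name in job_names},
    }
    # Each generated job then fetches its own file list via `<< pipeline.parameters.<name>_test_list >>`.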


@@ -1,5 +1,6 @@
import re
import argparse
import re
def parse_pytest_output(file_path):
skipped_tests = {}


@@ -16,7 +16,7 @@ body:
id: system-info
attributes:
label: System Info
description: Please share your system info with us. You can run the command `transformers-cli env` and copy-paste its output below.
description: Please share your system info with us. You can run the command `transformers env` and copy-paste its output below.
placeholder: transformers version, platform, python version, ...
validations:
required: true
@@ -36,26 +36,37 @@ body:
Models:
- text models: @ArthurZucker
- vision models: @amyeroberts, @qubvel
- speech models: @ylacombe, @eustlb
- text models: @ArthurZucker @Cyrilvallez
- vision models: @yonigozlan @molbap
- audio models: @eustlb @ebezzam @vasqu
- multimodal models: @zucchini-nlp
- graph models: @clefourrier
Library:
- flax: @sanchit-gandhi
- generate: @zucchini-nlp (visual-language models) or @gante (all others)
- continuous batching: @remi-or @ArthurZucker @McPatate
- pipelines: @Rocketknight1
- tensorflow: @gante and @Rocketknight1
- tokenizers: @ArthurZucker and @itazap
- trainer: @muellerzr @SunMarc
- trainer: @SunMarc
- attention: @vasqu @ArthurZucker @CyrilVallez
- model loading (from pretrained, etc): @CyrilVallez
- distributed: @3outeille @ArthurZucker
- CIs: @ydshieh
Integrations:
- deepspeed: HF Trainer/Accelerate: @muellerzr
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization (bitsandbytes, autogpt): @SunMarc
- quantization: @SunMarc @MekkCyber
- kernels: @MekkCyber @drbh
- peft: @BenjaminBossan @githubnemo
Devices/Backends:
- AMD ROCm: @ivarflakstad
- Intel XPU: @IlyasMoutawwakil
- Ascend NPU: @ivarflakstad
Documentation: @stevhliu
@@ -63,19 +74,6 @@ body:
- for issues with a model, report at https://discuss.huggingface.co/ and tag the model's creator.
HF projects:
- accelerate: [different repo](https://github.com/huggingface/accelerate)
- datasets: [different repo](https://github.com/huggingface/datasets)
- diffusers: [different repo](https://github.com/huggingface/diffusers)
- rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Maintained examples (not research project or legacy):
- Flax: @sanchit-gandhi
- PyTorch: See Models above and tag the person corresponding to the modality of the example.
- TensorFlow: @Rocketknight1
Research projects are not maintained and should be taken as is.
placeholder: "@Username ..."
@@ -106,6 +104,7 @@ body:
label: Reproduction
description: |
Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
Please include relevant config information with your code, for example your Trainers, TRL, Peft, and DeepSpeed configs.
If you have code snippets, error messages, stack traces please provide them here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.


@@ -23,7 +23,7 @@ Some notes:
* Please translate in a gender-neutral way.
* Add your translations to the folder called `<languageCode>` inside the [source folder](https://github.com/huggingface/transformers/tree/main/docs/source).
* Register your translation in `<languageCode>/_toctree.yml`; please follow the order of the [English version](https://github.com/huggingface/transformers/blob/main/docs/source/en/_toctree.yml).
* Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue. Please ping @stevhliu and @MKhalusova for review.
* Once you're finished, open a pull request and tag this issue by including #issue-number in the description, where issue-number is the number of this issue. Please ping @stevhliu for review.
* 🙋 If you'd like others to help you with the translation, you can also post in the 🤗 [forums](https://discuss.huggingface.co/).
## Get Started section


@@ -6,7 +6,7 @@ body:
id: system-info
attributes:
label: System Info
description: Please share your system info with us. You can run the command `transformers-cli env` and copy-paste its output below.
description: Please share your system info with us. You can run the command `transformers env` and copy-paste its output below.
render: shell
placeholder: transformers version, platform, python version, ...
validations:


@@ -39,41 +39,40 @@ members/contributors who may be interested in your PR.
Models:
- text models: @ArthurZucker
- vision models: @amyeroberts, @qubvel
- speech models: @ylacombe, @eustlb
- text models: @ArthurZucker @Cyrilvallez
- vision models: @yonigozlan @molbap
- audio models: @eustlb @ebezzam @vasqu
- multimodal models: @zucchini-nlp
- graph models: @clefourrier
Library:
- flax: @sanchit-gandhi
- generate: @zucchini-nlp (visual-language models) or @gante (all others)
- continuous batching: @remi-or @ArthurZucker @McPatate
- pipelines: @Rocketknight1
- tensorflow: @gante and @Rocketknight1
- tokenizers: @ArthurZucker
- trainer: @muellerzr and @SunMarc
- chat templates: @Rocketknight1
- tokenizers: @ArthurZucker and @itazap
- trainer: @SunMarc
- attention: @vasqu @ArthurZucker @CyrilVallez
- model loading (from pretrained, etc): @CyrilVallez
- distributed: @3outeille @ArthurZucker
- CIs: @ydshieh
Integrations:
- deepspeed: HF Trainer/Accelerate: @muellerzr
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization (bitsandbytes, autogpt): @SunMarc
- quantization: @SunMarc @MekkCyber
- kernels: @MekkCyber @drbh
- peft: @BenjaminBossan @githubnemo
Devices/Backends:
- AMD ROCm: @ivarflakstad
- Intel XPU: @IlyasMoutawwakil
- Ascend NPU: @ivarflakstad
Documentation: @stevhliu
HF projects:
- accelerate: [different repo](https://github.com/huggingface/accelerate)
- datasets: [different repo](https://github.com/huggingface/datasets)
- diffusers: [different repo](https://github.com/huggingface/diffusers)
- rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Maintained examples (not research project or legacy):
- Flax: @sanchit-gandhi
- PyTorch: See Models above and tag the person corresponding to the modality of the example.
- TensorFlow: @Rocketknight1
Research projects are not maintained and should be taken as is.
-->

.github/copilot-instructions.md (new file)

@@ -0,0 +1,39 @@
# copilot-instructions.md Guide for Hugging Face Transformers
This copilot-instructions.md file provides guidance for code agents working with this codebase.
## Core Project Structure
- `/src/transformers`: This contains the core source code for the library
- `/models`: Code for individual models. Models inherit from base classes in the root `/src/transformers` directory.
- `/tests`: This contains the core test classes for the library. These are usually inherited rather than directly run.
- `/models`: Tests for individual models. Model tests inherit from common tests in the root `/tests` directory.
- `/docs`: This contains the documentation for the library, including guides, tutorials, and API references.
## Coding Conventions for Hugging Face Transformers
- PRs should be as brief as possible. Bugfix PRs in particular can often be only one or two lines long, and do not need large comments, docstrings or new functions in this case. Aim to minimize the size of the diff.
- When writing tests, they should be added to an existing file. The only exception is for PRs to add a new model, when a new test directory should be created for that model.
- Code style is enforced in the CI. You can install the style tools with `pip install -e .[quality]`. You can then run `make fixup` to apply style and consistency fixes to your code.
## Copying and inheritance
Many models in the codebase have similar code, but it is not shared by inheritance because we want each model file to be self-contained.
We use two mechanisms to keep this code in sync:
- "Copied from" syntax. Functions or entire classes can have a comment at the top like this: `# Copied from transformers.models.llama.modeling_llama.rotate_half` or `# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->MT5`
These comments are actively checked by the style tools, and copies will automatically be updated when the base code is updated. If you need to update a copied function, you should
either update the base function and use `make fixup` to propagate the change to all copies, or simply remove the `# Copied from` comment if that is inappropriate.
- "Modular" files. These files briefly define models by composing them using inheritance from other models. They are not meant to be used directly. Instead, the style tools
automatically generate a complete modeling file, like `modeling_bert.py`, from the modular file like `modular_bert.py`. If a model has a modular file, the modeling file
should never be edited directly! Instead, changes should be made in the modular file, and then you should run `make fixup` to update the modeling file automatically.
When adding new models, you should prefer `modular` style and inherit as many classes as possible from existing models.
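
As a minimal sketch of the `# Copied from` mechanism described above (`rotate_half` is a real Llama helper; the target model file is illustrative):

    import torch

    # in an illustrative model file that copies the Llama helper
    # Copied from transformers.models.llama.modeling_llama.rotate_half
    def rotate_half(x):
        """Rotates half the hidden dims of the input."""
        x1 = x[..., : x.shape[-1] // 2]
        x2 = x[..., x.shape[-1] // 2 :]
        return torch.cat((-x1, x2), dim=-1)

    # `make fixup` checks this body against the Llama original and rewrites it if they drift.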
## Testing
After making changes, you should usually run `make fixup` to ensure any copies and modular files are updated, and then test all affected models. This includes both
the model you made the changes in and any other models that were updated by `make fixup`. Tests can be run with `pytest tests/models/[name]/test_modeling_[name].py`
If your changes affect code in other classes like tokenizers or processors, you should run those tests instead, like `test_processing_[name].py` or `test_tokenization_[name].py`.
In order to run tests, you may need to install dependencies. You can do this with `pip install -e .[testing]`. You will probably also need to `pip install torch accelerate` if your environment does not already have them.

.github/scripts/assign_reviewers.py (new file)

@@ -0,0 +1,122 @@
# coding=utf-8
# Copyright 2025 the HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
import os
import re
from collections import Counter
from pathlib import Path
import github
from github import Github
def pattern_to_regex(pattern):
if pattern.startswith("/"):
start_anchor = True
pattern = re.escape(pattern[1:])
else:
start_anchor = False
pattern = re.escape(pattern)
# Replace `*` with "any number of non-slash characters"
pattern = pattern.replace(r"\*", "[^/]*")
if start_anchor:
pattern = r"^\/?" + pattern # Allow an optional leading slash after the start of the string
return pattern
def get_file_owners(file_path, codeowners_lines):
# Process lines in reverse (last matching pattern takes precedence)
for line in reversed(codeowners_lines):
# Skip comments and empty lines, strip inline comments
line = line.split('#')[0].strip()
if not line:
continue
# Split into pattern and owners
parts = line.split()
pattern = parts[0]
# Can be empty, e.g. for dummy files with explicitly no owner!
owners = [owner.removeprefix("@") for owner in parts[1:]]
# Check if file matches pattern
file_regex = pattern_to_regex(pattern)
if re.search(file_regex, file_path) is not None:
return owners # Remember, can still be empty!
return [] # Should never happen, but just in case
def pr_author_is_in_hf(pr_author, codeowners_lines):
# Check if the PR author is in the codeowners file
for line in codeowners_lines:
line = line.split('#')[0].strip()
if not line:
continue
# Split into pattern and owners
parts = line.split()
owners = [owner.removeprefix("@") for owner in parts[1:]]
if pr_author in owners:
return True
return False
def main():
script_dir = Path(__file__).parent.absolute()
with open(script_dir / "codeowners_for_review_action") as f:
codeowners_lines = f.readlines()
g = Github(os.environ['GITHUB_TOKEN'])
repo = g.get_repo("huggingface/transformers")
with open(os.environ['GITHUB_EVENT_PATH']) as f:
event = json.load(f)
# The PR number is available in the event payload
pr_number = event['pull_request']['number']
pr = repo.get_pull(pr_number)
pr_author = pr.user.login
if pr_author_is_in_hf(pr_author, codeowners_lines):
print(f"PR author {pr_author} is in codeowners, skipping review request.")
return
existing_reviews = list(pr.get_reviews())
if existing_reviews:
print(f"Already has reviews: {[r.user.login for r in existing_reviews]}")
return
users_requested, teams_requested = pr.get_review_requests()
users_requested = list(users_requested)
if users_requested:
print(f"Reviewers already requested: {users_requested}")
return
locs_per_owner = Counter()
for file in pr.get_files():
owners = get_file_owners(file.filename, codeowners_lines)
for owner in owners:
locs_per_owner[owner] += file.changes
# Assign the top 2 based on locs changed as reviewers, but skip the owner if present
locs_per_owner.pop(pr_author, None)
top_owners = locs_per_owner.most_common(2)
print("Top owners", top_owners)
top_owners = [owner[0] for owner in top_owners]
try:
pr.create_review_request(top_owners)
except github.GithubException as e:
print(f"Failed to request review for {top_owners}: {e}")
if __name__ == "__main__":
main()
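
A minimal usage sketch of the matching helpers above (the patterns and file path are illustrative, loosely modeled on the codeowners file below):

    example_codeowners = [
        "* @Rocketknight1 @ArthurZucker\n",
        "trainer.py @zach-huggingface @SunMarc\n",
        "/src/transformers/models/*/processing* @molbap @yonigozlan\n",
    ]
    # Last matching pattern wins: the specific `trainer.py` rule overrides the `*` fallback.
    print(get_file_owners("src/transformers/trainer.py", example_codeowners))
    # -> ['zach-huggingface', 'SunMarc']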


@@ -0,0 +1,370 @@
# Top-level rules are matched only if nothing else matches
* @Rocketknight1 @ArthurZucker # if no one is pinged based on the other rules, he will do the dispatch
*.md @stevhliu
*tokenization* @ArthurZucker
docs/ @stevhliu
/benchmark/ @McPatate
/docker/ @ydshieh @ArthurZucker
# More high-level globs catch cases when specific rules later don't apply
/src/transformers/models/*/processing* @molbap @yonigozlan
/src/transformers/models/*/image_processing* @yonigozlan
/src/transformers/models/*/image_processing_*_fast* @yonigozlan
# Owners of subsections of the library
/src/transformers/generation/ @gante
/src/transformers/pipeline/ @Rocketknight1 @yonigozlan
/src/transformers/integrations/ @SunMarc @MekkCyber @zach-huggingface
/src/transformers/quantizers/ @SunMarc @MekkCyber
tests/ @ydshieh
tests/generation/ @gante
/src/transformers/models/auto/ @ArthurZucker
/src/transformers/utils/ @ArthurZucker @Rocketknight1
/src/transformers/loss/ @ArthurZucker
/src/transformers/onnx/ @michaelbenayoun
# Specific files come after the sections/globs, so they take priority
/.circleci/config.yml @ArthurZucker @ydshieh
/utils/tests_fetcher.py @ydshieh
trainer.py @zach-huggingface @SunMarc
trainer_utils.py @zach-huggingface @SunMarc
/utils/modular_model_converter.py @Cyrilvallez @ArthurZucker
# Owners of individual models are specific / high priority, and so they come last
# mod* captures modeling and modular files
# Text models
/src/transformers/models/albert/mod*_albert* @ArthurZucker
/src/transformers/models/bamba/mod*_bamba* @ArthurZucker
/src/transformers/models/bart/mod*_bart* @ArthurZucker
/src/transformers/models/barthez/mod*_barthez* @ArthurZucker
/src/transformers/models/bartpho/mod*_bartpho* @ArthurZucker
/src/transformers/models/bert/mod*_bert* @ArthurZucker
/src/transformers/models/bert_generation/mod*_bert_generation* @ArthurZucker
/src/transformers/models/bert_japanese/mod*_bert_japanese* @ArthurZucker
/src/transformers/models/bertweet/mod*_bertweet* @ArthurZucker
/src/transformers/models/big_bird/mod*_big_bird* @ArthurZucker
/src/transformers/models/bigbird_pegasus/mod*_bigbird_pegasus* @ArthurZucker
/src/transformers/models/biogpt/mod*_biogpt* @ArthurZucker
/src/transformers/models/blenderbot/mod*_blenderbot* @ArthurZucker
/src/transformers/models/blenderbot_small/mod*_blenderbot_small* @ArthurZucker
/src/transformers/models/bloom/mod*_bloom* @ArthurZucker
/src/transformers/models/bort/mod*_bort* @ArthurZucker
/src/transformers/models/byt5/mod*_byt5* @ArthurZucker
/src/transformers/models/camembert/mod*_camembert* @ArthurZucker
/src/transformers/models/canine/mod*_canine* @ArthurZucker
/src/transformers/models/codegen/mod*_codegen* @ArthurZucker
/src/transformers/models/code_llama/mod*_code_llama* @ArthurZucker
/src/transformers/models/cohere/mod*_cohere* @ArthurZucker
/src/transformers/models/cohere2/mod*_cohere2* @ArthurZucker
/src/transformers/models/convbert/mod*_convbert* @ArthurZucker
/src/transformers/models/cpm/mod*_cpm* @ArthurZucker
/src/transformers/models/cpmant/mod*_cpmant* @ArthurZucker
/src/transformers/models/ctrl/mod*_ctrl* @ArthurZucker
/src/transformers/models/dbrx/mod*_dbrx* @ArthurZucker
/src/transformers/models/deberta/mod*_deberta* @ArthurZucker
/src/transformers/models/deberta_v2/mod*_deberta_v2* @ArthurZucker
/src/transformers/models/dialogpt/mod*_dialogpt* @ArthurZucker
/src/transformers/models/diffllama/mod*_diffllama* @ArthurZucker
/src/transformers/models/distilbert/mod*_distilbert* @ArthurZucker
/src/transformers/models/dpr/mod*_dpr* @ArthurZucker
/src/transformers/models/electra/mod*_electra* @ArthurZucker
/src/transformers/models/encoder_decoder/mod*_encoder_decoder* @ArthurZucker
/src/transformers/models/ernie/mod*_ernie* @ArthurZucker
/src/transformers/models/ernie_m/mod*_ernie_m* @ArthurZucker
/src/transformers/models/esm/mod*_esm* @ArthurZucker
/src/transformers/models/falcon/mod*_falcon* @ArthurZucker
/src/transformers/models/falcon3/mod*_falcon3* @ArthurZucker
/src/transformers/models/falcon_mamba/mod*_falcon_mamba* @ArthurZucker
/src/transformers/models/fastspeech2_conformer/mod*_fastspeech2_conformer* @ArthurZucker
/src/transformers/models/flan_t5/mod*_flan_t5* @ArthurZucker
/src/transformers/models/flan_ul2/mod*_flan_ul2* @ArthurZucker
/src/transformers/models/flaubert/mod*_flaubert* @ArthurZucker
/src/transformers/models/fnet/mod*_fnet* @ArthurZucker
/src/transformers/models/fsmt/mod*_fsmt* @ArthurZucker
/src/transformers/models/funnel/mod*_funnel* @ArthurZucker
/src/transformers/models/fuyu/mod*_fuyu* @ArthurZucker
/src/transformers/models/gemma/mod*_gemma* @ArthurZucker
/src/transformers/models/gemma2/mod*_gemma2* @ArthurZucker
/src/transformers/models/glm/mod*_glm* @ArthurZucker
/src/transformers/models/openai_gpt/mod*_openai_gpt* @ArthurZucker
/src/transformers/models/gpt_neo/mod*_gpt_neo* @ArthurZucker
/src/transformers/models/gpt_neox/mod*_gpt_neox* @ArthurZucker
/src/transformers/models/gpt_neox_japanese/mod*_gpt_neox_japanese* @ArthurZucker
/src/transformers/models/gptj/mod*_gptj* @ArthurZucker
/src/transformers/models/gpt2/mod*_gpt2* @ArthurZucker
/src/transformers/models/gpt_bigcode/mod*_gpt_bigcode* @ArthurZucker
/src/transformers/models/gptsan_japanese/mod*_gptsan_japanese* @ArthurZucker
/src/transformers/models/gpt_sw3/mod*_gpt_sw3* @ArthurZucker
/src/transformers/models/granite/mod*_granite* @ArthurZucker
/src/transformers/models/granitemoe/mod*_granitemoe* @ArthurZucker
/src/transformers/models/herbert/mod*_herbert* @ArthurZucker
/src/transformers/models/ibert/mod*_ibert* @ArthurZucker
/src/transformers/models/jamba/mod*_jamba* @ArthurZucker
/src/transformers/models/jetmoe/mod*_jetmoe* @ArthurZucker
/src/transformers/models/jukebox/mod*_jukebox* @ArthurZucker
/src/transformers/models/led/mod*_led* @ArthurZucker
/src/transformers/models/llama/mod*_llama* @ArthurZucker @Cyrilvallez
/src/transformers/models/longformer/mod*_longformer* @ArthurZucker
/src/transformers/models/longt5/mod*_longt5* @ArthurZucker
/src/transformers/models/luke/mod*_luke* @ArthurZucker
/src/transformers/models/m2m_100/mod*_m2m_100* @ArthurZucker
/src/transformers/models/madlad_400/mod*_madlad_400* @ArthurZucker
/src/transformers/models/mamba/mod*_mamba* @ArthurZucker
/src/transformers/models/mamba2/mod*_mamba2* @ArthurZucker
/src/transformers/models/marian/mod*_marian* @ArthurZucker
/src/transformers/models/markuplm/mod*_markuplm* @ArthurZucker
/src/transformers/models/mbart/mod*_mbart* @ArthurZucker
/src/transformers/models/mega/mod*_mega* @ArthurZucker
/src/transformers/models/megatron_bert/mod*_megatron_bert* @ArthurZucker
/src/transformers/models/megatron_gpt2/mod*_megatron_gpt2* @ArthurZucker
/src/transformers/models/mistral/mod*_mistral* @ArthurZucker
/src/transformers/models/mixtral/mod*_mixtral* @ArthurZucker
/src/transformers/models/mluke/mod*_mluke* @ArthurZucker
/src/transformers/models/mobilebert/mod*_mobilebert* @ArthurZucker
/src/transformers/models/modernbert/mod*_modernbert* @ArthurZucker
/src/transformers/models/mpnet/mod*_mpnet* @ArthurZucker
/src/transformers/models/mpt/mod*_mpt* @ArthurZucker
/src/transformers/models/mra/mod*_mra* @ArthurZucker
/src/transformers/models/mt5/mod*_mt5* @ArthurZucker
/src/transformers/models/mvp/mod*_mvp* @ArthurZucker
/src/transformers/models/myt5/mod*_myt5* @ArthurZucker
/src/transformers/models/nemotron/mod*_nemotron* @ArthurZucker
/src/transformers/models/nezha/mod*_nezha* @ArthurZucker
/src/transformers/models/nllb/mod*_nllb* @ArthurZucker
/src/transformers/models/nllb_moe/mod*_nllb_moe* @ArthurZucker
/src/transformers/models/nystromformer/mod*_nystromformer* @ArthurZucker
/src/transformers/models/olmo/mod*_olmo* @ArthurZucker
/src/transformers/models/olmo2/mod*_olmo2* @ArthurZucker
/src/transformers/models/olmoe/mod*_olmoe* @ArthurZucker
/src/transformers/models/open_llama/mod*_open_llama* @ArthurZucker
/src/transformers/models/opt/mod*_opt* @ArthurZucker
/src/transformers/models/pegasus/mod*_pegasus* @ArthurZucker
/src/transformers/models/pegasus_x/mod*_pegasus_x* @ArthurZucker
/src/transformers/models/persimmon/mod*_persimmon* @ArthurZucker
/src/transformers/models/phi/mod*_phi* @ArthurZucker
/src/transformers/models/phi3/mod*_phi3* @ArthurZucker
/src/transformers/models/phimoe/mod*_phimoe* @ArthurZucker
/src/transformers/models/phobert/mod*_phobert* @ArthurZucker
/src/transformers/models/plbart/mod*_plbart* @ArthurZucker
/src/transformers/models/prophetnet/mod*_prophetnet* @ArthurZucker
/src/transformers/models/qdqbert/mod*_qdqbert* @ArthurZucker
/src/transformers/models/qwen2/mod*_qwen2* @ArthurZucker
/src/transformers/models/qwen2_moe/mod*_qwen2_moe* @ArthurZucker
/src/transformers/models/rag/mod*_rag* @ArthurZucker
/src/transformers/models/realm/mod*_realm* @ArthurZucker
/src/transformers/models/recurrent_gemma/mod*_recurrent_gemma* @ArthurZucker
/src/transformers/models/reformer/mod*_reformer* @ArthurZucker
/src/transformers/models/rembert/mod*_rembert* @ArthurZucker
/src/transformers/models/retribert/mod*_retribert* @ArthurZucker
/src/transformers/models/roberta/mod*_roberta* @ArthurZucker
/src/transformers/models/roberta_prelayernorm/mod*_roberta_prelayernorm* @ArthurZucker
/src/transformers/models/roc_bert/mod*_roc_bert* @ArthurZucker
/src/transformers/models/roformer/mod*_roformer* @ArthurZucker
/src/transformers/models/rwkv/mod*_rwkv* @ArthurZucker
/src/transformers/models/splinter/mod*_splinter* @ArthurZucker
/src/transformers/models/squeezebert/mod*_squeezebert* @ArthurZucker
/src/transformers/models/stablelm/mod*_stablelm* @ArthurZucker
/src/transformers/models/starcoder2/mod*_starcoder2* @ArthurZucker
/src/transformers/models/switch_transformers/mod*_switch_transformers* @ArthurZucker
/src/transformers/models/t5/mod*_t5* @ArthurZucker
/src/transformers/models/t5v1.1/mod*_t5v1.1* @ArthurZucker
/src/transformers/models/tapex/mod*_tapex* @ArthurZucker
/src/transformers/models/transfo_xl/mod*_transfo_xl* @ArthurZucker
/src/transformers/models/ul2/mod*_ul2* @ArthurZucker
/src/transformers/models/umt5/mod*_umt5* @ArthurZucker
/src/transformers/models/xmod/mod*_xmod* @ArthurZucker
/src/transformers/models/xglm/mod*_xglm* @ArthurZucker
/src/transformers/models/xlm/mod*_xlm* @ArthurZucker
/src/transformers/models/xlm_prophetnet/mod*_xlm_prophetnet* @ArthurZucker
/src/transformers/models/xlm_roberta/mod*_xlm_roberta* @ArthurZucker
/src/transformers/models/xlm_roberta_xl/mod*_xlm_roberta_xl* @ArthurZucker
/src/transformers/models/xlm_v/mod*_xlm_v* @ArthurZucker
/src/transformers/models/xlnet/mod*_xlnet* @ArthurZucker
/src/transformers/models/yoso/mod*_yoso* @ArthurZucker
/src/transformers/models/zamba/mod*_zamba* @ArthurZucker
# Vision models
/src/transformers/models/beit/mod*_beit* @yonigozlan @molbap
/src/transformers/models/bit/mod*_bit* @yonigozlan @molbap
/src/transformers/models/conditional_detr/mod*_conditional_detr* @yonigozlan @molbap
/src/transformers/models/convnext/mod*_convnext* @yonigozlan @molbap
/src/transformers/models/convnextv2/mod*_convnextv2* @yonigozlan @molbap
/src/transformers/models/cvt/mod*_cvt* @yonigozlan @molbap
/src/transformers/models/deformable_detr/mod*_deformable_detr* @yonigozlan @molbap
/src/transformers/models/deit/mod*_deit* @yonigozlan @molbap
/src/transformers/models/depth_anything/mod*_depth_anything* @yonigozlan @molbap
/src/transformers/models/depth_anything_v2/mod*_depth_anything_v2* @yonigozlan @molbap
/src/transformers/models/deta/mod*_deta* @yonigozlan @molbap
/src/transformers/models/detr/mod*_detr* @yonigozlan @molbap
/src/transformers/models/dinat/mod*_dinat* @yonigozlan @molbap
/src/transformers/models/dinov2/mod*_dinov2* @yonigozlan @molbap
/src/transformers/models/dinov2_with_registers/mod*_dinov2_with_registers* @yonigozlan @molbap
/src/transformers/models/dit/mod*_dit* @yonigozlan @molbap
/src/transformers/models/dpt/mod*_dpt* @yonigozlan @molbap
/src/transformers/models/efficientformer/mod*_efficientformer* @yonigozlan @molbap
/src/transformers/models/efficientnet/mod*_efficientnet* @yonigozlan @molbap
/src/transformers/models/focalnet/mod*_focalnet* @yonigozlan @molbap
/src/transformers/models/glpn/mod*_glpn* @yonigozlan @molbap
/src/transformers/models/hiera/mod*_hiera* @yonigozlan @molbap
/src/transformers/models/ijepa/mod*_ijepa* @yonigozlan @molbap
/src/transformers/models/imagegpt/mod*_imagegpt* @yonigozlan @molbap
/src/transformers/models/levit/mod*_levit* @yonigozlan @molbap
/src/transformers/models/mask2former/mod*_mask2former* @yonigozlan @molbap
/src/transformers/models/maskformer/mod*_maskformer* @yonigozlan @molbap
/src/transformers/models/mobilenet_v1/mod*_mobilenet_v1* @yonigozlan @molbap
/src/transformers/models/mobilenet_v2/mod*_mobilenet_v2* @yonigozlan @molbap
/src/transformers/models/mobilevit/mod*_mobilevit* @yonigozlan @molbap
/src/transformers/models/mobilevitv2/mod*_mobilevitv2* @yonigozlan @molbap
/src/transformers/models/nat/mod*_nat* @yonigozlan @molbap
/src/transformers/models/poolformer/mod*_poolformer* @yonigozlan @molbap
/src/transformers/models/pvt/mod*_pvt* @yonigozlan @molbap
/src/transformers/models/pvt_v2/mod*_pvt_v2* @yonigozlan @molbap
/src/transformers/models/regnet/mod*_regnet* @yonigozlan @molbap
/src/transformers/models/resnet/mod*_resnet* @yonigozlan @molbap
/src/transformers/models/rt_detr/mod*_rt_detr* @yonigozlan @molbap
/src/transformers/models/segformer/mod*_segformer* @yonigozlan @molbap
/src/transformers/models/seggpt/mod*_seggpt* @yonigozlan @molbap
/src/transformers/models/superpoint/mod*_superpoint* @yonigozlan @molbap
/src/transformers/models/swiftformer/mod*_swiftformer* @yonigozlan @molbap
/src/transformers/models/swin/mod*_swin* @yonigozlan @molbap
/src/transformers/models/swinv2/mod*_swinv2* @yonigozlan @molbap
/src/transformers/models/swin2sr/mod*_swin2sr* @yonigozlan @molbap
/src/transformers/models/table_transformer/mod*_table_transformer* @yonigozlan @molbap
/src/transformers/models/textnet/mod*_textnet* @yonigozlan @molbap
/src/transformers/models/timm_wrapper/mod*_timm_wrapper* @yonigozlan @molbap
/src/transformers/models/upernet/mod*_upernet* @yonigozlan @molbap
/src/transformers/models/van/mod*_van* @yonigozlan @molbap
/src/transformers/models/vit/mod*_vit* @yonigozlan @molbap
/src/transformers/models/vit_hybrid/mod*_vit_hybrid* @yonigozlan @molbap
/src/transformers/models/vitdet/mod*_vitdet* @yonigozlan @molbap
/src/transformers/models/vit_mae/mod*_vit_mae* @yonigozlan @molbap
/src/transformers/models/vitmatte/mod*_vitmatte* @yonigozlan @molbap
/src/transformers/models/vit_msn/mod*_vit_msn* @yonigozlan @molbap
/src/transformers/models/vitpose/mod*_vitpose* @yonigozlan @molbap
/src/transformers/models/yolos/mod*_yolos* @yonigozlan @molbap
/src/transformers/models/zoedepth/mod*_zoedepth* @yonigozlan @molbap
# Audio models
/src/transformers/models/audio_spectrogram_transformer/mod*_audio_spectrogram_transformer* @eustlb
/src/transformers/models/bark/mod*_bark* @eustlb
/src/transformers/models/clap/mod*_clap* @eustlb
/src/transformers/models/dac/mod*_dac* @eustlb
/src/transformers/models/encodec/mod*_encodec* @eustlb
/src/transformers/models/hubert/mod*_hubert* @eustlb
/src/transformers/models/mctct/mod*_mctct* @eustlb
/src/transformers/models/mimi/mod*_mimi* @eustlb
/src/transformers/models/mms/mod*_mms* @eustlb
/src/transformers/models/moshi/mod*_moshi* @eustlb
/src/transformers/models/musicgen/mod*_musicgen* @eustlb
/src/transformers/models/musicgen_melody/mod*_musicgen_melody* @eustlb
/src/transformers/models/pop2piano/mod*_pop2piano* @eustlb
/src/transformers/models/seamless_m4t/mod*_seamless_m4t* @eustlb
/src/transformers/models/seamless_m4t_v2/mod*_seamless_m4t_v2* @eustlb
/src/transformers/models/sew/mod*_sew* @eustlb
/src/transformers/models/sew_d/mod*_sew_d* @eustlb
/src/transformers/models/speech_to_text/mod*_speech_to_text* @eustlb
/src/transformers/models/speech_to_text_2/mod*_speech_to_text_2* @eustlb
/src/transformers/models/speecht5/mod*_speecht5* @eustlb
/src/transformers/models/unispeech/mod*_unispeech* @eustlb
/src/transformers/models/unispeech_sat/mod*_unispeech_sat* @eustlb
/src/transformers/models/univnet/mod*_univnet* @eustlb
/src/transformers/models/vits/mod*_vits* @eustlb
/src/transformers/models/wav2vec2/mod*_wav2vec2* @eustlb
/src/transformers/models/wav2vec2_bert/mod*_wav2vec2_bert* @eustlb
/src/transformers/models/wav2vec2_conformer/mod*_wav2vec2_conformer* @eustlb
/src/transformers/models/wav2vec2_phoneme/mod*_wav2vec2_phoneme* @eustlb
/src/transformers/models/wavlm/mod*_wavlm* @eustlb
/src/transformers/models/whisper/mod*_whisper* @eustlb
/src/transformers/models/xls_r/mod*_xls_r* @eustlb
/src/transformers/models/xlsr_wav2vec2/mod*_xlsr_wav2vec2* @eustlb
# Video models
/src/transformers/models/timesformer/mod*_timesformer* @Rocketknight1
/src/transformers/models/videomae/mod*_videomae* @Rocketknight1
/src/transformers/models/vivit/mod*_vivit* @Rocketknight1
# Multimodal models
/src/transformers/models/align/mod*_align* @zucchini-nlp
/src/transformers/models/altclip/mod*_altclip* @zucchini-nlp
/src/transformers/models/aria/mod*_aria* @zucchini-nlp
/src/transformers/models/blip/mod*_blip* @zucchini-nlp
/src/transformers/models/blip_2/mod*_blip_2* @zucchini-nlp
/src/transformers/models/bridgetower/mod*_bridgetower* @zucchini-nlp
/src/transformers/models/bros/mod*_bros* @zucchini-nlp
/src/transformers/models/chameleon/mod*_chameleon* @zucchini-nlp
/src/transformers/models/chinese_clip/mod*_chinese_clip* @zucchini-nlp
/src/transformers/models/clip/mod*_clip* @zucchini-nlp
/src/transformers/models/clipseg/mod*_clipseg* @zucchini-nlp
/src/transformers/models/clvp/mod*_clvp* @zucchini-nlp
/src/transformers/models/colpali/mod*_colpali* @zucchini-nlp @yonigozlan
/src/transformers/models/data2vec/mod*_data2vec* @zucchini-nlp
/src/transformers/models/deplot/mod*_deplot* @zucchini-nlp
/src/transformers/models/donut/mod*_donut* @zucchini-nlp
/src/transformers/models/flava/mod*_flava* @zucchini-nlp
/src/transformers/models/git/mod*_git* @zucchini-nlp
/src/transformers/models/grounding_dino/mod*_grounding_dino* @yonigozlan
/src/transformers/models/groupvit/mod*_groupvit* @zucchini-nlp
/src/transformers/models/idefics/mod*_idefics* @zucchini-nlp
/src/transformers/models/idefics2/mod*_idefics2* @zucchini-nlp
/src/transformers/models/idefics3/mod*_idefics3* @zucchini-nlp
/src/transformers/models/instructblip/mod*_instructblip* @zucchini-nlp
/src/transformers/models/instructblipvideo/mod*_instructblipvideo* @zucchini-nlp
/src/transformers/models/kosmos_2/mod*_kosmos_2* @zucchini-nlp
/src/transformers/models/layoutlm/mod*_layoutlm* @NielsRogge
/src/transformers/models/layoutlmv2/mod*_layoutlmv2* @NielsRogge
/src/transformers/models/layoutlmv3/mod*_layoutlmv3* @NielsRogge
/src/transformers/models/layoutxlm/mod*_layoutxlm* @NielsRogge
/src/transformers/models/lilt/mod*_lilt* @zucchini-nlp
/src/transformers/models/llava/mod*_llava* @zucchini-nlp @ArthurZucker
/src/transformers/models/llava_next/mod*_llava_next* @zucchini-nlp
/src/transformers/models/llava_next_video/mod*_llava_next_video* @zucchini-nlp
/src/transformers/models/llava_onevision/mod*_llava_onevision* @zucchini-nlp
/src/transformers/models/lxmert/mod*_lxmert* @zucchini-nlp
/src/transformers/models/matcha/mod*_matcha* @zucchini-nlp
/src/transformers/models/mgp_str/mod*_mgp_str* @zucchini-nlp
/src/transformers/models/mllama/mod*_mllama* @zucchini-nlp
/src/transformers/models/nougat/mod*_nougat* @NielsRogge
/src/transformers/models/omdet_turbo/mod*_omdet_turbo* @yonigozlan
/src/transformers/models/oneformer/mod*_oneformer* @zucchini-nlp
/src/transformers/models/owlvit/mod*_owlvit* @yonigozlan
/src/transformers/models/owlv2/mod*_owlv2* @yonigozlan
/src/transformers/models/paligemma/mod*_paligemma* @zucchini-nlp @molbap
/src/transformers/models/perceiver/mod*_perceiver* @zucchini-nlp
/src/transformers/models/pix2struct/mod*_pix2struct* @zucchini-nlp
/src/transformers/models/pixtral/mod*_pixtral* @zucchini-nlp @ArthurZucker
/src/transformers/models/qwen2_audio/mod*_qwen2_audio* @zucchini-nlp @ArthurZucker
/src/transformers/models/qwen2_vl/mod*_qwen2_vl* @zucchini-nlp @ArthurZucker
/src/transformers/models/sam/mod*_sam* @zucchini-nlp @ArthurZucker
/src/transformers/models/siglip/mod*_siglip* @zucchini-nlp
/src/transformers/models/speech_encoder_decoder/mod*_speech_encoder_decoder* @zucchini-nlp
/src/transformers/models/tapas/mod*_tapas* @NielsRogge
/src/transformers/models/trocr/mod*_trocr* @zucchini-nlp
/src/transformers/models/tvlt/mod*_tvlt* @zucchini-nlp
/src/transformers/models/tvp/mod*_tvp* @zucchini-nlp
/src/transformers/models/udop/mod*_udop* @zucchini-nlp
/src/transformers/models/video_llava/mod*_video_llava* @zucchini-nlp
/src/transformers/models/vilt/mod*_vilt* @zucchini-nlp
/src/transformers/models/vipllava/mod*_vipllava* @zucchini-nlp
/src/transformers/models/vision_encoder_decoder/mod*_vision_encoder_decoder* @Rocketknight1
/src/transformers/models/vision_text_dual_encoder/mod*_vision_text_dual_encoder* @Rocketknight1
/src/transformers/models/visual_bert/mod*_visual_bert* @zucchini-nlp
/src/transformers/models/xclip/mod*_xclip* @zucchini-nlp
# Reinforcement learning models
/src/transformers/models/decision_transformer/mod*_decision_transformer* @Rocketknight1
/src/transformers/models/trajectory_transformer/mod*_trajectory_transformer* @Rocketknight1
# Time series models
/src/transformers/models/autoformer/mod*_autoformer* @Rocketknight1
/src/transformers/models/informer/mod*_informer* @Rocketknight1
/src/transformers/models/patchtsmixer/mod*_patchtsmixer* @Rocketknight1
/src/transformers/models/patchtst/mod*_patchtst* @Rocketknight1
/src/transformers/models/time_series_transformer/mod*_time_series_transformer* @Rocketknight1
# Graph models
/src/transformers/models/graphormer/mod*_graphormer* @clefourrier
# Finally, files with no owners that shouldn't generate pings; these are usually automatically generated and checked in the CI
utils/dummy*
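
The ownership rules above are gitignore-style globs, so one `mod*_<model>*` entry covers both the `modeling_*` and `modular_*` files of a model while leaving tokenizer or processor files unowned. A minimal sketch of that matching behavior, using Python's `fnmatch` as an approximation of the CODEOWNERS matcher:

```python
# Approximate a CODEOWNERS glob with shell-style matching (illustration only).
from fnmatch import fnmatch

pattern = "mod*_electra*"
for name in ["modeling_electra.py", "modular_electra.py", "tokenization_electra.py"]:
    # modeling_/modular_ files match and ping the owner; tokenization files do not
    print(f"{name}: {fnmatch(name, pattern)}")
```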

@@ -54,7 +54,7 @@ jobs:
- name: Create model files
run: |
. ~/venv/bin/activate
transformers-cli add-new-model-like --config_file tests/fixtures/add_distilbert_like_config.json --path_to_repo .
transformers add-new-model-like --config_file tests/fixtures/add_distilbert_like_config.json --path_to_repo .
make style
make fix-copies

.github/workflows/assign-reviewers.yml (new file)

@@ -0,0 +1,26 @@
name: Assign PR Reviewers
on:
pull_request_target:
branches:
- main
types: [ready_for_review]
jobs:
assign_reviewers:
permissions:
pull-requests: write
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.13'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install PyGithub
- name: Run assignment script
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: python .github/scripts/assign_reviewers.py
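
The workflow only wires up the trigger and token; the actual assignment logic lives in `.github/scripts/assign_reviewers.py`, which this diff does not include. A hypothetical sketch of how such a script could use PyGithub (the `RULES` mapping and every name below are illustrative assumptions, not the script's real contents):

```python
# Hypothetical sketch of .github/scripts/assign_reviewers.py (not part of this diff).
# Assumes PyGithub, installed above, and the standard GITHUB_* variables of Actions.
import json
import os
from fnmatch import fnmatch

from github import Github

# Illustrative rules only; a real mapping would presumably mirror CODEOWNERS.
RULES = {"src/transformers/models/whisper/*": ["eustlb"]}

def main() -> None:
    # GITHUB_EVENT_PATH points at the JSON payload of the triggering event.
    with open(os.environ["GITHUB_EVENT_PATH"]) as f:
        event = json.load(f)
    pr_number = event["pull_request"]["number"]

    gh = Github(os.environ["GITHUB_TOKEN"])
    repo = gh.get_repo(os.environ["GITHUB_REPOSITORY"])
    pr = repo.get_pull(pr_number)

    reviewers = set()
    for changed in pr.get_files():
        for pattern, owners in RULES.items():
            if fnmatch(changed.filename, pattern):
                reviewers.update(owners)
    reviewers.discard(pr.user.login)  # authors cannot review their own PR
    if reviewers:
        pr.create_review_request(reviewers=sorted(reviewers))

if __name__ == "__main__":
    main()
```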

@@ -1,42 +1,73 @@
name: Self-hosted runner (benchmark)
on:
schedule:
- cron: "17 2 * * *"
workflow_call:
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
env:
HF_HOME: /mnt/cache
TF_FORCE_GPU_ALLOW_GROWTH: true
jobs:
benchmark:
name: Benchmark
runs-on: [single-gpu, nvidia-gpu, a10, ci]
strategy:
matrix:
# group: [aws-g5-4xlarge-cache, aws-p4d-24xlarge-plus] (A100 runner is not enabled)
group: [aws-g5-4xlarge-cache]
runs-on:
group: ${{ matrix.group }}
if: |
(github.event_name == 'pull_request' && contains( github.event.pull_request.labels.*.name, 'run-benchmark') )||
(github.event_name == 'push' && github.ref == 'refs/heads/main')
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
image: huggingface/transformers-pytorch-gpu
options: --gpus all --privileged --ipc host
steps:
- name: Update clone
working-directory: /transformers
- name: Get repo
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha || github.sha }}
- name: Install libpq-dev & psql
run: |
git fetch && git checkout ${{ github.sha }}
apt update
apt install -y libpq-dev postgresql-client
- name: Install benchmark script dependencies
run: python3 -m pip install -r benchmark/requirements.txt
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e ".[torch]"
- name: Benchmark (daily)
if: github.event_name == 'schedule'
working-directory: /transformers
- name: Run database init script
run: |
python3 -m pip install optimum-benchmark>=0.3.0
HF_TOKEN=${{ secrets.TRANSFORMERS_BENCHMARK_TOKEN }} python3 benchmark/benchmark.py --repo_id hf-internal-testing/benchmark_results --path_in_repo $(date +'%Y-%m-%d') --config-dir benchmark/config --config-name generation --commit=${{ github.sha }} backend.model=google/gemma-2b backend.cache_implementation=null,static backend.torch_compile=false,true --multirun
psql -f benchmark/utils/init_db.sql
env:
PGDATABASE: metrics
PGHOST: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGHOST }}
PGUSER: transformers_benchmarks
PGPASSWORD: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGPASSWORD }}
- name: Benchmark (merged to main event)
if: github.event_name == 'push' && github.ref_name == 'main'
working-directory: /transformers
- name: Run benchmark
run: |
python3 -m pip install optimum-benchmark>=0.3.0
HF_TOKEN=${{ secrets.TRANSFORMERS_BENCHMARK_TOKEN }} python3 benchmark/benchmark.py --repo_id hf-internal-testing/benchmark_results_merge_event --path_in_repo $(date +'%Y-%m-%d') --config-dir benchmark/config --config-name generation --commit=${{ github.sha }} backend.model=google/gemma-2b backend.cache_implementation=null,static backend.torch_compile=false,true --multirun
git config --global --add safe.directory /__w/transformers/transformers
if [ "$GITHUB_EVENT_NAME" = "pull_request" ]; then
commit_id=$(echo "${{ github.event.pull_request.head.sha }}")
elif [ "$GITHUB_EVENT_NAME" = "push" ]; then
commit_id=$GITHUB_SHA
fi
commit_msg=$(git show -s --format=%s | cut -c1-70)
python3 benchmark/benchmarks_entrypoint.py "huggingface/transformers" "$BRANCH_NAME" "$commit_id" "$commit_msg"
env:
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
# Enable this to see debug logs
# HF_HUB_VERBOSITY: debug
# TRANSFORMERS_VERBOSITY: debug
PGHOST: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGHOST }}
PGUSER: transformers_benchmarks
PGPASSWORD: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGPASSWORD }}
BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
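
For context on the database steps: `psql -f benchmark/utils/init_db.sql` creates the schema, and `benchmarks_entrypoint.py` is then expected to write results through the same `PG*` variables. The entrypoint is not shown in this diff; the sketch below assumes a libpq-based client such as psycopg2, which reads `PGHOST`/`PGUSER`/`PGPASSWORD`/`PGDATABASE` from the environment, and the `benchmark_runs` table name is an assumption (the real schema lives in `init_db.sql`):

```python
# Hypothetical sketch of benchmark/benchmarks_entrypoint.py (not part of this diff).
import sys

import psycopg2

def record_run(repo: str, branch: str, commit_id: str, commit_msg: str) -> None:
    # Empty DSN: host/user/password/dbname are taken from the PG* env vars by libpq.
    conn = psycopg2.connect("")
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO benchmark_runs (repo, branch, commit_id, commit_msg) "
            "VALUES (%s, %s, %s, %s)",
            (repo, branch, commit_id, commit_msg),
        )
    conn.close()

if __name__ == "__main__":
    record_run(*sys.argv[1:5])  # mirrors the four positional args passed above
```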

.github/workflows/benchmark_v2.yml (new file)

@@ -0,0 +1,57 @@
name: Benchmark v2 Framework
on:
workflow_dispatch:
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
# For gated repositories, we still need to agree to share information on the Hub repo page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
jobs:
benchmark-v2:
name: Benchmark v2
runs-on: ${{ inputs.runner }}
if: |
(github.event_name == 'pull_request' && contains( github.event.pull_request.labels.*.name, 'run-benchmark')) ||
(github.event_name == 'schedule')
container:
image: ${{ inputs.container_image }}
options: ${{ inputs.container_options }}
steps:
- name: Get repo
uses: actions/checkout@v4
with:
ref: ${{ inputs.commit_sha || github.sha }}
- name: Install benchmark dependencies
run: |
python3 -m pip install -r benchmark_v2/requirements.txt
- name: Reinstall transformers in edit mode
run: |
python3 -m pip uninstall -y transformers
python3 -m pip install -e ".[torch]"
- name: Show installed libraries and their versions
run: |
python3 -m pip list
python3 -c "import torch; print(f'PyTorch version: {torch.__version__}')"
python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
python3 -c "import torch; print(f'CUDA device count: {torch.cuda.device_count()}')" || true
nvidia-smi || true
- name: Run benchmark v2
working-directory: benchmark_v2
run: |
echo "Running benchmarks"
python3 run_benchmarks.py \
--commit-id '${{ inputs.commit_sha || github.sha }}' \
--run-id '${{ inputs.run_id }}' \
--push-to-hub '${{ inputs.benchmark_repo_id}}' \
--token '${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}' \
--log-level INFO
env:
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
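
The `Run benchmark v2` step fixes the CLI surface of `benchmark_v2/run_benchmarks.py` even though the script body is outside this diff. A minimal sketch of an entry point accepting exactly those flags (everything beyond the flag names is an assumption):

```python
# Sketch of the CLI implied by the workflow invocation above (illustration only).
import argparse
import logging

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run transformers benchmarks")
    parser.add_argument("--commit-id", required=True, help="commit SHA being benchmarked")
    parser.add_argument("--run-id", required=True, help="GitHub Actions run id")
    parser.add_argument("--push-to-hub", default=None, help="Hub dataset repo id for results")
    parser.add_argument("--token", default=None, help="token used to upload results")
    parser.add_argument("--log-level", default="INFO")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    logging.basicConfig(level=getattr(logging, args.log_level.upper(), logging.INFO))
    logging.info("Benchmarking commit %s (run %s)", args.commit_id, args.run_id)
```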

@@ -0,0 +1,17 @@
name: Benchmark v2 Scheduled Runner - A10 Single-GPU
on:
workflow_dispatch:
jobs:
benchmark-v2-default:
name: Benchmark v2 - Default Models
uses: ./.github/workflows/benchmark_v2.yml
with:
runner: aws-g5-4xlarge-cache-use1-public-80
container_image: huggingface/transformers-pytorch-gpu
container_options: --gpus all --privileged --ipc host --shm-size "16gb"
commit_sha: ${{ github.sha }}
run_id: ${{ github.run_id }}
benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
secrets: inherit

@@ -0,0 +1,17 @@
name: Benchmark v2 Scheduled Runner - MI325 Single-GPU
on:
workflow_dispatch:
jobs:
benchmark-v2-default:
name: Benchmark v2 - Default Models
uses: ./.github/workflows/benchmark_v2.yml
with:
runner: amd-mi325-ci-1gpu
container_image: huggingface/transformers-pytorch-amd-gpu
container_options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache
commit_sha: ${{ github.sha }}
run_id: ${{ github.run_id }}
benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
secrets: inherit

@@ -26,7 +26,7 @@ jobs:
strategy:
matrix:
file: ["quality", "consistency", "custom-tokenizers", "torch-light", "tf-light", "exotic-models", "torch-tf-light", "torch-jax-light", "jax-light", "examples-torch", "examples-tf"]
file: ["quality", "consistency", "custom-tokenizers", "torch-light", "exotic-models", "examples-torch"]
continue-on-error: true
steps:
@@ -34,11 +34,11 @@ jobs:
name: Set tag
run: |
if ${{contains(github.event.head_commit.message, '[build-ci-image]')}}; then
echo "TAG=huggingface/transformers-${{ matrix.file }}:dev" >> "$GITHUB_ENV"
echo "TAG=huggingface/transformers-${{ matrix.file }}:dev" >> "$GITHUB_ENV"
echo "setting it to DEV!"
else
echo "TAG=huggingface/transformers-${{ matrix.file }}" >> "$GITHUB_ENV"
fi
-
name: Set up Docker Buildx

@@ -5,6 +5,7 @@ on:
branches:
- build_ci_docker_image*
repository_dispatch:
workflow_dispatch:
workflow_call:
inputs:
image_postfix:
@@ -19,7 +20,7 @@ concurrency:
jobs:
latest-docker:
name: "Latest PyTorch + TensorFlow [dev]"
name: "Latest PyTorch [dev]"
runs-on:
group: aws-general-8-plus
steps:
@@ -63,14 +64,14 @@ jobs:
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-all-latest-gpu-push-ci docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-torch-deepspeed-docker:
name: "Latest PyTorch + DeepSpeed"
runs-on:
group: aws-general-8-plus
group: aws-g4dn-2xlarge-cache
steps:
-
name: Set up Docker Buildx
@@ -99,7 +100,7 @@ jobs:
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER}}
title: 🤗 Results of the transformers-pytorch-deepspeed-latest-gpu docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
@@ -140,7 +141,7 @@ jobs:
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-pytorch-deepspeed-latest-gpu-push-ci docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
@@ -176,7 +177,7 @@ jobs:
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the huggingface/transformers-doc-builder docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
@@ -214,28 +215,28 @@ jobs:
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the huggingface/transformers-pytorch-gpu docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-pytorch-amd:
name: "Latest PyTorch (AMD) [dev]"
runs-on:
group: aws-general-8-plus
group: aws-highcpu-32-priv
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
@@ -263,45 +264,7 @@ jobs:
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the huggingface/transformers-pytorch-amd-gpu-push-ci build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-tensorflow:
name: "Latest TensorFlow [dev]"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-tensorflow-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-tensorflow-gpu
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the huggingface/transformers-tensorflow-gpu build
title: 🤗 Results of the huggingface/transformers-pytorch-amd-gpu-push-ci build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
@@ -310,19 +273,19 @@ jobs:
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
@@ -350,7 +313,7 @@ jobs:
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-pytorch-deepspeed-amd-gpu build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
@@ -388,6 +351,6 @@ jobs:
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-quantization-latest-gpu build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

@@ -2,6 +2,10 @@ name: Build docker images (Nightly CI)
on:
workflow_call:
inputs:
job:
required: true
type: string
push:
branches:
- build_nightly_ci_docker_image*
@@ -12,7 +16,8 @@ concurrency:
jobs:
latest-with-torch-nightly-docker:
name: "Nightly PyTorch + Stable TensorFlow"
name: "Nightly PyTorch"
if: inputs.job == 'latest-with-torch-nightly-docker' || inputs.job == ''
runs-on:
group: aws-general-8-plus
steps:
@@ -41,8 +46,9 @@ jobs:
nightly-torch-deepspeed-docker:
name: "Nightly PyTorch + DeepSpeed"
if: inputs.job == 'nightly-torch-deepspeed-docker' || inputs.job == ''
runs-on:
group: aws-general-8-plus
group: aws-g4dn-2xlarge-cache
steps:
-
name: Set up Docker Buildx

@@ -16,8 +16,20 @@ jobs:
commit_sha: ${{ github.sha }}
package: transformers
notebook_folder: transformers_doc
languages: ar de en es fr hi it ko pt tr zh ja te
languages: en
custom_container: huggingface/transformers-doc-builder
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
build_other_lang:
uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
with:
commit_sha: ${{ github.sha }}
package: transformers
notebook_folder: transformers_doc
languages: ar de es fr hi it ja ko pt zh
custom_container: huggingface/transformers-doc-builder
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}

@@ -14,5 +14,4 @@ jobs:
commit_sha: ${{ github.event.pull_request.head.sha }}
pr_number: ${{ github.event.number }}
package: transformers
languages: ar de en es fr hi it ko pt tr zh ja te
custom_container: huggingface/transformers-doc-builder
languages: en

.github/workflows/check_failed_tests.yml (new file)

@@ -0,0 +1,255 @@
name: Process failed tests
on:
workflow_call:
inputs:
docker:
required: true
type: string
start_sha:
required: true
type: string
job:
required: true
type: string
slack_report_channel:
required: true
type: string
ci_event:
required: true
type: string
report_repo_id:
required: true
type: string
commit_sha:
required: false
type: string
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
# For gated repositories, we still need to agree to share information on the Hub repo page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
jobs:
check_new_failures:
name: "Find commits for new failing tests"
strategy:
matrix:
run_idx: [1]
runs-on:
group: aws-g5-4xlarge-cache
outputs:
process: ${{ steps.check_file.outputs.process }}
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/download-artifact@v4
with:
name: ci_results_${{ inputs.job }}
path: /transformers/ci_results_${{ inputs.job }}
- name: Check file
id: check_file
working-directory: /transformers
run: |
if [ -f ci_results_${{ inputs.job }}/new_failures.json ]; then
echo "`ci_results_${{ inputs.job }}/new_failures.json` exists, continue ..."
echo "process=true" >> $GITHUB_ENV
echo "process=true" >> $GITHUB_OUTPUT
else
echo "`ci_results_${{ inputs.job }}/new_failures.json` doesn't exist, abort."
echo "process=false" >> $GITHUB_ENV
echo "process=false" >> $GITHUB_OUTPUT
fi
- uses: actions/download-artifact@v4
if: ${{ env.process == 'true' }}
with:
pattern: setup_values*
path: setup_values
merge-multiple: true
- name: Prepare some setup values
if: ${{ env.process == 'true' }}
run: |
if [ -f setup_values/prev_workflow_run_id.txt ]; then
echo "PREV_WORKFLOW_RUN_ID=$(cat setup_values/prev_workflow_run_id.txt)" >> $GITHUB_ENV
else
echo "PREV_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
fi
if [ -f setup_values/other_workflow_run_id.txt ]; then
echo "OTHER_WORKFLOW_RUN_ID=$(cat setup_values/other_workflow_run_id.txt)" >> $GITHUB_ENV
else
echo "OTHER_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
fi
- name: Update clone
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Get target commit
working-directory: /transformers/utils
if: ${{ env.process == 'true' }}
run: |
echo "END_SHA=$(TOKEN=${{ secrets.ACCESS_REPO_INFO_TOKEN }} python3 -c 'import os; from get_previous_daily_ci import get_last_daily_ci_run_commit; commit=get_last_daily_ci_run_commit(token=os.environ["TOKEN"], workflow_run_id=os.environ["PREV_WORKFLOW_RUN_ID"]); print(commit)')" >> $GITHUB_ENV
- name: Checkout to `start_sha`
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: git fetch && git checkout ${{ inputs.start_sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: NVIDIA-SMI
if: ${{ env.process == 'true' }}
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: |
python3 utils/print_env.py
- name: Install pytest-flakefinder
if: ${{ env.process == 'true' }}
run: python3 -m pip install pytest-flakefinder
- name: Show installed libraries and their versions
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: pip freeze
- name: Check failed tests
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: python3 utils/check_bad_commit.py --start_commit ${{ inputs.start_sha }} --end_commit ${{ env.END_SHA }} --file ci_results_${{ inputs.job }}/new_failures.json --output_file new_failures_with_bad_commit_${{ inputs.job }}_${{ matrix.run_idx }}.json
- name: Show results
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: |
ls -l new_failures_with_bad_commit_${{ inputs.job }}_${{ matrix.run_idx }}.json
cat new_failures_with_bad_commit_${{ inputs.job }}_${{ matrix.run_idx }}.json
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: new_failures_with_bad_commit_${{ inputs.job }}_${{ matrix.run_idx }}
path: /transformers/new_failures_with_bad_commit_${{ inputs.job }}_${{ matrix.run_idx }}.json
process_new_failures_with_commit_info:
name: "process bad commit reports"
needs: check_new_failures
if: needs.check_new_failures.outputs.process == 'true'
runs-on:
group: aws-g5-4xlarge-cache
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/download-artifact@v4
with:
name: ci_results_${{ inputs.job }}
path: /transformers/ci_results_${{ inputs.job }}
- uses: actions/download-artifact@v4
with:
pattern: new_failures_with_bad_commit_${{ inputs.job }}*
path: /transformers/new_failures_with_bad_commit_${{ inputs.job }}
merge-multiple: true
- name: Check files
working-directory: /transformers
run: |
ls -la /transformers
ls -la /transformers/new_failures_with_bad_commit_${{ inputs.job }}
# Currently, we only run with a single runner by using `run_idx: [1]`. We might try multiple runners
# to further reduce false positives caused by flaky tests, which would require extra processing to merge reports.
- name: Merge files
shell: bash
working-directory: /transformers
run: |
cp /transformers/new_failures_with_bad_commit_${{ inputs.job }}/new_failures_with_bad_commit_${{ inputs.job }}_1.json new_failures_with_bad_commit.json
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Process report
shell: bash
working-directory: /transformers
env:
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
JOB_NAME: ${{ inputs.job }}
REPORT_REPO_ID: ${{ inputs.report_repo_id }}
run: |
python3 utils/process_bad_commit_report.py
- name: Process report
shell: bash
working-directory: /transformers
env:
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
JOB_NAME: ${{ inputs.job }}
REPORT_REPO_ID: ${{ inputs.report_repo_id }}
run: |
{
echo 'REPORT_TEXT<<EOF'
python3 utils/process_bad_commit_report.py
echo EOF
} >> "$GITHUB_ENV"
- name: Prepare Slack report title
working-directory: /transformers
run: |
pip install slack_sdk
echo "title=$(python3 -c 'import sys; sys.path.append("utils"); from utils.notification_service import job_to_test_map; ci_event = "${{ inputs.ci_event }}"; job = "${{ inputs.job }}"; test_name = job_to_test_map[job]; title = f"New failed tests of {ci_event}" + ":" + f" {test_name}"; print(title)')" >> $GITHUB_ENV
- name: Send processed report
if: ${{ !endsWith(env.REPORT_TEXT, '{}') }}
uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
with:
# Slack channel id, channel name, or user id to post message.
# See also: https://api.slack.com/methods/chat.postMessage#channels
channel-id: '#${{ inputs.slack_report_channel }}'
# For posting a rich message using Block Kit
payload: |
{
"blocks": [
{
"type": "header",
"text": {
"type": "plain_text",
"text": "${{ env.title }}"
}
},
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "${{ env.REPORT_TEXT }}"
}
}
]
}
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
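
The heavy lifting in this workflow is `utils/check_bad_commit.py`, which receives `--start_commit`/`--end_commit` plus `new_failures.json` and pins each new failure to the commit that introduced it. The script itself is not part of this diff; below is a simplified sketch of that kind of search, written as a linear walk (the real utility may use a smarter bisect-style strategy):

```python
# Simplified first-bad-commit search over a start..end range (illustration only).
import subprocess
from typing import Optional

def test_passes_at(commit: str, test_id: str) -> bool:
    subprocess.run(["git", "checkout", "-q", commit], check=True)
    result = subprocess.run(["python3", "-m", "pytest", "-q", test_id])
    return result.returncode == 0

def first_bad_commit(start: str, end: str, test_id: str) -> Optional[str]:
    # Commits strictly after `start` up to `end`, oldest first.
    commits = subprocess.run(
        ["git", "rev-list", "--reverse", f"{start}..{end}"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    for commit in commits:
        if not test_passes_at(commit, test_id):
            return commit  # first commit at which the test starts failing
    return None
```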

.github/workflows/collated-reports.yml (new file)

@@ -0,0 +1,43 @@
name: CI collated reports
on:
workflow_call:
inputs:
job:
required: true
type: string
report_repo_id:
required: true
type: string
machine_type:
required: true
type: string
gpu_name:
description: Name of the GPU used for the job. It's enough that the value contains the name of the GPU, e.g. "noise-h100-more-noise". Case insensitive.
required: true
type: string
jobs:
collated_reports:
name: Collated reports
runs-on: ubuntu-22.04
if: always()
steps:
- uses: actions/checkout@v4
- uses: actions/download-artifact@v4
- name: Collated reports
shell: bash
env:
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
CI_SHA: ${{ github.sha }}
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
run: |
pip install huggingface_hub
python3 utils/collated_reports.py \
--path . \
--machine-type ${{ inputs.machine_type }} \
--commit-hash ${{ env.CI_SHA }} \
--job ${{ inputs.job }} \
--report-repo-id ${{ inputs.report_repo_id }} \
--gpu-name ${{ inputs.gpu_name }}
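
`utils/collated_reports.py` is likewise outside this diff. Given the flags above, a plausible sketch is that it bundles the downloaded report artifacts into one JSON document and pushes it to the reporting dataset via `huggingface_hub`; the file layout and payload shape below are assumptions:

```python
# Hypothetical sketch of utils/collated_reports.py (not part of this diff).
import json
import os
from pathlib import Path

from huggingface_hub import upload_file

def collate(path: str, machine_type: str, commit: str, job: str, repo_id: str, gpu: str) -> None:
    # Gather the short failure reports from the downloaded artifacts.
    reports = {str(p): p.read_text() for p in Path(path).rglob("failures_short.txt")}
    payload = {"machine_type": machine_type, "commit": commit, "job": job,
               "gpu_name": gpu, "reports": reports}
    out = Path("collated_reports.json")
    out.write_text(json.dumps(payload, indent=2))
    upload_file(
        path_or_fileobj=str(out),
        path_in_repo=f"{commit}/collated_reports_{job}.json",  # assumed layout
        repo_id=repo_id,
        repo_type="dataset",
        token=os.environ["TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN"],
    )
```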

@@ -16,7 +16,6 @@ env:
RUN_SLOW: yes
OMP_NUM_THREADS: 16
MKL_NUM_THREADS: 16
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
jobs:
@@ -27,10 +26,11 @@ jobs:
fail-fast: false
matrix:
split_keys: ${{ fromJson(inputs.split_keys) }}
runs-on: [single-gpu, nvidia-gpu, t4, ci]
runs-on:
group: aws-g5-4xlarge-cache
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers

@@ -14,10 +14,11 @@ env:
jobs:
setup:
name: Setup
runs-on: [single-gpu, nvidia-gpu, t4, ci]
runs-on:
group: aws-g5-4xlarge-cache
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
job_splits: ${{ steps.set-matrix.outputs.job_splits }}
split_keys: ${{ steps.set-matrix.outputs.split_keys }}
@@ -85,4 +86,4 @@ jobs:
uses: actions/upload-artifact@v4
with:
name: doc_test_results
path: doc_test_results

.github/workflows/get-pr-info.yml (new file)

@@ -0,0 +1,157 @@
name: Get PR commit SHA
on:
workflow_call:
inputs:
pr_number:
required: true
type: string
outputs:
PR_HEAD_REPO_FULL_NAME:
description: "The full name of the repository from which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REPO_FULL_NAME }}
PR_BASE_REPO_FULL_NAME:
description: "The full name of the repository to which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_REPO_FULL_NAME }}
PR_HEAD_REPO_OWNER:
description: "The owner of the repository from which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}
PR_BASE_REPO_OWNER:
description: "The owner of the repository to which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_REPO_OWNER }}
PR_HEAD_REPO_NAME:
description: "The name of the repository from which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}
PR_BASE_REPO_NAME:
description: "The name of the repository to which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_REPO_NAME }}
PR_HEAD_REF:
description: "The branch name of the pull request in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REF }}
PR_BASE_REF:
description: "The branch name in the base repository (to merge into)"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_REF }}
PR_HEAD_SHA:
description: "The head sha of the pull request branch in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_SHA }}
PR_BASE_SHA:
description: "The head sha of the target branch in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_SHA }}
PR_MERGE_COMMIT_SHA:
description: "The sha of the merge commit for the pull request (created by GitHub) in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_SHA }}
PR_HEAD_COMMIT_DATE:
description: "The date of the head sha of the pull request branch in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_COMMIT_DATE }}
PR_MERGE_COMMIT_DATE:
description: "The date of the merge commit for the pull request (created by GitHub) in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_DATE }}
PR_HEAD_COMMIT_TIMESTAMP:
description: "The timestamp of the head sha of the pull request branch in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_COMMIT_TIMESTAMP }}
PR_MERGE_COMMIT_TIMESTAMP:
description: "The timestamp of the merge commit for the pull request (created by GitHub) in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
PR:
description: "The PR"
value: ${{ jobs.get-pr-info.outputs.PR }}
PR_FILES:
description: "The files touched in the PR"
value: ${{ jobs.get-pr-info.outputs.PR_FILES }}
jobs:
get-pr-info:
runs-on: ubuntu-22.04
name: Get PR commit SHA better
outputs:
PR_HEAD_REPO_FULL_NAME: ${{ steps.pr_info.outputs.head_repo_full_name }}
PR_BASE_REPO_FULL_NAME: ${{ steps.pr_info.outputs.base_repo_full_name }}
PR_HEAD_REPO_OWNER: ${{ steps.pr_info.outputs.head_repo_owner }}
PR_BASE_REPO_OWNER: ${{ steps.pr_info.outputs.base_repo_owner }}
PR_HEAD_REPO_NAME: ${{ steps.pr_info.outputs.head_repo_name }}
PR_BASE_REPO_NAME: ${{ steps.pr_info.outputs.base_repo_name }}
PR_HEAD_REF: ${{ steps.pr_info.outputs.head_ref }}
PR_BASE_REF: ${{ steps.pr_info.outputs.base_ref }}
PR_HEAD_SHA: ${{ steps.pr_info.outputs.head_sha }}
PR_BASE_SHA: ${{ steps.pr_info.outputs.base_sha }}
PR_MERGE_COMMIT_SHA: ${{ steps.pr_info.outputs.merge_commit_sha }}
PR_HEAD_COMMIT_DATE: ${{ steps.pr_info.outputs.head_commit_date }}
PR_MERGE_COMMIT_DATE: ${{ steps.pr_info.outputs.merge_commit_date }}
PR_HEAD_COMMIT_TIMESTAMP: ${{ steps.get_timestamps.outputs.head_commit_timestamp }}
PR_MERGE_COMMIT_TIMESTAMP: ${{ steps.get_timestamps.outputs.merge_commit_timestamp }}
PR: ${{ steps.pr_info.outputs.pr }}
PR_FILES: ${{ steps.pr_info.outputs.files }}
if: ${{ inputs.pr_number != '' }}
steps:
- name: Extract PR details
id: pr_info
uses: actions/github-script@v6
with:
script: |
const { data: pr } = await github.rest.pulls.get({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: ${{ inputs.pr_number }}
});
const { data: head_commit } = await github.rest.repos.getCommit({
owner: pr.head.repo.owner.login,
repo: pr.head.repo.name,
ref: pr.head.ref
});
const { data: merge_commit } = await github.rest.repos.getCommit({
owner: pr.base.repo.owner.login,
repo: pr.base.repo.name,
ref: pr.merge_commit_sha,
});
const { data: files } = await github.rest.pulls.listFiles({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: ${{ inputs.pr_number }}
});
core.setOutput('head_repo_full_name', pr.head.repo.full_name);
core.setOutput('base_repo_full_name', pr.base.repo.full_name);
core.setOutput('head_repo_owner', pr.head.repo.owner.login);
core.setOutput('base_repo_owner', pr.base.repo.owner.login);
core.setOutput('head_repo_name', pr.head.repo.name);
core.setOutput('base_repo_name', pr.base.repo.name);
core.setOutput('head_ref', pr.head.ref);
core.setOutput('base_ref', pr.base.ref);
core.setOutput('head_sha', pr.head.sha);
core.setOutput('base_sha', pr.base.sha);
core.setOutput('merge_commit_sha', pr.merge_commit_sha);
core.setOutput('pr', pr);
core.setOutput('head_commit_date', head_commit.commit.committer.date);
core.setOutput('merge_commit_date', merge_commit.commit.committer.date);
core.setOutput('files', files);
console.log('PR head commit:', {
head_commit: head_commit,
commit: head_commit.commit,
date: head_commit.commit.committer.date
});
console.log('PR merge commit:', {
merge_commit: merge_commit,
commit: merge_commit.commit,
date: merge_commit.commit.committer.date
});
- name: Convert dates to timestamps
id: get_timestamps
run: |
head_commit_date=${{ steps.pr_info.outputs.head_commit_date }}
merge_commit_date=${{ steps.pr_info.outputs.merge_commit_date }}
echo $head_commit_date
echo $merge_commit_date
head_commit_timestamp=$(date -d "$head_commit_date" +%s)
merge_commit_timestamp=$(date -d "$merge_commit_date" +%s)
echo $head_commit_timestamp
echo $merge_commit_timestamp
echo "head_commit_timestamp=$head_commit_timestamp" >> $GITHUB_OUTPUT
echo "merge_commit_timestamp=$merge_commit_timestamp" >> $GITHUB_OUTPUT

.github/workflows/get-pr-number.yml (new file)

@@ -0,0 +1,36 @@
name: Get PR number
on:
workflow_call:
outputs:
PR_NUMBER:
description: "The extracted PR number"
value: ${{ jobs.get-pr-number.outputs.PR_NUMBER }}
jobs:
get-pr-number:
runs-on: ubuntu-22.04
name: Get PR number
outputs:
PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}
steps:
- name: Get PR number
shell: bash
run: |
if [[ "${{ github.event.issue.number }}" != "" && "${{ github.event.issue.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.issue.number }}" >> $GITHUB_ENV
elif [[ "${{ github.event.pull_request.number }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.pull_request.number }}" >> $GITHUB_ENV
elif [[ "${{ github.event.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.number }}" >> $GITHUB_ENV
else
echo "PR_NUMBER=" >> $GITHUB_ENV
fi
- name: Check PR number
shell: bash
run: |
echo "${{ env.PR_NUMBER }}"
- name: Set PR number
id: set_pr_number
run: echo "PR_NUMBER=${{ env.PR_NUMBER }}" >> "$GITHUB_OUTPUT"

@@ -12,12 +12,22 @@ on:
slice_id:
required: true
type: number
runner:
required: true
type: string
docker:
required: true
type: string
commit_sha:
required: false
type: string
report_name_prefix:
required: false
default: run_models_gpu
type: string
runner_type:
required: false
type: string
report_repo_id:
required: false
type: string
env:
HF_HOME: /mnt/cache
@@ -28,9 +38,7 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
CUDA_VISIBLE_DEVICES: 0,1
jobs:
@@ -46,6 +54,8 @@ jobs:
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
machine_type: ${{ steps.set_machine_type.outputs.machine_type }}
steps:
- name: Echo input and matrix info
shell: bash
@@ -67,7 +77,7 @@ jobs:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@@ -99,14 +109,15 @@ jobs:
run: pip freeze
- name: Set `machine_type` for report and artifact names
id: set_machine_type
working-directory: /transformers
shell: bash
run: |
echo "${{ inputs.machine_type }}"
if [ "${{ inputs.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ inputs.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ inputs.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ inputs.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ inputs.machine_type }}
@@ -114,26 +125,58 @@ jobs:
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
echo "machine_type=$machine_type" >> $GITHUB_OUTPUT
- name: Create report directory if it doesn't exist
shell: bash
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
echo "dummy" > /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/dummy.txt
ls -la /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -rsfE -v --make-reports=${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
run: |
script -q -c "PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS=yes _PATCHED_TESTING_METHODS_OUTPUT_DIR=/transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports python3 -m pytest -rsfE -v --make-reports=${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports tests/${{ matrix.folders }}" test_outputs.txt
ls -la
# Extract the exit code from the output file
EXIT_CODE=$(tail -1 test_outputs.txt | grep -o 'COMMAND_EXIT_CODE="[0-9]*"' | cut -d'"' -f2)
exit ${EXIT_CODE:-1}
- name: Failure short reports
if: ${{ failure() }}
# This step is only to show information in the GitHub Actions log.
# Always mark this step as successful, even if the report directory or the file `failures_short.txt` in it doesn't exist
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
run: cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/failures_short.txt
- name: Run test
shell: bash
- name: Captured information
if: ${{ failure() }}
continue-on-error: true
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports"
cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/captured_info.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports"
- name: Copy test_outputs.txt
if: ${{ always() }}
continue-on-error: true
run: |
cp /transformers/test_outputs.txt /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
- name: "Test suite reports artifacts: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
name: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
collated_reports:
name: Collated Reports
if: ${{ always() }}
needs: run_models_gpu
uses: huggingface/transformers/.github/workflows/collated-reports.yml@main
with:
job: run_models_gpu
report_repo_id: ${{ inputs.report_repo_id }}
gpu_name: ${{ inputs.runner_type }}
machine_type: ${{ needs.run_models_gpu.outputs.machine_type }}
secrets: inherit

@@ -1,129 +0,0 @@
name: model jobs
on:
workflow_call:
inputs:
folder_slices:
required: true
type: string
machine_type:
required: true
type: string
slice_id:
required: true
type: number
runner:
required: true
type: string
docker:
required: true
type: string
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
# For gated repositories, we still need to agree to share information on the Hub repo page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
CUDA_VISIBLE_DEVICES: 0,1
jobs:
run_models_gpu:
name: " "
strategy:
max-parallel: 1 # For now, not to parallelize. Can change later if it works well.
fail-fast: false
matrix:
folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }}
runs-on: ['${{ inputs.machine_type }}', self-hosted, amd-gpu, '${{ inputs.runner }}']
container:
image: ${{ inputs.docker }}
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Echo input and matrix info
shell: bash
run: |
echo "${{ inputs.folder_slices }}"
echo "${{ matrix.folders }}"
echo "${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}"
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Update / Install some packages (for Past CI)
if: ${{ contains(inputs.docker, '-past-') }}
working-directory: /transformers
run: |
python3 -m pip install -U datasets
- name: Update / Install some packages (for Past CI)
if: ${{ contains(inputs.docker, '-past-') && contains(inputs.docker, '-pytorch-') }}
working-directory: /transformers
run: |
python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
- name: ROCM-SMI
run: |
rocm-smi
- name: ROCM-INFO
run: |
rocminfo | grep "Agent" -A 14
- name: Show ROCR environment
run: |
echo "ROCR: $ROCR_VISIBLE_DEVICES"
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -rsfE -v --make-reports=${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }} -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Run test
shell: bash
run: |
mkdir -p /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ inputs.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ inputs.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports


@ -0,0 +1,120 @@
name: model jobs
on:
workflow_call:
inputs:
folder_slices:
required: true
type: string
slice_id:
required: true
type: number
runner:
required: true
type: string
machine_type:
required: true
type: string
report_name_prefix:
required: false
default: run_models_gpu
type: string
env:
RUN_SLOW: yes
PT_HPU_LAZY_MODE: 0
TRANSFORMERS_IS_CI: yes
PT_ENABLE_INT64_SUPPORT: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
HF_HOME: /mnt/cache/.cache/huggingface
jobs:
run_models_gpu:
name: " "
strategy:
max-parallel: 8
fail-fast: false
matrix:
folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }}
runs-on:
group: ${{ inputs.runner }}
container:
image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
options: --runtime=habana
-v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
--env OMPI_MCA_btl_vader_single_copy_mechanism=none
--env HABANA_VISIBLE_DEVICES
--env HABANA_VISIBLE_MODULES
--cap-add=sys_nice
--shm-size=64G
steps:
- name: Echo input and matrix info
shell: bash
run: |
echo "${{ inputs.folder_slices }}"
echo "${{ matrix.folders }}"
echo "${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}"
- name: Echo folder ${{ matrix.folders }}
shell: bash
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install dependencies
run: |
pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn
- name: HL-SMI
run: |
hl-smi
echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
- name: Environment
run: python3 utils/print_env.py
- name: Show installed libraries and their versions
run: pip freeze
- name: Set `machine_type` for report and artifact names
shell: bash
run: |
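# Map the Gaudi runner labels to the single-gpu / multi-gpu names used in report and artifact paths.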
if [ "${{ inputs.machine_type }}" = "1gaudi" ]; then
machine_type=single-gpu
elif [ "${{ inputs.machine_type }}" = "2gaudi" ]; then
machine_type=multi-gpu
else
machine_type=${{ inputs.machine_type }}
fi
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run all tests on Gaudi
run: python3 -m pytest -v --make-reports=${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Make sure report directory exists
shell: bash
run: |
mkdir -p reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports
echo "hello" > reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
path: reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports


@ -0,0 +1,68 @@
# Used to notify core maintainers about a new model PR being merged
name: New model PR merged notification
on:
push:
branches:
- main
paths:
- 'src/transformers/models/*/modeling_*'
jobs:
notify_new_model:
name: Notify new model
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Check new model
shell: bash
run: |
python -m pip install gitpython
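# `get_new_model` prints the directory of a newly added model (or an empty string); only the last line of the output is kept.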
python -c 'from utils.pr_slow_ci_models import get_new_model; new_model = get_new_model(diff_with_last_commit=True); print(new_model)' | tee output.txt
echo "NEW_MODEL=$(tail -n 1 output.txt)" >> $GITHUB_ENV
echo "COMMIT_SHA=$(git log -1 --format=%H)" >> $GITHUB_ENV
- name: print commit sha
if: ${{ env.NEW_MODEL != ''}}
shell: bash
run: |
echo "$COMMIT_SHA"
- name: print new model
if: ${{ env.NEW_MODEL != ''}}
shell: bash
run: |
echo "$NEW_MODEL"
- name: Notify
if: ${{ env.NEW_MODEL != ''}}
uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
with:
# Slack channel id, channel name, or user id to post message.
# See also: https://api.slack.com/methods/chat.postMessage#channels
channel-id: transformers-new-model-notification
# For posting a rich message using Block Kit
payload: |
{
"blocks": [
{
"type": "header",
"text": {
"type": "plain_text",
"text": "New model!",
"emoji": true
}
},
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "<https://github.com/huggingface/transformers/commit/${{ env.COMMIT_SHA }}|New model: ${{ env.NEW_MODEL }}> GH_ArthurZucker, GH_lysandrejik, GH_ydshieh\ncommit SHA: ${{ env.COMMIT_SHA }}"
}
}
]
}
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

.github/workflows/pr-style-bot.yml vendored Normal file

@ -0,0 +1,18 @@
# To run this bot, comment "@bot /style" on a PR
name: Style Bot
on:
issue_comment:
types: [created]
permissions:
pull-requests: write
jobs:
style:
uses: huggingface/huggingface_hub/.github/workflows/style-bot-action.yml@main
with:
python_quality_dependencies: "[quality]"
style_command_type: "default"
secrets:
bot_token: ${{ secrets.HF_STYLE_BOT_ACTION }}


@ -0,0 +1,134 @@
name: PR - build doc via comment
on:
issue_comment:
types:
- created
branches-ignore:
- main
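# Only one doc build per issue: a newer `build-doc` comment cancels the in-flight run.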
concurrency:
group: ${{ github.workflow }}-${{ github.event.issue.number }}-${{ startsWith(github.event.comment.body, 'build-doc') }}
cancel-in-progress: true
permissions: {}
jobs:
get-pr-number:
name: Get PR number
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "eustlb", "MekkCyber", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'build-doc')) }}
uses: ./.github/workflows/get-pr-number.yml
get-pr-info:
name: Get PR commit SHA
needs: get-pr-number
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
uses: ./.github/workflows/get-pr-info.yml
with:
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
verify_pr_commit:
name: Verify that the PR commit corresponds to this comment event by comparing timestamps
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
runs-on: ubuntu-22.04
needs: get-pr-info
env:
COMMENT_DATE: ${{ github.event.comment.created_at }}
PR_MERGE_COMMIT_DATE: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_DATE }}
PR_MERGE_COMMIT_TIMESTAMP: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
steps:
- run: |
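# Convert the ISO-8601 comment date to epoch seconds so the two timestamps can be compared numerically.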
COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")
echo "COMMENT_DATE: $COMMENT_DATE"
echo "PR_MERGE_COMMIT_DATE: $PR_MERGE_COMMIT_DATE"
echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"
echo "PR_MERGE_COMMIT_TIMESTAMP: $PR_MERGE_COMMIT_TIMESTAMP"
if [ $COMMENT_TIMESTAMP -le $PR_MERGE_COMMIT_TIMESTAMP ]; then
echo "Last commit on the pull request is newer than the issue comment triggering this run! Abort!";
exit -1;
fi
create_run:
name: Create run
needs: [get-pr-number, get-pr-info]
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != '' }}
permissions:
statuses: write
runs-on: ubuntu-22.04
steps:
- name: Create Run
id: create_run
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# Create a commit status (pending) for a run of this workflow. The status has to be updated later in `update_run_status`.
# See https://docs.github.com/en/rest/commits/statuses?apiVersion=2022-11-28#create-a-commit-status
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/statuses/${{ needs.get-pr-info.outputs.PR_HEAD_SHA }} \
-f "target_url=$GITHUB_RUN_URL" -f "state=pending" -f "description=Custom doc building job" -f "context=custom-doc-build"
reply_to_comment:
name: Reply to the comment
if: ${{ needs.create_run.result == 'success' }}
needs: [get-pr-number, create_run]
permissions:
pull-requests: write
runs-on: ubuntu-22.04
steps:
- name: Reply to the comment
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/issues/${{ needs.get-pr-number.outputs.PR_NUMBER }}/comments \
-f "body=[Building docs for all languages...](${{ env.GITHUB_RUN_URL }})"
build-doc:
name: Build doc
needs: [get-pr-number, get-pr-info]
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != '' }}
uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
with:
commit_sha: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
package: transformers
languages: ar de en es fr hi it ja ko pt zh
update_run_status:
name: Update Check Run Status
needs: [ get-pr-info, create_run, build-doc ]
permissions:
statuses: write
if: ${{ always() && needs.create_run.result == 'success' }}
runs-on: ubuntu-22.04
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
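# Treat `skipped` the same as `success` when deciding the final commit status.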
STATUS_OK: ${{ contains(fromJSON('["skipped", "success"]'), needs.create_run.result) }}
steps:
- name: Get `build-doc` job status
run: |
echo "${{ needs.build-doc.result }}"
echo $STATUS_OK
if [ "$STATUS_OK" = "true" ]; then
echo "STATUS=success" >> $GITHUB_ENV
else
echo "STATUS=failure" >> $GITHUB_ENV
fi
- name: Update PR commit statuses
run: |
echo "${{ needs.build-doc.result }}"
echo "${{ env.STATUS }}"
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/statuses/${{ needs.get-pr-info.outputs.PR_HEAD_SHA }} \
-f "target_url=$GITHUB_RUN_URL" -f "state=${{ env.STATUS }}" -f "description=Custom doc building job" -f "context=custom-doc-build"

.github/workflows/pr_run_slow_ci.yml vendored Normal file

@ -0,0 +1,177 @@
name: PR slow CI
on:
pull_request_target:
types: [opened, synchronize, reopened]
jobs:
get-pr-number:
name: Get PR number
uses: ./.github/workflows/get-pr-number.yml
get-pr-info:
name: Get PR commit SHA
needs: get-pr-number
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
uses: ./.github/workflows/get-pr-info.yml
with:
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
get-jobs:
name: Get test files to run
runs-on: ubuntu-22.04
needs: [get-pr-number, get-pr-info]
outputs:
jobs: ${{ steps.get_jobs.outputs.jobs_to_run }}
steps:
- name: Get repository content
id: repo_content
uses: actions/github-script@v6
with:
script: |
const { data: tests_dir } = await github.rest.repos.getContent({
owner: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}',
repo: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}',
path: 'tests',
ref: '${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}',
});
const { data: tests_models_dir } = await github.rest.repos.getContent({
owner: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}',
repo: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}',
path: 'tests/models',
ref: '${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}',
});
const { data: tests_quantization_dir } = await github.rest.repos.getContent({
owner: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}',
repo: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}',
path: 'tests/quantization',
ref: '${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}',
});
core.setOutput('tests_dir', tests_dir);
core.setOutput('tests_models_dir', tests_models_dir);
core.setOutput('tests_quantization_dir', tests_quantization_dir);
# This checks out the main branch
- uses: actions/checkout@v4
with:
fetch-depth: "0"
- name: Write pr_files file
run: |
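# The quoted heredoc delimiter ('EOF') keeps the JSON payload verbatim (no shell expansion of `$`, backticks, etc.).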
cat > pr_files.txt << 'EOF'
${{ needs.get-pr-info.outputs.PR_FILES }}
EOF
- name: Write tests_dir file
run: |
cat > tests_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_dir }}
EOF
- name: Write tests_models_dir file
run: |
cat > tests_models_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_models_dir }}
EOF
- name: Write tests_quantization_dir file
run: |
cat > tests_quantization_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_quantization_dir }}
EOF
- name: Run script to get jobs to run
id: get_jobs
run: |
python utils/get_pr_run_slow_jobs.py | tee output.txt
echo "jobs_to_run: $(tail -n 1 output.txt)"
echo "jobs_to_run=$(tail -n 1 output.txt)" >> $GITHUB_OUTPUT
send_comment:
# Will delete the previous comment and send a new one if:
# - either the content is changed
# - or the previous comment is 30 minutes or more old
name: Send a comment to suggest jobs to run
if: ${{ needs.get-jobs.outputs.jobs != '' }}
needs: [get-pr-number, get-jobs]
permissions:
pull-requests: write
runs-on: ubuntu-22.04
steps:
- name: Check and update comment if needed
uses: actions/github-script@v7
env:
BODY: "\n\nrun-slow: ${{ needs.get-jobs.outputs.jobs }}"
with:
script: |
const prNumber = ${{ needs.get-pr-number.outputs.PR_NUMBER }};
const commentPrefix = "**[For maintainers]** Suggested jobs to run (before merge)";
const thirtyMinutesAgo = new Date(Date.now() - 30 * 60 * 1000); // 30 minutes ago
const newBody = `${commentPrefix}${process.env.BODY}`;
// Get all comments on the PR
const { data: comments } = await github.rest.issues.listComments({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber
});
// Find existing comments that start with our prefix
const existingComments = comments.filter(comment =>
comment.user.login === 'github-actions[bot]' &&
comment.body.startsWith(commentPrefix)
);
let shouldCreateNewComment = true;
let commentsToDelete = [];
if (existingComments.length > 0) {
// Get the most recent comment
const mostRecentComment = existingComments
.sort((a, b) => new Date(b.created_at) - new Date(a.created_at))[0];
const commentDate = new Date(mostRecentComment.created_at);
const isOld = commentDate < thirtyMinutesAgo;
const isDifferentContent = mostRecentComment.body !== newBody;
console.log(`Most recent comment created: ${mostRecentComment.created_at}`);
console.log(`Is older than 30 minutes: ${isOld}`);
console.log(`Has different content: ${isDifferentContent}`);
if (isOld || isDifferentContent) {
// Delete all existing comments and create new one
commentsToDelete = existingComments;
console.log(`Will delete ${commentsToDelete.length} existing comment(s) and create new one`);
} else {
// Content is same and comment is recent, skip
shouldCreateNewComment = false;
console.log('Comment is recent and content unchanged, skipping update');
}
} else {
console.log('No existing comments found, will create new one');
}
// Delete old comments if needed
for (const comment of commentsToDelete) {
console.log(`Deleting comment #${comment.id} (created: ${comment.created_at})`);
await github.rest.issues.deleteComment({
owner: context.repo.owner,
repo: context.repo.repo,
comment_id: comment.id
});
}
// Create new comment if needed
if (shouldCreateNewComment) {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: newBody
});
console.log('✅ New comment created');
} else {
console.log('No comment update needed');
}


@ -4,18 +4,6 @@ on:
push:
branches: [ main ]
env:
OUTPUT_SLACK_CHANNEL_ID: "C06L2SGMEEA"
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
# For gated repositories, we still need to agree to share information on the Hub repo page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
jobs:
get_modified_models:
name: "Get all modified files"
@ -25,118 +13,145 @@ jobs:
steps:
- name: Check out code
uses: actions/checkout@v4
- name: Get changed files
id: changed-files
uses: tj-actions/changed-files@3f54ebb830831fc121d3263c1857cfbdc310cdb9 #v42
- name: Get changed files using `actions/github-script`
id: get-changed-files
uses: actions/github-script@v7
with:
files: src/transformers/models/**
- name: Run step if only the files listed above change
if: steps.changed-files.outputs.any_changed == 'true'
id: set-matrix
script: |
let files = [];
// Only handle push events
if (context.eventName === 'push') {
const afterSha = context.payload.after;
const branchName = context.payload.ref.replace('refs/heads/', '');
let baseSha;
if (branchName === 'main') {
console.log('Push to main branch, comparing to parent commit');
// Get the parent commit of the pushed commit
const { data: commit } = await github.rest.repos.getCommit({
owner: context.repo.owner,
repo: context.repo.repo,
ref: afterSha
});
baseSha = commit.parents[0]?.sha;
if (!baseSha) {
throw new Error('No parent commit found for the pushed commit');
}
} else {
console.log(`Push to branch ${branchName}, comparing to main`);
baseSha = 'main';
}
const { data: comparison } = await github.rest.repos.compareCommits({
owner: context.repo.owner,
repo: context.repo.repo,
base: baseSha,
head: afterSha
});
// Include added, modified, and renamed files
files = comparison.files
.filter(file => file.status === 'added' || file.status === 'modified' || file.status === 'renamed')
.map(file => file.filename);
}
// Include all files under src/transformers/ (not just models subdirectory)
const filteredFiles = files.filter(file =>
file.startsWith('src/transformers/')
);
core.setOutput('changed_files', filteredFiles.join(' '));
core.setOutput('any_changed', filteredFiles.length > 0 ? 'true' : 'false');
- name: Parse changed files with Python
if: steps.get-changed-files.outputs.any_changed == 'true'
env:
ALL_CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
CHANGED_FILES: ${{ steps.get-changed-files.outputs.changed_files }}
id: set-matrix
run: |
model_arrays=()
for file in $ALL_CHANGED_FILES; do
model_path="${file#*models/}"
model_path="models/${model_path%%/*}"
if grep -qFx "$model_path" utils/important_models.txt; then
# Append the file to the matrix string
model_arrays+=("$model_path")
fi
done
matrix_string=$(printf '"%s", ' "${model_arrays[@]}" | sed 's/, $//')
echo "matrix=[$matrix_string]" >> $GITHUB_OUTPUT
test_modified_files:
python3 - << 'EOF'
import os
import sys
import json
# Add the utils directory to Python path
sys.path.insert(0, 'utils')
# Import the important models list
from important_files import IMPORTANT_MODELS
print(f"Important models: {IMPORTANT_MODELS}")
# Get the changed files from the previous step
changed_files_str = os.environ.get('CHANGED_FILES', '')
changed_files = changed_files_str.split() if changed_files_str else []
# Filter to only Python files
python_files = [f for f in changed_files if f.endswith('.py')]
print(f"Python files changed: {python_files}")
result_models = set()
# Specific files that trigger all models
transformers_utils_files = [
'modeling_utils.py',
'modeling_rope_utils.py',
'modeling_flash_attention_utils.py',
'modeling_attn_mask_utils.py',
'cache_utils.py',
'masking_utils.py',
'pytorch_utils.py'
]
# Single loop through all Python files
for file in python_files:
# Check for files under src/transformers/models/
if file.startswith('src/transformers/models/'):
remaining_path = file[len('src/transformers/models/'):]
if '/' in remaining_path:
model_dir = remaining_path.split('/')[0]
if model_dir in IMPORTANT_MODELS:
result_models.add(model_dir)
print(f"Added model directory: {model_dir}")
# Check for specific files under src/transformers/ or src/transformers/generation/ files
elif file.startswith('src/transformers/generation/') or \
(file.startswith('src/transformers/') and os.path.basename(file) in transformers_utils_files):
print(f"Found core file: {file} - including all important models")
result_models.update(IMPORTANT_MODELS)
break # No need to continue once we include all models
# Convert to sorted list and create matrix
result_list = sorted(list(result_models))
print(f"Final model list: {result_list}")
if result_list:
matrix_json = json.dumps(result_list)
print(f"matrix={matrix_json}")
# Write to GITHUB_OUTPUT
with open(os.environ['GITHUB_OUTPUT'], 'a') as f:
f.write(f"matrix={matrix_json}\n")
else:
print("matrix=[]")
with open(os.environ['GITHUB_OUTPUT'], 'a') as f:
f.write("matrix=[]\n")
EOF
model-ci:
name: Model CI
uses: ./.github/workflows/self-scheduled.yml
needs: get_modified_models
name: Slow & FA2 tests
runs-on: [single-gpu, nvidia-gpu, a10, ci]
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
if: ${{ needs.get_modified_models.outputs.matrix != '[]' && needs.get_modified_models.outputs.matrix != '' && fromJson(needs.get_modified_models.outputs.matrix)[0] != null }}
strategy:
fail-fast: false
matrix:
model-name: ${{ fromJson(needs.get_modified_models.outputs.matrix) }}
steps:
- name: Check out code
uses: actions/checkout@v4
- name: Install locally transformers & other libs
run: |
apt install sudo
sudo -H pip install --upgrade pip
sudo -H pip uninstall -y transformers
sudo -H pip install -U -e ".[testing]"
MAX_JOBS=4 pip install flash-attn --no-build-isolation
pip install bitsandbytes
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Show installed libraries and their versions
run: pip freeze
- name: Run FA2 tests
id: run_fa2_tests
run:
pytest -rsfE -m "flash_attn_test" --make-reports=${{ matrix.model-name }}_fa2_tests/ tests/${{ matrix.model-name }}/test_modeling_*
- name: "Test suite reports artifacts: ${{ matrix.model-name }}_fa2_tests"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.model-name }}_fa2_tests
path: /transformers/reports/${{ matrix.model-name }}_fa2_tests
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ env.OUTPUT_SLACK_CHANNEL_ID }}
title: 🤗 Results of the FA2 tests - ${{ matrix.model-name }}
status: ${{ steps.run_fa2_tests.conclusion }}
slack_token: ${{ secrets.CI_SLACK_BOT_TOKEN }}
- name: Run integration tests
id: run_integration_tests
if: always()
run:
pytest -rsfE -k "IntegrationTest" --make-reports=tests_integration_${{ matrix.model-name }} tests/${{ matrix.model-name }}/test_modeling_*
- name: "Test suite reports artifacts: tests_integration_${{ matrix.model-name }}"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: tests_integration_${{ matrix.model-name }}
path: /transformers/reports/tests_integration_${{ matrix.model-name }}
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ env.OUTPUT_SLACK_CHANNEL_ID }}
title: 🤗 Results of the Integration tests - ${{ matrix.model-name }}
status: ${{ steps.run_integration_tests.conclusion}}
slack_token: ${{ secrets.CI_SLACK_BOT_TOKEN }}
- name: Tailscale # In order to be able to SSH when a test fails
if: ${{ runner.debug == '1'}}
uses: huggingface/tailscale-action@v1
with:
authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
waitForSSH: true
benchmark:
name: Benchmark workflow
needs: get_modified_models
if: ${{ needs.get_modified_models.outputs.matrix != '[]' && needs.get_modified_models.outputs.matrix != '' && fromJson(needs.get_modified_models.outputs.matrix)[0] != null }}
uses: ./.github/workflows/benchmark.yml
if: needs.get_modified_models.outputs.matrix != '' && needs.get_modified_models.outputs.matrix != '[]'
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-push"
docker: huggingface/transformers-all-latest-gpu
ci_event: push
report_repo_id: hf-internal-testing/transformers_ci_push
commit_sha: ${{ github.sha }}
models: ${{ needs.get_modified_models.outputs.matrix }}
secrets: inherit

.github/workflows/self-comment-ci.yml vendored Normal file

@ -0,0 +1,415 @@
name: PR comment GitHub CI
on:
issue_comment:
types:
- created
branches-ignore:
- main
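# Only one slow-CI run per issue: a newer `run-slow` comment cancels the in-flight run.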
concurrency:
group: ${{ github.workflow }}-${{ github.event.issue.number }}-${{ startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow') }}
cancel-in-progress: true
permissions: read-all
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
# For gated repositories, we still need to agree to share information on the Hub repo page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
jobs:
get-pr-number:
runs-on: ubuntu-22.04
name: Get PR number
# For security: only allow team members to run
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "eustlb", "MekkCyber", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "remi-or", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
outputs:
PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}
steps:
- name: Get PR number
shell: bash
run: |
if [[ "${{ github.event.issue.number }}" != "" && "${{ github.event.issue.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.issue.number }}" >> $GITHUB_ENV
else
echo "PR_NUMBER=" >> $GITHUB_ENV
fi
- name: Check PR number
shell: bash
run: |
echo "${{ env.PR_NUMBER }}"
- name: Set PR number
id: set_pr_number
run: echo "PR_NUMBER=${{ env.PR_NUMBER }}" >> "$GITHUB_OUTPUT"
get-sha:
runs-on: ubuntu-22.04
needs: get-pr-number
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
outputs:
PR_HEAD_SHA: ${{ steps.get_sha.outputs.PR_HEAD_SHA }}
PR_MERGE_SHA: ${{ steps.get_sha.outputs.PR_MERGE_SHA }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: "0"
ref: "refs/pull/${{needs.get-pr-number.outputs.PR_NUMBER}}/merge"
- name: Get SHA (and verify timestamps against the issue comment date)
id: get_sha
env:
PR_NUMBER: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
COMMENT_DATE: ${{ github.event.comment.created_at }}
run: |
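# Resolve both the PR head commit and the merge commit; the merge SHA is re-checked later in each job as a security guard.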
git fetch origin refs/pull/$PR_NUMBER/head:refs/remotes/pull/$PR_NUMBER/head
git checkout refs/remotes/pull/$PR_NUMBER/head
echo "PR_HEAD_SHA: $(git log -1 --format=%H)"
echo "PR_HEAD_SHA=$(git log -1 --format=%H)" >> "$GITHUB_OUTPUT"
git fetch origin refs/pull/$PR_NUMBER/merge:refs/remotes/pull/$PR_NUMBER/merge
git checkout refs/remotes/pull/$PR_NUMBER/merge
echo "PR_MERGE_SHA: $(git log -1 --format=%H)"
echo "PR_MERGE_SHA=$(git log -1 --format=%H)" >> "$GITHUB_OUTPUT"
PR_MERGE_COMMIT_TIMESTAMP=$(git log -1 --date=unix --format=%cd)
echo "PR_MERGE_COMMIT_TIMESTAMP: $PR_MERGE_COMMIT_TIMESTAMP"
COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")
echo "COMMENT_DATE: $COMMENT_DATE"
echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"
if [ $COMMENT_TIMESTAMP -le $PR_MERGE_COMMIT_TIMESTAMP ]; then
echo "Last commit on the pull request is newer than the issue comment triggering this run! Abort!";
exit -1;
fi
# use a python script to handle this complex logic
# case 1: `run-slow` (automatically infer a limited set of models to run, in particular any newly added model)
# case 2: `run-slow model_1, model_2`
get-tests:
runs-on: ubuntu-22.04
needs: [get-pr-number, get-sha]
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
outputs:
models: ${{ steps.models_to_run.outputs.models }}
quantizations: ${{ steps.models_to_run.outputs.quantizations }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: "0"
ref: "refs/pull/${{needs.get-pr-number.outputs.PR_NUMBER}}/merge"
- name: Verify merge commit SHA
env:
VERIFIED_PR_MERGE_SHA: ${{ needs.get-sha.outputs.PR_MERGE_SHA }}
run: |
PR_MERGE_SHA=$(git log -1 --format=%H)
if [ $PR_MERGE_SHA != $VERIFIED_PR_MERGE_SHA ]; then
echo "The merged commit SHA is not the same as the verified one! Security issue detected, abort the workflow!";
exit -1;
fi
- name: Get models to test
env:
PR_COMMENT: ${{ github.event.comment.body }}
run: |
python -m pip install GitPython
python utils/pr_slow_ci_models.py --message "$PR_COMMENT" | tee output.txt
echo "models=$(tail -n 1 output.txt)" >> $GITHUB_ENV
python utils/pr_slow_ci_models.py --message "$PR_COMMENT" --quantization | tee output2.txt
echo "quantizations=$(tail -n 1 output2.txt)" >> $GITHUB_ENV
- name: Show models to test
id: models_to_run
run: |
echo "${{ env.models }}"
echo "models=${{ env.models }}" >> $GITHUB_ENV
echo "models=${{ env.models }}" >> $GITHUB_OUTPUT
echo "${{ env.quantizations }}"
echo "quantizations=${{ env.quantizations }}" >> $GITHUB_OUTPUT
reply_to_comment:
name: Reply to the comment
if: ${{ needs.get-tests.outputs.models != '[]' || needs.get-tests.outputs.quantizations != '[]' }}
needs: [get-pr-number, get-tests]
permissions:
pull-requests: write
runs-on: ubuntu-22.04
steps:
- name: Reply to the comment
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
MODELS: ${{ needs.get-tests.outputs.models }}
BODY: "\n\nmodels: ${{ needs.get-tests.outputs.models }}\nquantizations: ${{ needs.get-tests.outputs.quantizations }}"
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/issues/${{ needs.get-pr-number.outputs.PR_NUMBER }}/comments \
-f "body=This comment contains run-slow, running the specified jobs: ${{ env.BODY }} ..."
create_run:
name: Create run
if: ${{ needs.get-tests.outputs.models != '[]' || needs.get-tests.outputs.quantizations != '[]' }}
needs: [get-sha, get-tests, reply_to_comment]
permissions:
statuses: write
runs-on: ubuntu-22.04
steps:
- name: Create Run
id: create_run
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# Create a commit status (pending) for a run of this workflow. The status has to be updated later in `update_run_status`.
# See https://docs.github.com/en/rest/commits/statuses?apiVersion=2022-11-28#create-a-commit-status
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/statuses/${{ needs.get-sha.outputs.PR_HEAD_SHA }} \
-f "target_url=$GITHUB_RUN_URL" -f "state=pending" -f "description=Slow CI job" -f "context=pytest/custom-tests"
run_models_gpu:
name: Run all tests for the model
if: ${{ needs.get-tests.outputs.models != '[]' }}
needs: [get-pr-number, get-sha, get-tests, create_run]
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.get-tests.outputs.models) }}
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Echo input and matrix info
shell: bash
run: |
echo "${{ matrix.folders }}"
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Checkout to PR merge commit
working-directory: /transformers
run: |
git fetch origin refs/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge:refs/remotes/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge
git checkout refs/remotes/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge
git log -1 --format=%H
- name: Verify merge commit SHA
env:
VERIFIED_PR_MERGE_SHA: ${{ needs.get-sha.outputs.PR_MERGE_SHA }}
working-directory: /transformers
run: |
PR_MERGE_SHA=$(git log -1 --format=%H)
if [ $PR_MERGE_SHA != $VERIFIED_PR_MERGE_SHA ]; then
echo "The merged commit SHA is not the same as the verified one! Security issue detected, abort the workflow!";
exit -1;
fi
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
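# Map the AWS runner labels to the single-gpu / multi-gpu names used in report and artifact paths.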
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all tests on GPU
working-directory: /transformers
run: |
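# Some test folders need a restricted GPU set; the helper script picks the visible devices for each folder.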
export CUDA_VISIBLE_DEVICES="$(python3 utils/set_cuda_devices_for_ci.py --test_folder ${{ matrix.folders }})"
echo $CUDA_VISIBLE_DEVICES
python3 -m pytest -v -rsfE --make-reports=${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Make sure report directory exists
shell: bash
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
run_quantization_torch_gpu:
name: Run all tests for a quantization
if: ${{ needs.get-tests.outputs.quantizations != '[]' }}
needs: [get-pr-number, get-sha, get-tests, create_run]
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.get-tests.outputs.quantizations) }}
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-quantization-latest-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Echo folder ${{ matrix.folders }}
shell: bash
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'quantization/'/'quantization_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Checkout to PR merge commit
working-directory: /transformers
run: |
git fetch origin refs/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge:refs/remotes/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge
git checkout refs/remotes/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge
git log -1 --format=%H
- name: Verify merge commit SHA
env:
VERIFIED_PR_MERGE_SHA: ${{ needs.get-sha.outputs.PR_MERGE_SHA }}
working-directory: /transformers
run: |
PR_MERGE_SHA=$(git log -1 --format=%H)
if [ $PR_MERGE_SHA != $VERIFIED_PR_MERGE_SHA ]; then
echo "The merged commit SHA is not the same as the verified one! Security issue detected, abort the workflow!";
exit -1;
fi
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run quantization tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Make sure report directory exists
shell: bash
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports
update_run_status:
name: Update Check Run Status
needs: [get-sha, create_run, run_models_gpu, run_quantization_torch_gpu]
permissions:
statuses: write
if: ${{ always() && needs.create_run.result == 'success' }}
runs-on: ubuntu-22.04
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
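# A job skipped because its matrix was empty must not fail the commit status, so `skipped` counts as success.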
STATUS_OK: ${{ contains(fromJSON('["skipped", "success"]'), needs.run_models_gpu.result) && contains(fromJSON('["skipped", "success"]'), needs.run_quantization_torch_gpu.result) }}
steps:
- name: Get `run_models_gpu` job status
run: |
echo "${{ needs.run_models_gpu.result }}"
echo "${{ needs.run_quantization_torch_gpu.result }}"
echo $STATUS_OK
if [ "$STATUS_OK" = "true" ]; then
echo "STATUS=success" >> $GITHUB_ENV
else
echo "STATUS=failure" >> $GITHUB_ENV
fi
- name: Update PR commit statuses
run: |
echo "${{ needs.run_models_gpu.result }}"
echo "${{ env.STATUS }}"
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/statuses/${{ needs.get-sha.outputs.PR_HEAD_SHA }} \
-f "target_url=$GITHUB_RUN_URL" -f "state=${{ env.STATUS }}" -f "description=Slow CI job" -f "context=pytest/custom-tests"


@ -1,43 +1,56 @@
name: Self-hosted runner (nightly-ci)
name: Nvidia CI with nightly torch
on:
repository_dispatch:
schedule:
- cron: "17 2 * * *"
# triggered when the daily scheduled Nvidia CI is completed.
# This way, we can compare the results more easily.
workflow_run:
workflows: ["Nvidia CI"]
branches: ["main"]
types: [completed]
push:
branches:
- run_nightly_ci*
- run_ci_with_nightly_torch*
# Used for `push` events to easily override the target workflow runs to compare against
env:
prev_workflow_run_id: ""
other_workflow_run_id: ""
jobs:
build_nightly_ci_images:
name: Build Nightly CI Docker Images
if: (github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_nightly_ci'))
build_nightly_torch_ci_images:
name: Build CI Docker Images with nightly torch
uses: ./.github/workflows/build-nightly-ci-docker-images.yml
with:
job: latest-with-torch-nightly-docker
secrets: inherit
setup:
name: Setup
runs-on: ubuntu-22.04
steps:
- name: Setup
run: |
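# Persist the workflow-run ids as an artifact so later jobs know which previous runs to compare against.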
mkdir "setup_values"
echo "${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}" > "setup_values/prev_workflow_run_id.txt"
echo "${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}" > "setup_values/other_workflow_run_id.txt"
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: setup_values
path: setup_values
model-ci:
name: Model CI
needs: [build_nightly_ci_images]
needs: build_nightly_torch_ci_images
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-past-future"
runner: ci
docker: huggingface/transformers-all-latest-torch-nightly-gpu
ci_event: Nightly CI
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
needs: [build_nightly_ci_images]
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#transformers-ci-past-future"
runner: ci
# test deepspeed nightly build with the latest release torch
docker: huggingface/transformers-pytorch-deepspeed-latest-gpu
ci_event: Nightly CI
working-directory-prefix: /workspace
report_repo_id: hf-internal-testing/transformers_daily_ci_with_torch_nightly
commit_sha: ${{ github.event.workflow_run.head_sha || github.sha }}
secrets: inherit


@ -21,39 +21,6 @@ jobs:
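# `run_number % 10` rotates through 10 buckets so each past framework/version job below runs once every 10 triggers.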
echo "$(python3 -c 'print(int(${{ github.run_number }}) % 10)')"
echo "run_number=$(python3 -c 'print(int(${{ github.run_number }}) % 10)')" >> $GITHUB_OUTPUT
run_past_ci_pytorch_1-13:
name: PyTorch 1.13
needs: get_number
if: needs.get_number.outputs.run_number == 0 && (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')))
uses: ./.github/workflows/self-past-caller.yml
with:
framework: pytorch
version: "1.13"
sha: ${{ github.sha }}
secrets: inherit
run_past_ci_pytorch_1-12:
name: PyTorch 1.12
needs: get_number
if: needs.get_number.outputs.run_number == 1 && (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')))
uses: ./.github/workflows/self-past-caller.yml
with:
framework: pytorch
version: "1.12"
sha: ${{ github.sha }}
secrets: inherit
run_past_ci_pytorch_1-11:
name: PyTorch 1.11
needs: get_number
if: needs.get_number.outputs.run_number == 2 && (cancelled() != true) && ((github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_past_ci')))
uses: ./.github/workflows/self-past-caller.yml
with:
framework: pytorch
version: "1.11"
sha: ${{ github.sha }}
secrets: inherit
run_past_ci_tensorflow_2-11:
name: TensorFlow 2.11
needs: get_number


@ -1,135 +0,0 @@
name: PR slow CI
on:
pull_request:
paths:
- "src/transformers/models/*/modeling_*.py"
- "tests/**/test_*.py"
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
# For gated repositories, we still need to agree to share information on the Hub repo page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
CUDA_VISIBLE_DEVICES: 0,1
jobs:
find_models_to_run:
runs-on: ubuntu-22.04
name: Find models to run slow tests
# Triggered only if the required label `run-slow` is added
if: ${{ contains(github.event.pull_request.labels.*.name, 'run-slow') }}
outputs:
models: ${{ steps.models_to_run.outputs.models }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: "0"
ref: ${{ github.event.pull_request.head.sha }}
- name: Get commit message
run: |
echo "commit_message=$(git show -s --format=%s)" >> $GITHUB_ENV
- name: Get models to run slow tests
run: |
echo "${{ env.commit_message }}"
python -m pip install GitPython
python utils/pr_slow_ci_models.py --commit_message "${{ env.commit_message }}" | tee output.txt
echo "models=$(tail -n 1 output.txt)" >> $GITHUB_ENV
- name: Models to run slow tests
id: models_to_run
run: |
echo "${{ env.models }}"
echo "models=${{ env.models }}" >> $GITHUB_OUTPUT
run_models_gpu:
name: Run all tests for the model
# Runs only if `find_models_to_run` produced a non-empty model list (the `run-slow` label was added
# on either a new model PR or via a commit message)
if: ${{ needs.find_models_to_run.outputs.models != '[]' }}
needs: find_models_to_run
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.find_models_to_run.outputs.models) }}
machine_type: [single-gpu, multi-gpu]
runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, ci]
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Echo input and matrix info
shell: bash
run: |
echo "${{ matrix.folders }}"
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Update clone
working-directory: /transformers
run: git fetch && git fetch origin pull/${{ github.event.pull_request.number }}/head:pull/${{ github.event.pull_request.number }}/merge && git checkout pull/${{ github.event.pull_request.number }}/merge
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all tests on GPU
working-directory: /transformers
run: |
export CUDA_VISIBLE_DEVICES="$(python3 utils/set_cuda_devices_for_ci.py --test_folder ${{ matrix.folders }})"
echo $CUDA_VISIBLE_DEVICES
python3 -m pytest -v -rsfE --make-reports=${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Make sure report directory exists
shell: bash
run: |
mkdir -p /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports


@ -1,25 +1,25 @@
name: Self-hosted runner (AMD mi210 CI caller)
on:
workflow_run:
workflows: ["Self-hosted runner (push-caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
run_amd_ci:
name: AMD mi210
if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_amd_push_ci_caller')))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi210
secrets: inherit
name: Self-hosted runner (AMD mi210 CI caller)
on:
#workflow_run:
# workflows: ["Self-hosted runner (push-caller)"]
# branches: ["main"]
# types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
run_amd_ci:
name: AMD mi210
if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_amd_push_ci_caller')))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi210
secrets: inherit


@ -1,25 +1,25 @@
name: Self-hosted runner (AMD mi250 CI caller)
on:
workflow_run:
workflows: ["Self-hosted runner (push-caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
run_amd_ci:
name: AMD mi250
if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_amd_push_ci_caller')))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi250
secrets: inherit
name: Self-hosted runner (AMD mi250 CI caller)
on:
#workflow_run:
# workflows: ["Self-hosted runner (push-caller)"]
# branches: ["main"]
# types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
run_amd_ci:
name: AMD mi250
if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_amd_push_ci_caller')))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi250
secrets: inherit


@ -1,25 +0,0 @@
name: Self-hosted runner (AMD mi300 CI caller)
on:
workflow_run:
workflows: ["Self-hosted runner (push-caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
run_amd_ci:
name: AMD mi300
if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && (startsWith(github.ref_name, 'run_amd_push_ci_caller') || startsWith(github.ref_name, 'mi300-ci'))))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi300
secrets: inherit


@ -14,7 +14,6 @@ env:
MKL_NUM_THREADS: 8
PYTEST_TIMEOUT: 60
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
jobs:


@ -25,7 +25,7 @@ jobs:
- name: Get changed files
id: changed-files
uses: tj-actions/changed-files@v41
uses: tj-actions/changed-files@1c8e6069583811afb28f97afeaf8e7da80c6be5c
- name: Was setup changed
id: was_changed
@ -51,4 +51,4 @@ jobs:
needs: build-docker-containers
steps:
- name: Trigger push CI via workflow_run
run: echo "Trigger push CI via workflow_run"


@ -24,7 +24,6 @@ env:
MKL_NUM_THREADS: 8
PYTEST_TIMEOUT: 60
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
CUDA_VISIBLE_DEVICES: 0,1
jobs:
@ -32,11 +31,12 @@ jobs:
name: Setup
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, push-ci]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
test_map: ${{ steps.set-matrix.outputs.test_map }}
@ -131,11 +131,12 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [single-gpu]
runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, push-ci]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
@ -162,6 +163,23 @@ jobs:
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Update clone using environment variables
working-directory: /transformers
run: |
@ -203,19 +221,19 @@ jobs:
- name: Run all non-slow selected tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} ${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ env.machine_type }}_tests_gpu_${{ matrix.folders }} ${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
run: cat /transformers/reports/${{ env.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}
name: ${{ env.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_tests_gpu_${{ matrix.folders }}
run_tests_multi_gpu:
name: Model tests
@ -226,8 +244,9 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [multi-gpu]
runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, push-ci]
machine_type: [aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
@ -257,6 +276,23 @@ jobs:
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Update clone using environment variables
working-directory: /transformers
run: |
@ -300,19 +336,19 @@ jobs:
MKL_SERVICE_FORCE_INTEL: 1
working-directory: /transformers
run: |
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} ${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ env.machine_type }}_tests_gpu_${{ matrix.folders }} ${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
run: cat /transformers/reports/${{ env.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}
name: ${{ env.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_tests_gpu_${{ matrix.folders }}
run_tests_torch_cuda_extensions_single_gpu:
name: Torch CUDA extension tests
@@ -321,100 +357,9 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [single-gpu]
runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, push-ci]
container:
image: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Update clone using environment variables
working-directory: /workspace/transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /workspace/transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Remove cached torch extensions
run: rm -rf /github/home/.cache/torch_extensions/
# To avoid unknown test failures
- name: Pre build DeepSpeed *again*
working-directory: /workspace
run: |
python3 -m pip uninstall -y deepspeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
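# (The DS_BUILD_CPU_ADAM / DS_BUILD_FUSED_ADAM flags make DeepSpeed precompile these
# ops at install time instead of JIT-compiling them during the first test run.)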
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /workspace/transformers
run: |
python utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /workspace/transformers
run: pip freeze
- name: Run all non-slow selected tests on GPU
working-directory: /workspace/transformers
# TODO: Here we pass all tests in the 2 folders for simplicity. It's better to pass only the identified tests.
run: |
python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /workspace/transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
run_tests_torch_cuda_extensions_multi_gpu:
name: Torch CUDA extension tests
needs: setup
if: contains(fromJson(needs.setup.outputs.matrix), 'deepspeed') || contains(fromJson(needs.setup.outputs.matrix), 'extended')
strategy:
fail-fast: false
matrix:
machine_type: [multi-gpu]
runs-on: ['${{ matrix.machine_type }}', nvidia-gpu, t4, push-ci]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
@@ -444,6 +389,23 @@ jobs:
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Set `machine_type` for report and artifact names
working-directory: /workspace/transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Update clone using environment variables
working-directory: /workspace/transformers
run: |
@@ -484,19 +446,129 @@ jobs:
working-directory: /workspace/transformers
# TODO: Here we pass all tests in the 2 folders for simplicity. It's better to pass only the identified tests.
run: |
python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended
python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /workspace/transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
run: cat /workspace/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
name: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
run_tests_torch_cuda_extensions_multi_gpu:
name: Torch CUDA extension tests
needs: setup
if: contains(fromJson(needs.setup.outputs.matrix), 'deepspeed') || contains(fromJson(needs.setup.outputs.matrix), 'extended')
strategy:
fail-fast: false
matrix:
machine_type: [aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Set `machine_type` for report and artifact names
working-directory: /workspace/transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Update clone using environment variables
working-directory: /workspace/transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /workspace/transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Remove cached torch extensions
run: rm -rf /github/home/.cache/torch_extensions/
# To avoid unknown test failures
- name: Pre build DeepSpeed *again*
working-directory: /workspace
run: |
python3 -m pip uninstall -y deepspeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /workspace/transformers
run: |
python utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /workspace/transformers
run: pip freeze
- name: Run all non-slow selected tests on GPU
working-directory: /workspace/transformers
# TODO: Here we pass all tests in the 2 folders for simplicity. It's better to pass only the identified tests.
run: |
python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /workspace/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
send_results:
name: Send results to webhook
@@ -575,6 +647,6 @@ jobs:
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
run: |
pip install huggingface_hub
pip install slack_sdk
pip show slack_sdk
python utils/notification_service.py "${{ needs.setup.outputs.matrix }}"

View File

@@ -1,55 +0,0 @@
name: Self-hosted runner (AMD mi210 scheduled CI caller)
on:
workflow_run:
workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_scheduled_ci_caller*
jobs:
model-ci:
name: Model CI
uses: ./.github/workflows/self-scheduled-amd.yml
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi210
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi210
secrets: inherit
torch-pipeline:
name: Torch pipeline CI
uses: ./.github/workflows/self-scheduled-amd.yml
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi210
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi210
secrets: inherit
example-ci:
name: Example CI
uses: ./.github/workflows/self-scheduled-amd.yml
with:
job: run_examples_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi210
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi210
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: ./.github/workflows/self-scheduled-amd.yml
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi210
docker: huggingface/transformers-pytorch-deepspeed-amd-gpu
ci_event: Scheduled CI (AMD) - mi210
secrets: inherit

View File

@@ -1,55 +1,59 @@
name: Self-hosted runner (AMD mi250 scheduled CI caller)
on:
workflow_run:
workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_scheduled_ci_caller*
jobs:
model-ci:
name: Model CI
uses: ./.github/workflows/self-scheduled-amd.yml
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi250
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi250
secrets: inherit
torch-pipeline:
name: Torch pipeline CI
uses: ./.github/workflows/self-scheduled-amd.yml
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi250
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi250
secrets: inherit
example-ci:
name: Example CI
uses: ./.github/workflows/self-scheduled-amd.yml
with:
job: run_examples_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi250
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi250
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: ./.github/workflows/self-scheduled-amd.yml
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi250
docker: huggingface/transformers-pytorch-deepspeed-amd-gpu
ci_event: Scheduled CI (AMD) - mi250
secrets: inherit
name: Self-hosted runner (AMD mi250 scheduled CI caller)
on:
workflow_run:
workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_scheduled_ci_caller*
jobs:
model-ci:
name: Model CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@main
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi250
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi250
report_repo_id: optimum-amd/transformers_daily_ci
secrets: inherit
torch-pipeline:
name: Torch pipeline CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@main
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi250
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi250
report_repo_id: optimum-amd/transformers_daily_ci
secrets: inherit
example-ci:
name: Example CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@main
with:
job: run_examples_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi250
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi250
report_repo_id: optimum-amd/transformers_daily_ci
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@main
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi250
docker: huggingface/transformers-pytorch-deepspeed-amd-gpu
ci_event: Scheduled CI (AMD) - mi250
report_repo_id: optimum-amd/transformers_daily_ci
secrets: inherit

View File

@@ -0,0 +1,67 @@
name: Self-hosted runner scale set (AMD mi325 scheduled CI caller)
# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml, i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
# For example, 1gpu scale set: amd-mi325-ci-1gpu
# 2gpu scale set: amd-mi325-ci-2gpu
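# (Assumed naming convention, as a Python sketch: f"{runner_group}-ci-{n}gpu")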
on:
workflow_run:
workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_scheduled_ci_caller*
jobs:
model-ci:
name: Model CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_models_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: amd-mi325
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
env_file: /etc/podinfo/gha-gpu-isolation-settings
secrets: inherit
torch-pipeline:
name: Torch pipeline CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: amd-mi325
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
env_file: /etc/podinfo/gha-gpu-isolation-settings
secrets: inherit
example-ci:
name: Example CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_examples_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: amd-mi325
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
env_file: /etc/podinfo/gha-gpu-isolation-settings
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: amd-mi325
docker: huggingface/transformers-pytorch-deepspeed-amd-gpu
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
env_file: /etc/podinfo/gha-gpu-isolation-settings
secrets: inherit

View File

@@ -0,0 +1,63 @@
name: Self-hosted runner scale set (AMD mi355 scheduled CI caller)
# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml, i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
# For example, 1gpu: amd-mi355-ci-1gpu
# 2gpu: amd-mi355-ci-2gpu
on:
workflow_run:
workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_scheduled_ci_caller*
jobs:
model-ci:
name: Model CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_models_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
torch-pipeline:
name: Torch pipeline CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
example-ci:
name: Example CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_examples_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit

View File

@@ -1,349 +0,0 @@
name: Self-hosted runner (scheduled-amd)
# Note: For the AMD CI, we rely on a caller workflow and on the workflow_call event to trigger the
# CI in order to run it on both MI210 and MI250, without having to use matrix here which pushes
# us towards the limit of allowed jobs on GitHub Actions.
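# (Each caller workflow above passes a different `runner` input, mi210 or mi250, to
# this reusable workflow, instead of a `matrix` over runners here.)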
on:
workflow_call:
inputs:
job:
required: true
type: string
slack_report_channel:
required: true
type: string
runner:
required: true
type: string
docker:
required: true
type: string
ci_event:
required: true
type: string
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
NUM_SLICES: 2
# Important note: each job (run_tests_single_gpu, run_tests_multi_gpu, run_examples_gpu, run_pipelines_torch_gpu) requires all the previous jobs before running.
# This is done to avoid parallelizing the scheduled tests, leaving runners available
# for the push CI that runs on the same machine.
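# (e.g. check_runner_status -> check_runners -> setup -> run_models_gpu below,
# chained via `needs:`.)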
jobs:
check_runner_status:
name: Check Runner Status
runs-on: ubuntu-22.04
steps:
- name: Checkout transformers
uses: actions/checkout@v4
with:
fetch-depth: 2
- name: Check Runner Status
run: python utils/check_self_hosted_runner.py --target_runners hf-amd-mi210-ci-1gpu-1,hf-amd-mi250-ci-1gpu-1,hf-amd-mi300-ci-1gpu-1 --token ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
check_runners:
name: Check Runners
needs: check_runner_status
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: ['${{ matrix.machine_type }}', self-hosted, amd-gpu, '${{ inputs.runner }}']
container:
image: huggingface/transformers-pytorch-amd-gpu
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: ROCM-SMI
run: |
rocm-smi
- name: ROCM-INFO
run: |
rocminfo | grep "Agent" -A 14
- name: Show ROCR environment
run: |
echo "ROCR: $ROCR_VISIBLE_DEVICES"
setup:
if: contains(fromJSON('["run_models_gpu"]'), inputs.job)
name: Setup
needs: check_runners
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: ['${{ matrix.machine_type }}', self-hosted, amd-gpu, '${{ inputs.runner }}']
container:
image: huggingface/transformers-pytorch-amd-gpu
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
folder_slices: ${{ steps.set-matrix.outputs.folder_slices }}
slice_ids: ${{ steps.set-matrix.outputs.slice_ids }}
steps:
- name: Update clone
working-directory: /transformers
run: |
git fetch && git checkout ${{ github.sha }}
- name: Cleanup
working-directory: /transformers
run: |
rm -rf tests/__pycache__
rm -rf tests/models/__pycache__
rm -rf reports
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- id: set-matrix
name: Identify models to test
working-directory: /transformers/tests
run: |
echo "folder_slices=$(python3 ../utils/split_model_tests.py --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT
- name: ROCM-SMI
run: |
rocm-smi
- name: ROCM-INFO
run: |
rocminfo | grep "Agent" -A 14
- name: Show ROCR environment
run: |
echo "ROCR: $ROCR_VISIBLE_DEVICES"
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
run_models_gpu:
if: ${{ inputs.job == 'run_models_gpu' }}
name: Single GPU tests
needs: setup
strategy:
max-parallel: 1 # For now, not to parallelize. Can change later if it works well.
fail-fast: false
matrix:
machine_type: [single-gpu, multi-gpu]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs_amd.yml
with:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
machine_type: ${{ matrix.machine_type }}
slice_id: ${{ matrix.slice_id }}
runner: ${{ inputs.runner }}
docker: ${{ inputs.docker }}
secrets: inherit
run_pipelines_torch_gpu:
if: ${{ inputs.job == 'run_pipelines_torch_gpu' }}
name: PyTorch pipelines
needs: check_runners
strategy:
fail-fast: false
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: ['${{ matrix.machine_type }}', self-hosted, amd-gpu, '${{ inputs.runner }}']
container:
image: ${{ inputs.docker }}
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: ROCM-SMI
run: |
rocm-smi
- name: ROCM-INFO
run: |
rocminfo | grep "Agent" -A 14
- name: Show ROCR environment
run: |
echo "ROCR: $ROCR_VISIBLE_DEVICES"
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all pipeline tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -n 1 -v --dist=loadfile --make-reports=${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports tests/pipelines -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_run_pipelines_torch_gpu_test_reports
run_examples_gpu:
if: ${{ inputs.job == 'run_examples_gpu' }}
name: Examples directory
needs: check_runners
strategy:
fail-fast: false
matrix:
machine_type: [single-gpu]
runs-on: ['${{ matrix.machine_type }}', self-hosted, amd-gpu, '${{ inputs.runner }}']
container:
image: ${{ inputs.docker }}
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: ROCM-SMI
run: |
rocm-smi
- name: ROCM-INFO
run: |
rocminfo | grep "Agent" -A 14
- name: Show ROCR environment
run: |
echo "ROCR: $ROCR_VISIBLE_DEVICES"
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run examples tests on GPU
working-directory: /transformers
run: |
pip install -r examples/pytorch/_tests_requirements.txt
python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_run_examples_gpu_test_reports examples/pytorch -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_run_examples_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_examples_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.machine_type }}_run_examples_gpu_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_run_examples_gpu_test_reports
run_torch_cuda_extensions_gpu:
if: ${{ inputs.job == 'run_torch_cuda_extensions_gpu' }}
name: Torch ROCm deepspeed tests
needs: check_runners
strategy:
fail-fast: false
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: ['${{ matrix.machine_type }}', self-hosted, amd-gpu, '${{ inputs.runner }}']
container:
image: ${{ inputs.docker }}
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: ROCM-SMI
run: |
rocm-smi
- name: ROCM-INFO
run: |
rocminfo | grep "Agent" -A 14
- name: Show ROCR environment
run: |
echo "ROCR: $ROCR_VISIBLE_DEVICES"
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
send_results:
name: Slack Report
needs: [
check_runner_status,
check_runners,
setup,
run_models_gpu,
run_pipelines_torch_gpu,
run_examples_gpu,
run_torch_cuda_extensions_gpu
]
if: ${{ always() }}
uses: ./.github/workflows/slack-report.yml
with:
job: ${{ inputs.job }}
# This would be `skipped` if `setup` is skipped.
setup_status: ${{ needs.setup.result }}
slack_report_channel: ${{ inputs.slack_report_channel }}
# This would be an empty string if `setup` is skipped.
folder_slices: ${{ needs.setup.outputs.folder_slices }}
quantization_matrix: ${{ needs.setup.outputs.quantization_matrix }}
ci_event: ${{ inputs.ci_event }}
secrets: inherit

View File

@@ -1,5 +1,4 @@
name: Self-hosted runner (scheduled)
name: Nvidia CI
on:
repository_dispatch:
@@ -7,72 +6,53 @@ on:
- cron: "17 2 * * *"
push:
branches:
- run_scheduled_ci*
- multi_jobs_to_check_bad_commit
workflow_dispatch:
inputs:
prev_workflow_run_id:
description: 'previous workflow run id to compare'
type: string
required: false
default: ""
other_workflow_run_id:
description: 'other workflow run id to compare'
type: string
required: false
default: ""
# Used for `push` to easily modify the target workflow runs to compare against
env:
prev_workflow_run_id: "18548615847"
other_workflow_run_id: ""
jobs:
setup:
name: Setup
runs-on: ubuntu-22.04
steps:
- name: Setup
run: |
mkdir "setup_values"
echo "${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}" > "setup_values/prev_workflow_run_id.txt"
echo "${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}" > "setup_values/other_workflow_run_id.txt"
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: setup_values
path: setup_values
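# (The `setup_values` artifact written here is read back by the `Prepare some setup
# values` step in slack-report.yml, shown further below.)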
model-ci:
name: Model CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-daily-models"
runner: daily-ci
slack_report_channel: "#transformers-ci-dummy"
docker: huggingface/transformers-all-latest-gpu
ci_event: Daily CI
secrets: inherit
torch-pipeline:
name: Torch pipeline CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#transformers-ci-daily-pipeline-torch"
runner: daily-ci
docker: huggingface/transformers-pytorch-gpu
ci_event: Daily CI
secrets: inherit
tf-pipeline:
name: TF pipeline CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_pipelines_tf_gpu
slack_report_channel: "#transformers-ci-daily-pipeline-tf"
runner: daily-ci
docker: huggingface/transformers-tensorflow-gpu
ci_event: Daily CI
secrets: inherit
example-ci:
name: Example CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_examples_gpu
slack_report_channel: "#transformers-ci-daily-examples"
runner: daily-ci
docker: huggingface/transformers-all-latest-gpu
ci_event: Daily CI
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#transformers-ci-daily-deepspeed"
runner: daily-ci
docker: huggingface/transformers-pytorch-deepspeed-latest-gpu
ci_event: Daily CI
working-directory-prefix: /workspace
secrets: inherit
quantization-ci:
name: Quantization CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_quantization_torch_gpu
slack_report_channel: "#transformers-ci-daily-quantization"
runner: daily-ci
docker: huggingface/transformers-quantization-latest-gpu
ci_event: Daily CI
runner_type: "a10"
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit

View File

@@ -0,0 +1,341 @@
name: Self-hosted runner (scheduled-intel-gaudi)
on:
workflow_call:
inputs:
job:
required: true
type: string
slack_report_channel:
required: true
type: string
runner_scale_set:
required: true
type: string
ci_event:
required: true
type: string
report_repo_id:
required: true
type: string
env:
NUM_SLICES: 2
RUN_SLOW: yes
PT_HPU_LAZY_MODE: 0
TRANSFORMERS_IS_CI: yes
PT_ENABLE_INT64_SUPPORT: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
HF_HOME: /mnt/cache/.cache/huggingface
jobs:
setup:
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu"]'), inputs.job)
name: Setup
runs-on: ubuntu-latest
outputs:
slice_ids: ${{ steps.set-matrix.outputs.slice_ids }}
folder_slices: ${{ steps.set-matrix.outputs.folder_slices }}
quantization_matrix: ${{ steps.set-matrix.outputs.quantization_matrix }}
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.10"
- id: set-matrix
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu"]'), inputs.job)
name: Identify models to test
working-directory: tests
run: |
if [ "${{ inputs.job }}" = "run_models_gpu" ]; then
echo "folder_slices=$(python3 ../utils/split_model_tests.py --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT
elif [ "${{ inputs.job }}" = "run_trainer_and_fsdp_gpu" ]; then
echo "folder_slices=[['trainer'], ['fsdp']]" >> $GITHUB_OUTPUT
echo "slice_ids=[0, 1]" >> $GITHUB_OUTPUT
fi
- id: set-matrix-quantization
if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
name: Identify quantization method to test
working-directory: tests
run: |
echo "quantization_matrix=$(python3 -c 'import os; tests = os.getcwd(); quantization_tests = os.listdir(os.path.join(tests, "quantization")); d = sorted(list(filter(os.path.isdir, [f"quantization/{x}" for x in quantization_tests]))) ; print(d)')" >> $GITHUB_OUTPUT
run_models_gpu:
if: ${{ inputs.job == 'run_models_gpu' }}
name: " "
needs: setup
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi, 2gaudi]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs_intel_gaudi.yml
with:
slice_id: ${{ matrix.slice_id }}
machine_type: ${{ matrix.machine_type }}
folder_slices: ${{ needs.setup.outputs.folder_slices }}
runner: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
secrets: inherit
run_trainer_and_fsdp_gpu:
if: ${{ inputs.job == 'run_trainer_and_fsdp_gpu' }}
name: " "
needs: setup
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi, 2gaudi]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs_intel_gaudi.yml
with:
slice_id: ${{ matrix.slice_id }}
machine_type: ${{ matrix.machine_type }}
folder_slices: ${{ needs.setup.outputs.folder_slices }}
runner: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
report_name_prefix: run_trainer_and_fsdp_gpu
secrets: inherit
run_pipelines_torch_gpu:
if: ${{ inputs.job == 'run_pipelines_torch_gpu' }}
name: Pipelines
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi, 2gaudi]
runs-on:
group: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
container:
image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
options: --runtime=habana
-v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
--env OMPI_MCA_btl_vader_single_copy_mechanism=none
--env HABANA_VISIBLE_DEVICES
--env HABANA_VISIBLE_MODULES
--cap-add=sys_nice
--shm-size=64G
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install dependencies
run: |
pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn librosa soundfile
- name: HL-SMI
run: |
hl-smi
echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
- name: Environment
run: python3 utils/print_env.py
- name: Show installed libraries and their versions
run: pip freeze
- name: Set `machine_type` for report and artifact names
shell: bash
run: |
if [ "${{ matrix.machine_type }}" = "1gaudi" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "2gaudi" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run all pipeline tests on Intel Gaudi
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports tests/pipelines -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: |
cat reports/${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
path: reports/${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
run_examples_gpu:
if: ${{ inputs.job == 'run_examples_gpu' }}
name: Examples directory
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi]
runs-on:
group: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
container:
image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
options: --runtime=habana
-v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
--env OMPI_MCA_btl_vader_single_copy_mechanism=none
--env HABANA_VISIBLE_DEVICES
--env HABANA_VISIBLE_MODULES
--cap-add=sys_nice
--shm-size=64G
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install dependencies
run: |
pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn librosa soundfile
- name: HL-SMI
run: |
hl-smi
echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
- name: Environment
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
run: |
pip freeze
- name: Set `machine_type` for report and artifact names
shell: bash
run: |
if [ "${{ matrix.machine_type }}" = "1gaudi" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "2gaudi" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run examples tests on Intel Gaudi
run: |
pip install -r examples/pytorch/_tests_requirements.txt
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_examples_gpu_test_reports examples/pytorch -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: |
cat reports/${{ env.machine_type }}_run_examples_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_examples_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_examples_gpu_test_reports
path: reports/${{ env.machine_type }}_run_examples_gpu_test_reports
run_torch_cuda_extensions_gpu:
if: ${{ inputs.job == 'run_torch_cuda_extensions_gpu' }}
name: Intel Gaudi deepspeed tests
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi, 2gaudi]
runs-on:
group: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
container:
image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
options: --runtime=habana
-v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
--env OMPI_MCA_btl_vader_single_copy_mechanism=none
--env HABANA_VISIBLE_DEVICES
--env HABANA_VISIBLE_MODULES
--cap-add=sys_nice
--shm-size=64G
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install dependencies
run: |
pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn librosa soundfile
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.20.0
- name: HL-SMI
run: |
hl-smi
echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
- name: Environment
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
run: |
pip freeze
- name: Set `machine_type` for report and artifact names
shell: bash
run: |
if [ "${{ matrix.machine_type }}" = "1gaudi" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "2gaudi" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run all deepspeed tests on intel Gaudi
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: |
cat reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
path: reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
send_results:
name: Slack Report
needs:
[
setup,
run_models_gpu,
run_examples_gpu,
run_torch_cuda_extensions_gpu,
run_pipelines_torch_gpu,
run_trainer_and_fsdp_gpu,
]
if: ${{ always() }}
uses: ./.github/workflows/slack-report.yml
with:
job: ${{ inputs.job }}
setup_status: ${{ needs.setup.result }}
slack_report_channel: ${{ inputs.slack_report_channel }}
quantization_matrix: ${{ needs.setup.outputs.quantization_matrix }}
folder_slices: ${{ needs.setup.outputs.folder_slices }}
report_repo_id: ${{ inputs.report_repo_id }}
ci_event: ${{ inputs.ci_event }}
secrets: inherit

View File

@@ -0,0 +1,67 @@
name: Self-hosted runner (Intel Gaudi3 scheduled CI caller)
on:
repository_dispatch:
workflow_dispatch:
schedule:
- cron: "17 2 * * *"
jobs:
model-ci:
name: Model CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_models_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit
pipeline-ci:
name: Pipeline CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_pipelines_torch_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit
example-ci:
name: Example CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_examples_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_torch_cuda_extensions_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit
trainer-fsdp-ci:
name: Trainer/FSDP CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_trainer_and_fsdp_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit

View File

@@ -1,4 +1,4 @@
name: Self-hosted runner (scheduled)
name: Nvidia CI (job definitions)
# Note that each job's dependencies are defined in a corresponding Dockerfile.
#
@@ -15,9 +15,6 @@ on:
slack_report_channel:
required: true
type: string
runner:
required: true
type: string
docker:
required: true
type: string
@@ -28,6 +25,19 @@ on:
default: ''
required: false
type: string
report_repo_id:
required: true
type: string
commit_sha:
required: false
type: string
runner_type:
required: false
type: string
models:
default: ""
required: false
type: string
env:
HF_HOME: /mnt/cache
@@ -38,24 +48,22 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
CUDA_VISIBLE_DEVICES: 0,1
NUM_SLICES: 2
jobs:
setup:
if: contains(fromJSON('["run_models_gpu", "run_quantization_torch_gpu"]'), inputs.job)
name: Setup
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu", "run_quantization_torch_gpu"]'), inputs.job)
strategy:
matrix:
machine_type: [aws-g4dn-2xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
folder_slices: ${{ steps.set-matrix.outputs.folder_slices }}
slice_ids: ${{ steps.set-matrix.outputs.slice_ids }}
@@ -64,7 +72,7 @@ jobs:
- name: Update clone
working-directory: /transformers
run: |
git fetch && git checkout ${{ github.sha }}
git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Cleanup
working-directory: /transformers
@@ -78,12 +86,17 @@ jobs:
run: pip freeze
- id: set-matrix
if: ${{ inputs.job == 'run_models_gpu' }}
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu"]'), inputs.job)
name: Identify models to test
working-directory: /transformers/tests
run: |
echo "folder_slices=$(python3 ../utils/split_model_tests.py --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT
if [ "${{ inputs.job }}" = "run_models_gpu" ]; then
echo "folder_slices=$(python3 ../utils/split_model_tests.py --models '${{ inputs.models }}' --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT
elif [ "${{ inputs.job }}" = "run_trainer_and_fsdp_gpu" ]; then
echo "folder_slices=[['trainer'], ['fsdp']]" >> $GITHUB_OUTPUT
echo "slice_ids=[0, 1]" >> $GITHUB_OUTPUT
fi
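# `folder_slices` is a list of NUM_SLICES folder lists and `slice_id` indexes into it;
# the trainer/fsdp branch hardcodes two one-folder slices instead of splitting the model tests.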
- id: set-matrix-quantization
if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
@@ -103,15 +116,38 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-2xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs.yml
with:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
machine_type: ${{ matrix.machine_type }}
slice_id: ${{ matrix.slice_id }}
runner: ${{ inputs.runner }}
docker: ${{ inputs.docker }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
runner_type: ${{ inputs.runner_type }}
report_repo_id: ${{ inputs.report_repo_id }}
secrets: inherit
run_trainer_and_fsdp_gpu:
if: ${{ inputs.job == 'run_trainer_and_fsdp_gpu' }}
name: " "
needs: setup
strategy:
fail-fast: false
matrix:
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
slice_id: [0, 1]
uses: ./.github/workflows/model_jobs.yml
with:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
machine_type: ${{ matrix.machine_type }}
slice_id: ${{ matrix.slice_id }}
docker: ${{ inputs.docker }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
runner_type: ${{ inputs.runner_type }}
report_repo_id: ${{ inputs.report_repo_id }}
report_name_prefix: run_trainer_and_fsdp_gpu
secrets: inherit
run_pipelines_torch_gpu:
@@ -120,7 +156,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-2xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -129,7 +165,7 @@ jobs:
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@@ -154,9 +190,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@@ -182,91 +218,22 @@ jobs:
name: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
run_pipelines_tf_gpu:
if: ${{ inputs.job == 'run_pipelines_tf_gpu' }}
name: TensorFlow pipelines
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-2xlarge-cache, aws-g4dn-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-tensorflow-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
run: |
git fetch && git checkout ${{ github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run all pipeline tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -n 1 -v --dist=loadfile --make-reports=${{ env.machine_type }}_run_pipelines_tf_gpu_test_reports tests/pipelines
- name: Failure short reports
if: ${{ always() }}
run: |
cat /transformers/reports/${{ env.machine_type }}_run_pipelines_tf_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_pipelines_tf_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_pipelines_tf_gpu_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_pipelines_tf_gpu_test_reports
run_examples_gpu:
if: ${{ inputs.job == 'run_examples_gpu' }}
name: Examples directory
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-2xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@@ -291,9 +258,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@@ -326,7 +293,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-2xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -335,7 +302,7 @@ jobs:
steps:
- name: Update clone
working-directory: ${{ inputs.working-directory-prefix }}/transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: ${{ inputs.working-directory-prefix }}/transformers
@@ -366,7 +333,7 @@ jobs:
run: |
python3 -m pip uninstall -y deepspeed
rm -rf DeepSpeed
git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build
git clone https://github.com/deepspeedai/DeepSpeed && cd DeepSpeed && rm -rf build
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
- name: NVIDIA-SMI
@@ -383,14 +350,14 @@ jobs:
run: pip freeze
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
working-directory: ${{ inputs.working-directory-prefix }}/transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@@ -425,7 +392,7 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.quantization_matrix) }}
machine_type: [aws-g4dn-2xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@@ -443,7 +410,7 @@ jobs:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@@ -468,9 +435,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@ -507,6 +474,7 @@ jobs:
uses: actions/checkout@v4
with:
fetch-depth: 2
ref: ${{ inputs.commit_sha || github.sha }}
- name: Install transformers
run: pip install transformers
@ -542,14 +510,14 @@ jobs:
needs: [
setup,
run_models_gpu,
run_trainer_and_fsdp_gpu,
run_pipelines_torch_gpu,
run_pipelines_tf_gpu,
run_examples_gpu,
run_torch_cuda_extensions_gpu,
run_quantization_torch_gpu,
run_extract_warnings
]
if: ${{ always() }}
if: always() && !cancelled()
uses: ./.github/workflows/slack-report.yml
with:
job: ${{ inputs.job }}
@ -560,5 +528,22 @@ jobs:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
quantization_matrix: ${{ needs.setup.outputs.quantization_matrix }}
ci_event: ${{ inputs.ci_event }}
report_repo_id: ${{ inputs.report_repo_id }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
secrets: inherit
check_new_failures:
if: ${{ always() && inputs.ci_event == 'Daily CI' && needs.send_results.result == 'success' }}
name: Check new failures
needs: send_results
uses: ./.github/workflows/check_failed_tests.yml
with:
docker: ${{ inputs.docker }}
start_sha: ${{ inputs.commit_sha || github.sha }}
job: ${{ inputs.job }}
slack_report_channel: ${{ inputs.slack_report_channel }}
ci_event: ${{ inputs.ci_event }}
report_repo_id: ${{ inputs.report_repo_id }}
secrets: inherit


@ -21,6 +21,13 @@ on:
ci_event:
required: true
type: string
report_repo_id:
required: true
type: string
commit_sha:
required: false
type: string
env:
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
@ -29,7 +36,7 @@ jobs:
send_results:
name: Send results to webhook
runs-on: ubuntu-22.04
if: always()
if: always() && !cancelled()
steps:
- name: Preliminary job status
shell: bash
@ -38,9 +45,28 @@ jobs:
echo "Setup status: ${{ inputs.setup_status }}"
- uses: actions/checkout@v4
with:
fetch-depth: 2
ref: ${{ inputs.commit_sha || github.sha }}
- uses: actions/download-artifact@v4
- name: Prepare some setup values
run: |
if [ -f setup_values/prev_workflow_run_id.txt ]; then
echo "PREV_WORKFLOW_RUN_ID=$(cat setup_values/prev_workflow_run_id.txt)" >> $GITHUB_ENV
else
echo "PREV_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
fi
if [ -f setup_values/other_workflow_run_id.txt ]; then
echo "OTHER_WORKFLOW_RUN_ID=$(cat setup_values/other_workflow_run_id.txt)" >> $GITHUB_ENV
else
echo "OTHER_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
fi
- name: Send message to Slack
if: ${{ inputs.job != 'run_quantization_torch_gpu' }}
shell: bash
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
@ -49,20 +75,25 @@ jobs:
SLACK_REPORT_CHANNEL: ${{ inputs.slack_report_channel }}
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
CI_EVENT: ${{ inputs.ci_event }}
CI_SHA: ${{ github.sha }}
CI_WORKFLOW_REF: ${{ github.workflow_ref }}
# This `CI_TITLE` would be empty for `schedule` or `workflow_run` events.
CI_TITLE: ${{ github.event.head_commit.message }}
CI_SHA: ${{ inputs.commit_sha || github.sha }}
CI_TEST_JOB: ${{ inputs.job }}
SETUP_STATUS: ${{ inputs.setup_status }}
REPORT_REPO_ID: ${{ inputs.report_repo_id }}
# We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
# For a job that doesn't depend on (i.e. `needs`) `setup`, the value for `inputs.folder_slices` would be an
# empty string, and the called script still gets one argument (which is the empty string).
run: |
sudo apt-get install -y curl
pip install huggingface_hub
pip install slack_sdk
pip show slack_sdk
python utils/notification_service.py "${{ inputs.folder_slices }}"
if [ "${{ inputs.quantization_matrix }}" != "" ]; then
python utils/notification_service.py "${{ inputs.quantization_matrix }}"
else
python utils/notification_service.py "${{ inputs.folder_slices }}"
fi
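For illustration, the renaming that the comment above attributes to `notification_service.py` amounts to replacing `/` with `_`, since artifact names cannot contain slashes. A minimal sketch, assuming that is the only transformation applied (the helper name is hypothetical):

```py
# Hypothetical helper: map a test folder from the matrix to the name of
# the artifact it was uploaded under ("/" is not allowed in artifact names).
def folder_to_artifact_name(folder: str) -> str:
    return folder.replace("/", "_")

assert folder_to_artifact_name("models/bert") == "models_bert"
assert folder_to_artifact_name("quantization/bnb") == "quantization_bnb"
```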
# Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack.
- name: Failure table artifacts
@ -70,32 +101,3 @@ jobs:
with:
name: ci_results_${{ inputs.job }}
path: ci_results_${{ inputs.job }}
- uses: actions/checkout@v4
- uses: actions/download-artifact@v4
- name: Send message to Slack for quantization workflow
if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
SLACK_REPORT_CHANNEL: ${{ inputs.slack_report_channel }}
CI_EVENT: ${{ inputs.ci_event }}
CI_SHA: ${{ github.sha }}
CI_TEST_JOB: ${{ inputs.job }}
SETUP_STATUS: ${{ inputs.setup_status }}
# We pass `needs.setup.outputs.quantization_matrix` as the argument. A processing in `notification_service_quantization.py` to change
# `quantization/bnb` to `quantization_bnb` is required, as the artifact names use `_` instead of `/`.
run: |
sudo apt-get install -y curl
pip install huggingface_hub
pip install slack_sdk
pip show slack_sdk
python utils/notification_service_quantization.py "${{ inputs.quantization_matrix }}"
# Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack.
- name: Failure table artifacts
if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
uses: actions/upload-artifact@v4
with:
name: ci_results_${{ inputs.job }}
path: ci_results_${{ inputs.job }}


@ -5,7 +5,7 @@ on:
inputs:
runner_type:
description: 'Type of runner to test (a10 or t4)'
required: true
docker_image:
description: 'Name of the Docker image'
required: true
@ -15,20 +15,50 @@ on:
env:
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes # For gated repositories, we still need to agree to share information on the Hub repo page in order to get access. # This token is created under the bot `hf-transformers-bot`.
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes # For gated repositories, we still need to agree to share information on the Hub repo page in order to get access. # This token is created under the bot `hf-transformers-bot`.
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
RUN_PT_TF_CROSS_TESTS: 1
jobs:
get_runner:
name: "Get runner to use"
runs-on: ubuntu-22.04
outputs:
RUNNER: ${{ steps.set_runner.outputs.RUNNER }}
steps:
- name: Get runner to use
shell: bash
env:
NUM_GPUS: ${{ github.event.inputs.num_gpus }}
RUNNER_TYPE: ${{ github.event.inputs.runner_type }}
run: |
if [[ "$NUM_GPUS" == "single" && "$RUNNER_TYPE" == "t4" ]]; then
echo "RUNNER=aws-g4dn-4xlarge-cache" >> $GITHUB_ENV
elif [[ "$NUM_GPUS" == "multi" && "$RUNNER_TYPE" == "t4" ]]; then
echo "RUNNER=aws-g4dn-12xlarge-cache" >> $GITHUB_ENV
elif [[ "$NUM_GPUS" == "single" && "$RUNNER_TYPE" == "a10" ]]; then
echo "RUNNER=aws-g5-4xlarge-cache" >> $GITHUB_ENV
elif [[ "$NUM_GPUS" == "multi" && "$RUNNER_TYPE" == "a10" ]]; then
echo "RUNNER=aws-g5-12xlarge-cache" >> $GITHUB_ENV
else
echo "RUNNER=" >> $GITHUB_ENV
fi
- name: Set runner to use
id: set_runner
run: |
echo ${{ env.RUNNER }}
echo "RUNNER=${{ env.RUNNER }}" >> $GITHUB_OUTPUT
ssh_runner:
name: "SSH"
runs-on: ["${{ github.event.inputs.num_gpus }}-gpu", nvidia-gpu, "${{ github.event.inputs.runner_type }}", ci]
needs: get_runner
runs-on:
group: ${{ needs.get_runner.outputs.RUNNER }}
container:
image: ${{ github.event.inputs.docker_image }}
options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
@ -49,7 +79,7 @@ jobs:
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: NVIDIA-SMI
run: |
nvidia-smi
@ -57,9 +87,11 @@ jobs:
- name: Store Slack infos
# Because SSH can be enabled dynamically if the workflow fails, we need to store the Slack infos to be able to retrieve them during the waitforssh step
shell: bash
env:
GITHUB_ACTOR: ${{ github.actor }}
run: |
echo "${{ github.actor }}"
github_actor=${{ github.actor }}
echo "$GITHUB_ACTOR"
github_actor=$GITHUB_ACTOR
github_actor=${github_actor/'-'/'_'}
echo "$github_actor"
echo "github_actor=$github_actor" >> $GITHUB_ENV


@ -16,3 +16,5 @@ jobs:
fetch-depth: 0
- name: Secret Scanning
uses: trufflesecurity/trufflehog@main
with:
extra_args: --results=verified,unknown


@ -19,7 +19,7 @@ jobs:
- name: Setup environment
run: |
pip install --upgrade pip
pip install datasets pandas==2.0.3
pip install datasets pandas
pip install .[torch,tf,flax]
- name: Update metadata

.gitignore

@ -13,6 +13,7 @@ tests/fixtures/cached_*_text.txt
logs/
lightning_logs/
lang_code_data/
reports/
# Distribution / packaging
.Python
@ -97,6 +98,7 @@ celerybeat-schedule
# Environments
.env
.venv
.venv*
env/
venv/
ENV/
@ -167,3 +169,9 @@ tags
# ruff
.ruff_cache
# modular conversion
*.modular_backup
# Cursor IDE files
.cursor/

AGENTS.md

@ -0,0 +1,39 @@
# AGENTS.md Guide for Hugging Face Transformers
This AGENTS.md file provides guidance for code agents working with this codebase.
## Core Project Structure
- `/src/transformers`: This contains the core source code for the library
- `/models`: Code for individual models. Models inherit from base classes in the root `/src/transformers` directory.
- `/tests`: This contains the core test classes for the library. These are usually inherited rather than directly run.
- `/models`: Tests for individual models. Model tests inherit from common tests in the root `/tests` directory.
- `/docs`: This contains the documentation for the library, including guides, tutorials, and API references.
## Coding Conventions for Hugging Face Transformers
- PRs should be as brief as possible. Bugfix PRs in particular can often be only one or two lines long, and do not need large comments, docstrings or new functions in this case. Aim to minimize the size of the diff.
- When writing tests, they should be added to an existing file. The only exception is for PRs to add a new model, when a new test directory should be created for that model.
- Code style is enforced in the CI. You can install the style tools with `pip install -e .[quality]`. You can then run `make fixup` to apply style and consistency fixes to your code.
## Copying and inheritance
Many models in the codebase have similar code, but it is not shared by inheritance because we want each model file to be self-contained.
We use two mechanisms to keep this code in sync:
- "Copied from" syntax. Functions or entire classes can have a comment at the top like this: `# Copied from transformers.models.llama.modeling_llama.rotate_half` or `# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->MT5`
These comments are actively checked by the style tools, and copies will automatically be updated when the base code is updated. If you need to update a copied function, you should
either update the base function and use `make fixup` to propagate the change to all copies, or simply remove the `# Copied from` comment if that is inappropriate. A sketch of this syntax is shown after this list.
- "Modular" files. These files briefly define models by composing them using inheritance from other models. They are not meant to be used directly. Instead, the style tools
automatically generate a complete modeling file, like `modeling_bert.py`, from the modular file like `modular_bert.py`. If a model has a modular file, the modeling file
should never be edited directly! Instead, changes should be made in the modular file, and then you should run `make fixup` to update the modeling file automatically.
When adding new models, you should prefer `modular` style.
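As a concrete illustration of the "Copied from" mechanism, here is a minimal sketch; the function body matches the Llama rotary-embedding helper referenced above, but treat it as an example rather than the canonical source:

```py
import torch

# Copied from transformers.models.llama.modeling_llama.rotate_half
def rotate_half(x):
    """Rotates half the hidden dims of the input."""
    # The style tools check this copy against the Llama original; editing
    # the original and running `make fixup` rewrites this function too.
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)
```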
## Testing
After making changes, you should usually run `make fixup` to ensure any copies and modular files are updated, and then test all affected models. This includes both
the model you made the changes in and any other models that were updated by `make fixup`. Tests can be run with `pytest tests/models/[name]/test_modeling_[name].py`
If your changes affect code in other classes like tokenizers or processors, you should run those tests instead, like `test_processing_[name].py` or `test_tokenization_[name].py`.
In order to run tests, you may need to install dependencies. You can do this with `pip install -e .[testing]`. You will probably also need to `pip install torch accelerate` if your environment does not already have them.
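If you prefer to drive pytest from Python instead of the shell, an equivalent sketch (with `bert` as a placeholder model name) is:

```py
import pytest

# Run one model's test file programmatically; replace "bert" with the
# model you actually changed.
exit_code = pytest.main(["tests/models/bert/test_modeling_bert.py", "-q"])
raise SystemExit(exit_code)
```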


@ -68,8 +68,7 @@ already reported** (use the search bar on GitHub under Issues). Your issue shoul
Once you've confirmed the bug hasn't already been reported, please include the following information in your issue so we can quickly resolve it:
* Your **OS type and version** and **Python**, **PyTorch** and
**TensorFlow** versions when applicable.
* Your **OS type and version** and **Python**, and **PyTorch** versions when applicable.
* A short, self-contained, code snippet that allows us to reproduce the bug in
less than 30s.
* The *full* traceback if an exception is raised.
@ -78,7 +77,7 @@ Once you've confirmed the bug hasn't already been reported, please include the f
To get the OS and software versions automatically, run the following command:
```bash
transformers-cli env
transformers env
```
You can also run the same command from the root of the repository:
@ -132,7 +131,7 @@ You will need basic `git` proficiency to contribute to
manual. Type `git --help` in a shell and enjoy! If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.
You'll need **[Python 3.8](https://github.com/huggingface/transformers/blob/main/setup.py#L449)** or above to contribute to 🤗 Transformers. Follow the steps below to start contributing:
You'll need **[Python 3.9](https://github.com/huggingface/transformers/blob/main/setup.py#L449)** or above to contribute to 🤗 Transformers. Follow the steps below to start contributing:
1. Fork the [repository](https://github.com/huggingface/transformers) by
clicking on the **[Fork](https://github.com/huggingface/transformers/fork)** button on the repository's page. This creates a copy of the code
@ -165,8 +164,7 @@ You'll need **[Python 3.8](https://github.com/huggingface/transformers/blob/main
mode with the `-e` flag.
Depending on your OS, and since the number of optional dependencies of Transformers is growing, you might get a
failure with this command. If that's the case make sure to install the Deep Learning framework you are working with
(PyTorch, TensorFlow and/or Flax) then do:
failure with this command. If that's the case, make sure to install PyTorch, then do:
```bash
pip install -e ".[quality]"
@ -221,10 +219,10 @@ You'll need **[Python 3.8](https://github.com/huggingface/transformers/blob/main
[Checks on a Pull Request](https://huggingface.co/docs/transformers/pr_checks) guide.
If you're modifying documents under the `docs/source` directory, make sure the documentation can still be built. This check will also run in the CI when you open a pull request. To run a local check
make sure you install the documentation builder:
make sure you install the [documentation builder](https://github.com/huggingface/doc-builder).
```bash
pip install ".[docs]"
pip install hf-doc-builder
```
Run the following command from the root of the repository:
@ -280,13 +278,14 @@ are working on it).<br>
useful to avoid duplicated work, and to differentiate it from PRs ready to be merged.<br>
☐ Make sure existing tests pass.<br>
☐ If adding a new feature, also add tests for it.<br>
- If you are adding a new model, make sure you use
`ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)` to trigger the common tests.
- If you are adding new `@slow` tests, make sure they pass using
`RUN_SLOW=1 python -m pytest tests/models/my_new_model/test_my_new_model.py`.
- If you are adding a new tokenizer, write tests and make sure
`RUN_SLOW=1 python -m pytest tests/models/{your_model_name}/test_tokenization_{your_model_name}.py` passes.
- CircleCI does not run the slow tests, but GitHub Actions does every night!<br>
☐ All public methods must have informative docstrings (see
[`modeling_bert.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py)
@ -342,9 +341,8 @@ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/t
```
Like the slow tests, there are other environment variables available which are not enabled by default during testing:
- `RUN_CUSTOM_TOKENIZERS`: Enables tests for custom tokenizers.
- `RUN_PT_FLAX_CROSS_TESTS`: Enables tests for PyTorch + Flax integration.
- `RUN_PT_TF_CROSS_TESTS`: Enables tests for TensorFlow + PyTorch integration.
More environment variables and additional information can be found in the [testing_utils.py](https://github.com/huggingface/transformers/blob/main/src/transformers/testing_utils.py).


@ -26,7 +26,7 @@ There are two main venues to receive support: [the forums](https://discuss.huggi
[The user forums](https://discuss.huggingface.co/) are supported by the wide community of the library users and backed up by developers when needed.
If you have a difficulty with deploying this library or some questions, or you'd like to discuss a new feature, please first consider discussing those things at the forums. Only when you feel your subject matter has been crystalized and you still need support from the library developers do proceed to file an [issue](https://github.com/huggingface/transformers/issues).
If you have a difficulty with deploying this library or some questions, or you'd like to discuss a new feature, please first consider discussing those things at the forums. Only when you feel your subject matter has been crystallized and you still need support from the library developers do proceed to file an [issue](https://github.com/huggingface/transformers/issues).
In particular, all "Please explain" questions or objectively very user-specific feature requests belong to the forums. Here are some examples of such questions:
@ -38,7 +38,6 @@ In particular all "Please explain" questions or objectively very user-specific f
* "How to train T5 on De->En translation?"
## The GitHub Issues
Everything which hints at a bug should be opened as an [issue](https://github.com/huggingface/transformers/issues).
@ -154,7 +153,7 @@ You are not required to read the following guidelines before opening an issue. H
cd examples/seq2seq
torchrun --nproc_per_node=2 ./finetune_trainer.py \
--model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
--output_dir output_dir --overwrite_output_dir \
--output_dir output_dir \
--do_train --n_train 500 --num_train_epochs 1 \
--per_device_train_batch_size 1 --freeze_embeds \
--src_lang en_XX --tgt_lang ro_RO --task translation \
@ -247,7 +246,6 @@ You are not required to read the following guidelines before opening an issue. H
Try not to use italics and bold text too much, as these often make the text more difficult to read.
12. If you are cross-referencing a specific comment in a given thread or another issue, always link to that specific comment, rather than using the issue link. If you do the latter it could be quite impossible to find which specific comment you're referring to.
To get the link to the specific comment do not copy the url from the location bar of your browser, but instead, click the `...` icon in the upper right corner of the comment and then select "Copy Link".
@ -257,15 +255,14 @@ You are not required to read the following guidelines before opening an issue. H
1. https://github.com/huggingface/transformers/issues/9257
2. https://github.com/huggingface/transformers/issues/9257#issuecomment-749945162
13. If you are replying to a last comment, it's totally fine to make your reply with just your comment in it. The readers can follow the information flow here.
But if you're replying to a comment that happened some comments back, it's always good practice to quote just the relevant lines you're replying to. The `>` is used for quoting, or you can always use the menu to do so. For example, your editor box will look like:
```
> How big is your gpu cluster?
> How big is your GPU cluster?
Our cluster is made of 256 gpus.
Our cluster is made of 256 GPUs.
```
If you are addressing multiple comments, quote the relevant parts of each before your answer. Some people use the same comment to do multiple replies, others separate them into separate comments. Either way works. The latter approach helps for linking to a specific comment.


@ -3,18 +3,24 @@
# make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
export PYTHONPATH = src
check_dirs := examples tests src utils
check_dirs := examples tests src utils scripts benchmark benchmark_v2
exclude_folders := ""
modified_only_fixup:
$(eval modified_py_files := $(shell python utils/get_modified_files.py $(check_dirs)))
@if test -n "$(modified_py_files)"; then \
echo "Checking/fixing $(modified_py_files)"; \
ruff check $(modified_py_files) --fix --exclude $(exclude_folders); \
ruff format $(modified_py_files) --exclude $(exclude_folders);\
@current_branch=$$(git branch --show-current); \
if [ "$$current_branch" = "main" ]; then \
echo "On main branch, running 'style' target instead..."; \
$(MAKE) style; \
else \
echo "No library .py files were modified"; \
modified_py_files=$$(python utils/get_modified_files.py $(check_dirs)); \
if [ -n "$$modified_py_files" ]; then \
echo "Checking/fixing files: $${modified_py_files}"; \
ruff check $${modified_py_files} --fix --exclude $(exclude_folders); \
ruff format $${modified_py_files} --exclude $(exclude_folders); \
else \
echo "No library .py files were modified"; \
fi; \
fi
# Update src/transformers/dependency_versions_table.py
@ -37,16 +43,16 @@ autogenerate_code: deps_table_update
repo-consistency:
python utils/check_copies.py
python utils/check_modular_conversion.py
python utils/check_table.py
python utils/check_dummies.py
python utils/check_repo.py
python utils/check_inits.py
python utils/check_pipeline_typing.py
python utils/check_config_docstrings.py
python utils/check_config_attributes.py
python utils/check_doctest_list.py
python utils/update_metadata.py --check-only
python utils/check_docstrings.py
python utils/check_support_list.py
python utils/add_dates.py
# this target runs checks on all files
@ -81,9 +87,9 @@ fixup: modified_only_fixup extra_style_checks autogenerate_code repo-consistency
fix-copies:
python utils/check_copies.py --fix_and_overwrite
python utils/check_modular_conversion.py --fix_and_overwrite
python utils/check_table.py --fix_and_overwrite
python utils/check_modular_conversion.py --fix_and_overwrite
python utils/check_dummies.py --fix_and_overwrite
python utils/check_pipeline_typing.py --fix_and_overwrite
python utils/check_doctest_list.py --fix_and_overwrite
python utils/check_docstrings.py --fix_and_overwrite

README.md

@ -25,6 +25,7 @@ limitations under the License.
</p>
<p align="center">
<a href="https://huggingface.com/models"><img alt="Checkpoints on Hub" src="https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen"></a>
<a href="https://circleci.com/gh/huggingface/transformers"><img alt="Build" src="https://img.shields.io/circleci/build/github/huggingface/transformers/main"></a>
<a href="https://github.com/huggingface/transformers/blob/main/LICENSE"><img alt="GitHub" src="https://img.shields.io/github/license/huggingface/transformers.svg?color=blue"></a>
<a href="https://huggingface.co/docs/transformers/index"><img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers/index.svg?down_color=red&down_message=offline&up_message=online"></a>
@ -43,266 +44,279 @@ limitations under the License.
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ja.md">日本語</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_hd.md">हिन्दी</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ru.md">Русский</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_pt-br.md">Рortuguês</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_pt-br.md">Português</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_te.md">తెలుగు</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_fr.md">Français</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_de.md">Deutsch</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_it.md">Italiano</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_vi.md">Tiếng Việt</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ar.md">العربية</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ur.md">اردو</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_bn.md">বাংলা</a> |
</p>
</h4>
<h3 align="center">
<p>State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow</p>
<p>State-of-the-art pretrained models for inference and training</p>
</h3>
<h3 align="center">
<a href="https://hf.co/course"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/course_banner.png"></a>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/>
</h3>
🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer
vision, audio, video, and multimodal models, for both inference and training.
These models can be applied on:
It centralizes the model definition so that this definition is agreed upon across the ecosystem. `transformers` is the
pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training
frameworks (Axolotl, Unsloth, DeepSpeed, FSDP, PyTorch-Lightning, ...), inference engines (vLLM, SGLang, TGI, ...),
and adjacent modeling libraries (llama.cpp, mlx, ...) which leverage the model definition from `transformers`.
* 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
* 🖼️ Images, for tasks like image classification, object detection, and segmentation.
* 🗣️ Audio, for tasks like speech recognition and audio classification.
We pledge to help support new state-of-the-art models and democratize their usage by having their model definition be
simple, customizable, and efficient.
Transformer models can also perform tasks on **several modalities combined**, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
There are over 1M+ Transformers [model checkpoints](https://huggingface.co/models?library=transformers&sort=trending) on the [Hugging Face Hub](https://huggingface.com/models) you can use.
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
Explore the [Hub](https://huggingface.com/) today to find a model and use Transformers to help you get started right away.
🤗 Transformers is backed by the three most popular deep learning libraries — [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/) — with a seamless integration between them. It's straightforward to train your models with one before loading them for inference with the other.
## Installation
## Online demos
Transformers works with Python 3.9+, and [PyTorch](https://pytorch.org/get-started/locally/) 2.1+.
You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer [private model hosting, versioning, & an inference API](https://huggingface.co/pricing) for public and private models.
Create and activate a virtual environment with [venv](https://docs.python.org/3/library/venv.html) or [uv](https://docs.astral.sh/uv/), a fast Rust-based Python package and project manager.
Here are a few examples:
In Natural Language Processing:
- [Masked word completion with BERT](https://huggingface.co/google-bert/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [Named Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [Text generation with Mistral](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- [Natural Language Inference with RoBERTa](https://huggingface.co/FacebookAI/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [Question answering with DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/google-t5/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
In Computer Vision:
- [Image classification with ViT](https://huggingface.co/google/vit-base-patch16-224)
- [Object Detection with DETR](https://huggingface.co/facebook/detr-resnet-50)
- [Semantic Segmentation with SegFormer](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512)
- [Panoptic Segmentation with Mask2Former](https://huggingface.co/facebook/mask2former-swin-large-coco-panoptic)
- [Depth Estimation with Depth Anything](https://huggingface.co/docs/transformers/main/model_doc/depth_anything)
- [Video Classification with VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)
- [Universal Segmentation with OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_dinat_large)
In Audio:
- [Automatic Speech Recognition with Whisper](https://huggingface.co/openai/whisper-large-v3)
- [Keyword Spotting with Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks)
- [Audio Classification with Audio Spectrogram Transformer](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593)
In Multimodal tasks:
- [Table Question Answering with TAPAS](https://huggingface.co/google/tapas-base-finetuned-wtq)
- [Visual Question Answering with ViLT](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa)
- [Image captioning with LLaVa](https://huggingface.co/llava-hf/llava-1.5-7b-hf)
- [Zero-shot Image Classification with SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384)
- [Document Question Answering with LayoutLM](https://huggingface.co/impira/layoutlm-document-qa)
- [Zero-shot Video Classification with X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)
- [Zero-shot Object Detection with OWLv2](https://huggingface.co/docs/transformers/en/model_doc/owlv2)
- [Zero-shot Image Segmentation with CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)
- [Automatic Mask Generation with SAM](https://huggingface.co/docs/transformers/model_doc/sam)
## 100 projects using Transformers
Transformers is more than a toolkit to use pretrained models: it's a community of projects built around it and the
Hugging Face Hub. We want Transformers to enable developers, researchers, students, professors, engineers, and anyone
else to build their dream projects.
In order to celebrate the 100,000 stars of transformers, we have decided to put the spotlight on the
community, and we have created the [awesome-transformers](./awesome-transformers.md) page which lists 100
incredible projects built in the vicinity of transformers.
If you own or use a project that you believe should be part of the list, please open a PR to add it!
## If you are looking for custom support from the Hugging Face team
<a target="_blank" href="https://huggingface.co/support">
<img alt="HuggingFace Expert Acceleration Program" src="https://cdn-media.huggingface.co/marketing/transformers/new-support-improved.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
</a><br>
## Quick tour
To immediately use a model on a given input (text, image, audio, ...), we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:
```python
>>> from transformers import pipeline
# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]
```py
# venv
python -m venv .my-env
source .my-env/bin/activate
# uv
uv venv .my-env
source .my-env/bin/activate
```
The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here, the answer is "positive" with a confidence of 99.97%.
Install Transformers in your virtual environment.
Many tasks have a pre-trained `pipeline` ready to go, in NLP but also in computer vision and speech. For example, we can easily extract detected objects in an image:
```py
# pip
pip install "transformers[torch]"
``` python
>>> import requests
>>> from PIL import Image
>>> from transformers import pipeline
# Download an image with cute cats
>>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png"
>>> image_data = requests.get(url, stream=True).raw
>>> image = Image.open(image_data)
# Allocate a pipeline for object detection
>>> object_detector = pipeline('object-detection')
>>> object_detector(image)
[{'score': 0.9982201457023621,
'label': 'remote',
'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}},
{'score': 0.9960021376609802,
'label': 'remote',
'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}},
{'score': 0.9954745173454285,
'label': 'couch',
'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}},
{'score': 0.9988006353378296,
'label': 'cat',
'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}},
{'score': 0.9986783862113953,
'label': 'cat',
'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}}]
# uv
uv pip install "transformers[torch]"
```
Here, we get a list of objects detected in the image, with a box surrounding the object and a confidence score. Here is the original image on the left, with the predictions displayed on the right:
Install Transformers from source if you want the latest changes in the library or are interested in contributing. However, the *latest* version may not be stable. Feel free to open an [issue](https://github.com/huggingface/transformers/issues) if you encounter an error.
```shell
git clone https://github.com/huggingface/transformers.git
cd transformers
# pip
pip install '.[torch]'
# uv
uv pip install '.[torch]'
```
## Quickstart
Get started with Transformers right away with the [Pipeline](https://huggingface.co/docs/transformers/pipeline_tutorial) API. The `Pipeline` is a high-level inference class that supports text, audio, vision, and multimodal tasks. It handles preprocessing the input and returns the appropriate output.
Instantiate a pipeline and specify the model to use for text generation. The model is downloaded and cached so you can easily reuse it. Finally, pass some text to prompt the model.
```py
from transformers import pipeline
pipeline = pipeline(task="text-generation", model="Qwen/Qwen2.5-1.5B")
pipeline("the secret to baking a really good cake is ")
[{'generated_text': 'the secret to baking a really good cake is 1) to use the right ingredients and 2) to follow the recipe exactly. the recipe for the cake is as follows: 1 cup of sugar, 1 cup of flour, 1 cup of milk, 1 cup of butter, 1 cup of eggs, 1 cup of chocolate chips. if you want to make 2 cakes, how much sugar do you need? To make 2 cakes, you will need 2 cups of sugar.'}]
```
To chat with a model, the usage pattern is the same. The only difference is you need to construct a chat history (the input to `Pipeline`) between you and the system.
> [!TIP]
> You can also chat with a model directly from the command line.
> ```shell
> transformers chat Qwen/Qwen2.5-0.5B-Instruct
> ```
```py
import torch
from transformers import pipeline
chat = [
{"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."},
{"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"}
]
pipeline = pipeline(task="text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", dtype=torch.bfloat16, device_map="auto")
response = pipeline(chat, max_new_tokens=512)
print(response[0]["generated_text"][-1]["content"])
```
Expand the examples below to see how `Pipeline` works for different modalities and tasks.
<details>
<summary>Automatic speech recognition</summary>
```py
from transformers import pipeline
pipeline = pipeline(task="automatic-speech-recognition", model="openai/whisper-large-v3")
pipeline("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}
```
</details>
<details>
<summary>Image classification</summary>
<h3 align="center">
<a><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png" width="400"></a>
<a><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample_post_processed.png" width="400"></a>
<a><img src="https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png"></a>
</h3>
You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/docs/transformers/task_summary).
```py
from transformers import pipeline
In addition to `pipeline`, to download and use any of the pretrained models on your given task, all it takes is three lines of code. Here is the PyTorch version:
```python
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = AutoModel.from_pretrained("google-bert/bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
pipeline = pipeline(task="image-classification", model="facebook/dinov2-small-imagenet1k-1-layer")
pipeline("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
[{'label': 'macaw', 'score': 0.997848391532898},
{'label': 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita',
'score': 0.0016551691805943847},
{'label': 'lorikeet', 'score': 0.00018523589824326336},
{'label': 'African grey, African gray, Psittacus erithacus',
'score': 7.85409429227002e-05},
{'label': 'quail', 'score': 5.502637941390276e-05}]
```
And here is the equivalent code for TensorFlow:
```python
>>> from transformers import AutoTokenizer, TFAutoModel
</details>
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("google-bert/bert-base-uncased")
<details>
<summary>Visual question answering</summary>
>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)
<h3 align="center">
<a><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-few-shot.jpg"></a>
</h3>
```py
from transformers import pipeline
pipeline = pipeline(task="visual-question-answering", model="Salesforce/blip-vqa-base")
pipeline(
image="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-few-shot.jpg",
question="What is in the image?",
)
[{'answer': 'statue of liberty'}]
```
The tokenizer is responsible for all the preprocessing the pretrained model expects and can be called directly on a single string (as in the above examples) or a list. It will output a dictionary that you can use in downstream code or simply directly pass to your model using the ** argument unpacking operator.
</details>
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use as usual. [This tutorial](https://huggingface.co/docs/transformers/training) explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune on a new dataset.
## Why should I use transformers?
## Why should I use Transformers?
1. Easy-to-use state-of-the-art models:
- High performance on natural language understanding & generation, computer vision, and audio tasks.
- Low barrier to entry for educators and practitioners.
- High performance on natural language understanding & generation, computer vision, audio, video, and multimodal tasks.
- Low barrier to entry for researchers, engineers, and developers.
- Few user-facing abstractions with just three classes to learn.
- A unified API for using all our pretrained models.
1. Lower compute costs, smaller carbon footprint:
- Researchers can share trained models instead of always retraining.
- Practitioners can reduce compute time and production costs.
- Dozens of architectures with over 400,000 pretrained models across all modalities.
- Share trained models instead of training from scratch.
- Reduce compute time and production costs.
- Dozens of model architectures with 1M+ pretrained checkpoints across all modalities.
1. Choose the right framework for every part of a model's lifetime:
1. Choose the right framework for every part of a models lifetime:
- Train state-of-the-art models in 3 lines of code.
- Move a single model between TF2.0/PyTorch/JAX frameworks at will.
- Seamlessly pick the right framework for training, evaluation, and production.
- Move a single model between PyTorch/JAX/TF2.0 frameworks at will.
- Pick the right framework for training, evaluation, and production.
1. Easily customize a model or an example to your needs:
- We provide examples for each architecture to reproduce the results published by its original authors.
- Model internals are exposed as consistently as possible.
- Model files can be used independently of the library for quick experiments.
## Why shouldn't I use transformers?
<a target="_blank" href="https://huggingface.co/enterprise">
<img alt="Hugging Face Enterprise Hub" src="https://github.com/user-attachments/assets/247fb16d-d251-4583-96c4-d3d76dda4925">
</a><br>
## Why shouldn't I use Transformers?
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library (possibly, [Accelerate](https://huggingface.co/docs/accelerate)).
- While we strive to present as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/main/examples) are just that: examples. It is expected that they won't work out-of-the-box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.
- The training API is optimized to work with PyTorch models provided by Transformers. For generic machine learning loops, you should use another library like [Accelerate](https://huggingface.co/docs/accelerate).
- The [example scripts](https://github.com/huggingface/transformers/tree/main/examples) are only *examples*. They may not necessarily work out-of-the-box on your specific use case and you'll need to adapt the code for it to work.
## Installation
## 100 projects using Transformers
### With pip
Transformers is more than a toolkit to use pretrained models, it's a community of projects built around it and the
Hugging Face Hub. We want Transformers to enable developers, researchers, students, professors, engineers, and anyone
else to build their dream projects.
This repository is tested on Python 3.8+, Flax 0.4.1+, PyTorch 1.11+, and TensorFlow 2.6+.
In order to celebrate Transformers' 100,000 stars, we wanted to put the spotlight on the
community with the [awesome-transformers](./awesome-transformers.md) page which lists 100
incredible projects built with Transformers.
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
If you own or use a project that you believe should be part of the list, please open a PR to add it!
First, create a virtual environment with the version of Python you're going to use and activate it.
## Example models
Then, you will need to install at least one of Flax, PyTorch, or TensorFlow.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/), [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) and/or [Flax](https://github.com/google/flax#quick-install) and [Jax](https://github.com/google/jax#installation) installation pages regarding the specific installation command for your platform.
You can test most of our models directly on their [Hub model pages](https://huggingface.co/models).
When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:
Expand each modality below to see a few example models for various use cases.
```bash
pip install transformers
```
<details>
<summary>Audio</summary>
If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you must [install the library from source](https://huggingface.co/docs/transformers/installation#installing-from-source).
- Audio classification with [Whisper](https://huggingface.co/openai/whisper-large-v3-turbo)
- Automatic speech recognition with [Moonshine](https://huggingface.co/UsefulSensors/moonshine)
- Keyword spotting with [Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks)
- Speech to speech generation with [Moshi](https://huggingface.co/kyutai/moshiko-pytorch-bf16)
- Text to audio with [MusicGen](https://huggingface.co/facebook/musicgen-large)
- Text to speech with [Bark](https://huggingface.co/suno/bark)
### With conda
</details>
🤗 Transformers can be installed using conda as follows:
<details>
<summary>Computer vision</summary>
```shell script
conda install conda-forge::transformers
```
- Automatic mask generation with [SAM](https://huggingface.co/facebook/sam-vit-base)
- Depth estimation with [DepthPro](https://huggingface.co/apple/DepthPro-hf)
- Image classification with [DINO v2](https://huggingface.co/facebook/dinov2-base)
- Keypoint detection with [SuperPoint](https://huggingface.co/magic-leap-community/superpoint)
- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)
- Object detection with [RT-DETRv2](https://huggingface.co/PekingU/rtdetr_v2_r50vd)
- Pose Estimation with [VitPose](https://huggingface.co/usyd-community/vitpose-base-simple)
- Universal segmentation with [OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_swin_large)
- Video classification with [VideoMAE](https://huggingface.co/MCG-NJU/videomae-large)
> **_NOTE:_** Installing `transformers` from the `huggingface` channel is deprecated.
</details>
Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda.
<details>
<summary>Multimodal</summary>
> **_NOTE:_** On Windows, you may be prompted to activate Developer Mode in order to benefit from caching. If this is not an option for you, please let us know in [this issue](https://github.com/huggingface/huggingface_hub/issues/1062).
- Audio or text to text with [Qwen2-Audio](https://huggingface.co/Qwen/Qwen2-Audio-7B)
- Document question answering with [LayoutLMv3](https://huggingface.co/microsoft/layoutlmv3-base)
- Image or text to text with [Qwen-VL](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)
- Image captioning [BLIP-2](https://huggingface.co/Salesforce/blip2-opt-2.7b)
- OCR-based document understanding with [GOT-OCR2](https://huggingface.co/stepfun-ai/GOT-OCR-2.0-hf)
- Table question answering with [TAPAS](https://huggingface.co/google/tapas-base)
- Unified multimodal understanding and generation with [Emu3](https://huggingface.co/BAAI/Emu3-Gen)
- Vision to text with [Llava-OneVision](https://huggingface.co/llava-hf/llava-onevision-qwen2-0.5b-ov-hf)
- Visual question answering with [Llava](https://huggingface.co/llava-hf/llava-1.5-7b-hf)
- Visual referring expression segmentation with [Kosmos-2](https://huggingface.co/microsoft/kosmos-2-patch14-224)
## Model architectures
</details>
**[All the model checkpoints](https://huggingface.co/models)** provided by 🤗 Transformers are seamlessly integrated from the huggingface.co [model hub](https://huggingface.co/models), where they are uploaded directly by [users](https://huggingface.co/users) and [organizations](https://huggingface.co/organizations).
<details>
<summary>NLP</summary>
Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen)
- Masked word completion with [ModernBERT](https://huggingface.co/answerdotai/ModernBERT-base)
- Named entity recognition with [Gemma](https://huggingface.co/google/gemma-2-2b)
- Question answering with [Mixtral](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)
- Summarization with [BART](https://huggingface.co/facebook/bart-large-cnn)
- Translation with [T5](https://huggingface.co/google-t5/t5-base)
- Text generation with [Llama](https://huggingface.co/meta-llama/Llama-3.2-1B)
- Text classification with [Qwen](https://huggingface.co/Qwen/Qwen2.5-0.5B)
🤗 Transformers currently provides the following architectures: see [here](https://huggingface.co/docs/transformers/model_summary) for a high-level summary of each them.
To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the [documentation](https://github.com/huggingface/transformers/tree/main/examples).
## Learn more
| Section | Description |
|-|-|
| [Documentation](https://huggingface.co/docs/transformers/) | Full API documentation and tutorials |
| [Task summary](https://huggingface.co/docs/transformers/task_summary) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/docs/transformers/preprocessing) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/docs/transformers/training) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the `Trainer` API |
| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/main/examples) | Example scripts for fine-tuning models on a wide range of tasks |
| [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community |
</details>
## Citation


@ -14,7 +14,7 @@ Models uploaded on the Hugging Face Hub come in different formats. We heavily re
models in the [`safetensors`](https://github.com/huggingface/safetensors) format (which is the default prioritized
by the transformers library), as developed specifically to prevent arbitrary code execution on your system.
To avoid loading models from unsafe formats(e.g. [pickle](https://docs.python.org/3/library/pickle.html), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model.
To avoid loading models from unsafe formats (e.g. [pickle](https://docs.python.org/3/library/pickle.html)), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model.
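A minimal sketch of opting in to that behavior (the checkpoint name is just an example):

```py
from transformers import AutoModel

# With use_safetensors=True, loading errors out instead of falling back
# to a pickle-based checkpoint when no .safetensors file is present.
model = AutoModel.from_pretrained("google-bert/bert-base-uncased", use_safetensors=True)
```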
### Remote code
@ -27,13 +27,6 @@ These models require the `trust_remote_code=True` parameter to be set when using
the content of the modeling files when using this argument. We recommend setting a revision in order to ensure you
protect yourself from updates on the repository.
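For example, pinning a revision when trusting remote code could look like the sketch below (the repository name and commit hash are placeholders):

```py
from transformers import AutoModel

# trust_remote_code executes modeling code fetched from the Hub repository,
# so pin a specific commit to protect against later changes to that repo.
model = AutoModel.from_pretrained(
    "some-org/some-custom-model",     # hypothetical repository
    trust_remote_code=True,
    revision="0123456789abcdef0123456789abcdef01234567",  # placeholder sha
)
```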
#### Tools
Through the `Agent` framework, remote tools can be downloaded to be used by the Agent. You're to specify these tools
yourself, but please keep in mind that their code will be run on your machine if the Agent chooses to run them.
Please inspect the code of the tools before passing them to the Agent to protect your runtime and local setup.
## Reporting a Vulnerability
Feel free to submit vulnerability reports to [security@huggingface.co](mailto:security@huggingface.co), where someone from the HF security team will review and recommend next steps. If reporting a vulnerability specific to open source, please note [Huntr](https://huntr.com) is a vulnerability disclosure program for open source software.


@ -6,7 +6,7 @@ developers, researchers, students, professors, engineers, and anyone else to bui
In this list, we showcase incredibly impactful and novel projects that have pushed the field forward. We celebrate
100 of these projects as we reach the milestone of 100k stars as a community; but we're very open to pull requests
adding other projects to the list. If you believe a project should be here and it's not, then please open a PR
to add it.
## [gpt4all](https://github.com/nomic-ai/gpt4all)
@ -15,7 +15,7 @@ to add it.
Keywords: Open-source, LLaMa, GPT-J, instruction, assistant
## [recommenders](https://github.com/microsoft/recommenders)
## [recommenders](https://github.com/recommenders-team/recommenders)
This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. It goes over several aspects required to build efficient recommendation systems: data preparation, modeling, evaluation, model selection & optimization, as well as operationalization.
@ -29,7 +29,7 @@ Keywords: inpainting, SD, Stable Diffusion
## [flair](https://github.com/flairNLP/flair)
FLAIR is a powerful PyTorch NLP framework, convering several important tasks: NER, sentiment-analysis, part-of-speech tagging, text and document embeddings, among other things.
FLAIR is a powerful PyTorch NLP framework, covering several important tasks: NER, sentiment-analysis, part-of-speech tagging, text and document embeddings, among other things.
Keywords: NLP, text embedding, document embedding, biomedical, NER, PoS, sentiment-analysis
@ -39,17 +39,17 @@ MindsDB is a low-code ML platform, which automates and integrates several ML fra
Keywords: Database, low-code, AI table
## [langchain](https://github.com/hwchase17/langchain)
## [langchain](https://github.com/langchain-ai/langchain)
[langchain](https://github.com/hwchase17/langchain) is aimed at assisting in the development of apps merging both LLMs and other sources of knowledge. The library allows chaining calls to applications, creating a sequence across many tools.
[langchain](https://github.com/langchain-ai/langchain) is aimed at assisting in the development of apps merging both LLMs and other sources of knowledge. The library allows chaining calls to applications, creating a sequence across many tools.
Keywords: LLMs, Large Language Models, Agents, Chains
## [LlamaIndex](https://github.com/jerryjliu/llama_index)
## [LlamaIndex](https://github.com/run-llama/llama_index)
[LlamaIndex](https://github.com/jerryjliu/llama_index) is a project that provides a central interface to connect your LLM's with external data. It provides various kinds of indices and retreival mechanisms to perform different LLM tasks and obtain knowledge-augmented results.
[LlamaIndex](https://github.com/run-llama/llama_index) is a project that provides a central interface to connect your LLMs with external data. It provides various kinds of indices and retrieval mechanisms to perform different LLM tasks and obtain knowledge-augmented results.
Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation
## [ParlAI](https://github.com/facebookresearch/ParlAI)
@ -146,9 +146,9 @@ Keywords: Framework, simplicity, NLP
Keywords: LLM, Agents, HF Hub
## [transformers.js](https://xenova.github.io/transformers.js/)
## [transformers.js](https://github.com/huggingface/transformers.js/)
[transformers.js](https://xenova.github.io/transformers.js/) is a JavaScript library targeted at running models from transformers directly within the browser.
[transformers.js](https://github.com/huggingface/transformers.js/) is a JavaScript library targeted at running models from transformers directly within the browser.
Keywords: Transformers, JavaScript, browser
@ -257,7 +257,7 @@ Stable-Dreamfusion is a pytorch implementation of the text-to-3D model Dreamfusi
Keywords: Text-to-3D, Stable Diffusion
## [txtai](https://github.com/neuml/txtai)
[txtai](https://github.com/neuml/txtai) is an open-source platform for semantic search and workflows powered by language models. txtai builds embeddings databases, which are a union of vector indexes and relational databases enabling similarity search with SQL. Semantic workflows connect language models together into unified applications.
Keywords: Semantic search, LLM
@ -288,7 +288,7 @@ Keywords: Music understanding, Music generation
## [dalle-flow](https://github.com/jina-ai/dalle-flow)
DALL·E Flow is an interactive workflow for generating high-definition images from a text prompt. Itt leverages DALL·E-Mega, GLID-3 XL, and Stable Diffusion to generate image candidates, and then calls CLIP-as-service to rank the candidates w.r.t. the prompt.
DALL·E Flow is an interactive workflow for generating high-definition images from a text prompt. It leverages DALL·E-Mega, GLID-3 XL, and Stable Diffusion to generate image candidates, and then calls CLIP-as-service to rank the candidates w.r.t. the prompt.
The preferred candidate is fed to GLID-3 XL for diffusion, which often enriches the texture and background. Finally, the candidate is upscaled to 1024x1024 via SwinIR.
Keywords: High-definition image generation, Stable Diffusion, DALL-E Mega, GLID-3 XL, CLIP, SwinIR
@ -309,8 +309,8 @@ Keywords: OCR, LaTeX, Math formula
OpenCLIP is an open source implementation of OpenAI's CLIP.
The goal of this repository is to enable training models with contrastive image-text supervision, and to investigate their properties such as robustness to distribution shift.
The starting point is an implementation of CLIP that matches the accuracy of the original CLIP models when trained on the same dataset.
Specifically, a ResNet-50 model trained with this codebase on OpenAI's 15 million image subset of YFCC achieves 32.7% top-1 accuracy on ImageNet.
@ -437,7 +437,7 @@ Keywords: DALL-E, Russian
Keywords: Knowledge Extraction, Knowledge Graphs
## [Nebuly](https://github.com/nebuly-ai/nebuly)
## [Nebuly](https://github.com/nebuly-ai/optimate)
Nebuly is the next-generation platform to monitor and optimize your AI costs in one place. The platform connects to all your AI cost sources (compute, API providers, AI software licenses, etc.) and centralizes them in one place to give you full visibility on a model basis. The platform also provides optimization recommendations and a co-pilot model that can guide you during the optimization process. The platform builds on top of open-source tools, allowing you to optimize the different steps of your AI stack to squeeze out the best possible cost performance.
@ -526,7 +526,7 @@ Keywords: Model deployment, CLoud, Mobile, Edge
## [underthesea](https://github.com/undertheseanlp/underthesea)
[underthesea](https://github.com/undertheseanlp/underthesea) is a Vietnamese NLP toolkit. Underthesea is a suite of open source Python modules data sets and tutorials supporting research and development in Vietnamese Natural Language Processing. We provides extremely easy API to quickly apply pretrained NLP models to your Vietnamese text, such as word segmentation, part-of-speech tagging (PoS), named entity recognition (NER), text classification and dependency parsing.
[underthesea](https://github.com/undertheseanlp/underthesea) is a Vietnamese NLP toolkit. Underthesea is a suite of open source Python modules, data sets, and tutorials supporting research and development in Vietnamese Natural Language Processing. We provide an extremely easy API to quickly apply pretrained NLP models to your Vietnamese text, such as word segmentation, part-of-speech tagging (PoS), named entity recognition (NER), text classification and dependency parsing.
Keywords: Vietnamese, NLP
@ -596,7 +596,7 @@ Keywords: Data-Centric AI, Data Quality, Noisy Labels, Outlier Detection, Active
## [BentoML](https://github.com/bentoml/BentoML)
[BentoML](https://github.com/bentoml) is the unified framework for building, shipping, and scaling production-ready AI applications incorporating traditional ML, pre-trained AI models, Generative and Large Language Models.
All Hugging Face models and pipelines can be seamlessly integrated into BentoML applications, enabling the running of models on the most suitable hardware and independent scaling based on usage.
Keywords: BentoML, Framework, Deployment, AI Applications
@ -606,4 +606,3 @@ Keywords: BentoML, Framework, Deployment, AI Applications
[LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) offers a user-friendly fine-tuning framework that incorporates PEFT. The repository includes training (fine-tuning) and inference examples for LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, and other LLMs. A ChatGLM version is also available in [ChatGLM-Efficient-Tuning](https://github.com/hiyouga/ChatGLM-Efficient-Tuning).
Keywords: PEFT, fine-tuning, LLaMA-2, ChatGLM, Qwen

benchmark/.gitignore (new file)

@ -0,0 +1 @@
benchmark_results/

benchmark/README.md (new file)

@ -0,0 +1,49 @@
# Benchmarks
To add a new benchmark, define a Python function named `run_benchmark` in a Python file placed in this `benchmark/` directory.
The expected function signature is the following:
```py
def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
```
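A minimal benchmark satisfying this contract could look like the following sketch (the body is illustrative; a real benchmark would load a model, run timed measurements, and record metrics):

```py
from logging import Logger

def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
    # Load a model, run timed measurements, and record them here.
    logger.info(f"benchmarking {branch}@{commit_id}: {commit_msg}")
```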
## Writing metrics to the database
`MetricsRecorder` is thread-safe in the sense of the Python [`Thread`](https://docs.python.org/3/library/threading.html#threading.Thread) API. This means you can start a background thread to take the device measurements without blocking the main thread, which executes the model measurements.
See [`llama.py`](./llama.py) for an example of this in practice.
```py
from benchmarks_entrypoint import MetricsRecorder
import psycopg2
def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
metrics_recorder = MetricsRecorder(psycopg2.connect("dbname=metrics"), logger, branch, commit_id, commit_msg)
# `gpu_name` and `model_id` (like the measurement values below) are placeholders your benchmark provides
benchmark_id = metrics_recorder.initialise_benchmark({"gpu_name": gpu_name, "model_id": model_id})
# To collect device measurements
metrics_recorder.collect_device_measurements(
benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes
)
# To collect your model measurements
metrics_recorder.collect_model_measurements(
benchmark_id,
{
"model_load_time": model_load_time,
"first_eager_forward_pass_time_secs": first_eager_fwd_pass_time,
"second_eager_forward_pass_time_secs": second_eager_fwd_pass_time,
"first_eager_generate_time_secs": first_eager_generate_time,
"second_eager_generate_time_secs": second_eager_generate_time,
"time_to_first_token_secs": time_to_first_token,
"time_to_second_token_secs": time_to_second_token,
"time_to_third_token_secs": time_to_third_token,
"time_to_next_token_mean_secs": mean_time_to_next_token,
"first_compile_generate_time_secs": first_compile_generate_time,
"second_compile_generate_time_secs": second_compile_generate_time,
"third_compile_generate_time_secs": third_compile_generate_time,
"fourth_compile_generate_time_secs": fourth_compile_generate_time,
},
)
```
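The background-thread pattern mentioned above can be sketched as follows, assuming a `collect_metrics` helper that polls device statistics in a loop until asked to stop (see `llama.py` for a complete version; `benchmark_id` and `metrics_recorder` come from the snippet above):

```py
from threading import Event, Thread

stop_collecting = Event()
# `collect_metrics` is assumed to call `metrics_recorder.collect_device_measurements`
# repeatedly until `stop_collecting` is set.
thread = Thread(target=collect_metrics, args=[benchmark_id, stop_collecting, metrics_recorder])
thread.start()
# ... run the model measurements on the main thread ...
stop_collecting.set()
thread.join()
```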

benchmark/benches/llama.py (new file)

@ -0,0 +1,353 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
from logging import Logger
from threading import Event, Thread
from time import perf_counter, sleep
# Add the parent directory to Python path to import benchmarks_entrypoint
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import gpustat
import psutil
import psycopg2
from benchmarks_entrypoint import MetricsRecorder
# Optional heavy ML dependencies - only required when actually running the benchmark
try:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, StaticCache
TRANSFORMERS_AVAILABLE = True
except ImportError:
TRANSFORMERS_AVAILABLE = False
torch = None
AutoModelForCausalLM = None
AutoTokenizer = None
GenerationConfig = None
StaticCache = None
os.environ["HF_XET_HIGH_PERFORMANCE"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "1"
# Only set torch precision if torch is available
if TRANSFORMERS_AVAILABLE:
torch.set_float32_matmul_precision("high")
def collect_metrics(benchmark_id, continue_metric_collection, metrics_recorder):
p = psutil.Process(os.getpid())
while not continue_metric_collection.is_set():
with p.oneshot():
cpu_util = p.cpu_percent()
mem_megabytes = p.memory_info().rss / (1024 * 1024)
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_util = gpu_stats[0]["utilization.gpu"]
gpu_mem_megabytes = gpu_stats[0]["memory.used"]
metrics_recorder.collect_device_measurements(
benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes
)
sleep(0.01)
def run_benchmark(
logger: Logger,
repository: str,
branch: str,
commit_id: str,
commit_msg: str,
metrics_recorder=None,
num_tokens_to_generate=100,
):
# Check if required ML dependencies are available
if not TRANSFORMERS_AVAILABLE:
logger.error("Transformers and torch are required to run the LLaMA benchmark. Please install them with:")
logger.error("pip install torch transformers")
logger.error("Skipping LLaMA benchmark due to missing dependencies.")
return
continue_metric_collection = Event()
metrics_thread = None
model_id = "meta-llama/Llama-2-7b-hf"
# If no metrics_recorder is provided, create one for backward compatibility
if metrics_recorder is None:
try:
metrics_recorder = MetricsRecorder(
psycopg2.connect("dbname=metrics"), logger, repository, branch, commit_id, commit_msg, True
)
should_close_recorder = True
except Exception as e:
logger.error(f"Failed to create metrics recorder: {e}")
return
else:
should_close_recorder = False
try:
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_name = gpu_stats[0]["name"]
benchmark_id = metrics_recorder.initialise_benchmark({"gpu_name": gpu_name, "model_id": model_id})
logger.info(f"running benchmark #{benchmark_id} on {gpu_name} for {model_id}")
metrics_thread = Thread(
target=collect_metrics,
args=[benchmark_id, continue_metric_collection, metrics_recorder],
)
metrics_thread.start()
logger.info("started background thread to fetch device metrics")
os.environ["TOKENIZERS_PARALLELISM"] = "false" # silence warnings when compiling
device = "cuda"
logger.info("downloading weights")
# This is to avoid counting download in model load time measurement
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16)
gen_config = GenerationConfig(do_sample=False, top_p=1, temperature=1)
logger.info("loading model")
start = perf_counter()
model = AutoModelForCausalLM.from_pretrained(
model_id, dtype=torch.float16, generation_config=gen_config
).eval()
model.to(device)
torch.cuda.synchronize()
end = perf_counter()
model_load_time = end - start
logger.info(f"loaded model in: {model_load_time}s")
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "Why dogs are so cute?"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# Specify the max length (including both the prompt and the response)
# When calling `generate` with `cache_implementation="static" later, this is also used to create a `StaticCache` object
# with sequence length = `max_length`. The longer it is, the more the cache can be re-used.
seq_length = inputs["input_ids"].shape[1]
model.generation_config.max_length = seq_length + num_tokens_to_generate
batch_size = inputs["input_ids"].shape[0]
# Copied from the gpt-fast repo
def multinomial_sample_one_no_sync(probs_sort): # Does multinomial sampling without a cuda synchronization
q = torch.empty_like(probs_sort).exponential_(1)
return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)
def logits_to_probs(logits, temperature: float = 1.0, top_k: int | None = None):
logits = logits / max(temperature, 1e-5)
if top_k is not None:
v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
pivot = v.select(-1, -1).unsqueeze(-1)
logits = torch.where(logits < pivot, -float("Inf"), logits)
probs = torch.nn.functional.softmax(logits, dim=-1)
return probs
def sample(logits, temperature: float = 1.0, top_k: int | None = None):
probs = logits_to_probs(logits[0, -1], temperature, top_k)
idx_next = multinomial_sample_one_no_sync(probs)
return idx_next, probs
# First eager forward pass
logger.info("running first eager forward pass")
start = perf_counter()
_ = model(**inputs)
torch.cuda.synchronize()
end = perf_counter()
first_eager_fwd_pass_time = end - start
logger.info(f"completed first eager forward pass in: {first_eager_fwd_pass_time}s")
# Second eager forward pass (should be faster)
logger.info("running second eager forward pass")
start = perf_counter()
_ = model(**inputs)
torch.cuda.synchronize()
end = perf_counter()
second_eager_fwd_pass_time = end - start
logger.info(f"completed second eager forward pass in: {second_eager_fwd_pass_time}s")
# First eager generation
logger.info("running first eager generation")
start = perf_counter()
output = model.generate(**inputs)
torch.cuda.synchronize()
end = perf_counter()
first_eager_generate_time = end - start
logger.info(f"completed first eager generation in: {first_eager_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
# Second eager generation (should be faster)
logger.info("running second eager generation")
start = perf_counter()
output = model.generate(**inputs)
torch.cuda.synchronize()
end = perf_counter()
second_eager_generate_time = end - start
logger.info(f"completed second eager generation in: {second_eager_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
logger.info("running generation timing loop")
input_pos = torch.arange(0, seq_length, device=device)
inputs = inputs["input_ids"]
start = perf_counter()
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
logits = model(inputs, position_ids=input_pos).logits
next_token, probs = sample(logits, temperature=0.6, top_k=5)
torch.cuda.synchronize()
end = perf_counter()
time_to_first_token = end - start
input_pos = torch.tensor([seq_length], device=device, dtype=torch.int)
next_token = next_token.clone()
start = perf_counter()
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
logits = model(next_token, position_ids=input_pos).logits
next_token, probs = sample(logits, temperature=0.6, top_k=5)
torch.cuda.synchronize()
end = perf_counter()
time_to_second_token = end - start
input_pos = torch.tensor([seq_length + 1], device=device, dtype=torch.int)
next_token = next_token.clone()
start = perf_counter()
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
logits = model(next_token, position_ids=input_pos).logits
next_token, probs = sample(logits, temperature=0.6, top_k=5)
torch.cuda.synchronize()
end = perf_counter()
time_to_third_token = end - start
logger.info("running longer generation timing loop")
total_time = 0
for i in range(20):
input_pos = torch.tensor([seq_length + 2 + i], device=device, dtype=torch.int)
next_token = next_token.clone()
start = perf_counter()
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
logits = model(next_token, position_ids=input_pos).logits
next_token, probs = sample(logits, temperature=0.6, top_k=5)
torch.cuda.synchronize()
end = perf_counter()
total_time += end - start
mean_time_to_next_token = total_time / 20
logger.info("running compilation benchmarks")
# Now compile the model
model = torch.compile(model, mode="max-autotune", fullgraph=True)
# StaticCache for generation
with torch.device(device):
model.setup_caches(max_batch_size=batch_size, max_seq_len=seq_length + num_tokens_to_generate)
input_pos = torch.arange(0, seq_length, device=device)
inputs = tokenizer(prompt, return_tensors="pt").to(device)["input_ids"]
logger.info("compiling model")
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16, generation_config=gen_config)
model.to(device)
model = torch.compile(model, mode="max-autotune", fullgraph=True)
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 1st call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
first_compile_generate_time = end - start
logger.info(f"completed first compile generation in: {first_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 2nd call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
second_compile_generate_time = end - start
logger.info(f"completed second compile generation in: {second_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 3rd call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
third_compile_generate_time = end - start
logger.info(f"completed third compile generation in: {third_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 4th call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
fourth_compile_generate_time = end - start
logger.info(f"completed fourth compile generation in: {fourth_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
metrics_recorder.collect_model_measurements(
benchmark_id,
{
"model_load_time": model_load_time,
"first_eager_forward_pass_time_secs": first_eager_fwd_pass_time,
"second_eager_forward_pass_time_secs": second_eager_fwd_pass_time,
"first_eager_generate_time_secs": first_eager_generate_time,
"second_eager_generate_time_secs": second_eager_generate_time,
"time_to_first_token_secs": time_to_first_token,
"time_to_second_token_secs": time_to_second_token,
"time_to_third_token_secs": time_to_third_token,
"time_to_next_token_mean_secs": mean_time_to_next_token,
"first_compile_generate_time_secs": first_compile_generate_time,
"second_compile_generate_time_secs": second_compile_generate_time,
"third_compile_generate_time_secs": third_compile_generate_time,
"fourth_compile_generate_time_secs": fourth_compile_generate_time,
},
)
except Exception as e:
logger.error(f"Caught exception: {e}")
continue_metric_collection.set()
if metrics_thread is not None:
metrics_thread.join()
# Only close the recorder if we created it locally
if should_close_recorder:
metrics_recorder.close()


@ -31,9 +31,7 @@ from contextlib import contextmanager
from pathlib import Path
from git import Repo
from huggingface_hub import HfApi
from optimum_benchmark import Benchmark
from optimum_benchmark_wrapper import main
@ -90,7 +88,7 @@ def summarize(run_dir, metrics, expand_metrics=False):
model = benchmark.config.backend["model"]
# Ths looks like `benchmark.input_shapes.batch_size=1,benchmark.input_shapes.sequence_length=5`.
# This looks like `benchmark.input_shapes.batch_size=1,benchmark.input_shapes.sequence_length=5`.
# (we rely on the usage of hydra's `${hydra.job.override_dirname}`.)
benchmark_name = re.sub(f"backend.model={model},*", "", report_dir)
benchmark_name = str(Path(benchmark_name).parts[-1])


@ -0,0 +1,502 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import importlib.util
import json
import logging
import os
import sys
import uuid
from datetime import datetime
import pandas as pd
try:
from psycopg2.extensions import register_adapter
from psycopg2.extras import Json
register_adapter(dict, Json)
PSYCOPG2_AVAILABLE = True
except ImportError:
PSYCOPG2_AVAILABLE = False
class ImportModuleException(Exception):
pass
class MetricsRecorder:
def __init__(
self,
connection,
logger: logging.Logger,
repository: str,
branch: str,
commit_id: str,
commit_msg: str,
collect_csv_data: bool = True,
):
self.conn = connection
self.use_database = connection is not None
if self.use_database:
self.conn.autocommit = True
self.logger = logger
self.repository = repository
self.branch = branch
self.commit_id = commit_id
self.commit_msg = commit_msg
self.collect_csv_data = collect_csv_data
# For CSV export - store all data in pandas DataFrames (only if CSV collection is enabled)
if self.collect_csv_data:
# Initialize empty DataFrames with proper schemas
self.benchmarks_df = pd.DataFrame(
columns=[
"benchmark_id",
"repository",
"branch",
"commit_id",
"commit_message",
"metadata",
"created_at",
]
)
self.device_measurements_df = pd.DataFrame(
columns=["benchmark_id", "cpu_util", "mem_megabytes", "gpu_util", "gpu_mem_megabytes", "time"]
)
self.model_measurements_df = pd.DataFrame(
columns=[
"benchmark_id",
"time",
"model_load_time",
"first_eager_forward_pass_time_secs",
"second_eager_forward_pass_time_secs",
"first_eager_generate_time_secs",
"second_eager_generate_time_secs",
"time_to_first_token_secs",
"time_to_second_token_secs",
"time_to_third_token_secs",
"time_to_next_token_mean_secs",
"first_compile_generate_time_secs",
"second_compile_generate_time_secs",
"third_compile_generate_time_secs",
"fourth_compile_generate_time_secs",
]
)
else:
self.benchmarks_df = None
self.device_measurements_df = None
self.model_measurements_df = None
def initialise_benchmark(self, metadata: dict[str, str]) -> str:
"""
Creates a new benchmark, returns the benchmark id (UUID)
"""
# Generate a unique UUID for this benchmark
benchmark_id = str(uuid.uuid4())
if self.use_database:
with self.conn.cursor() as cur:
cur.execute(
"INSERT INTO benchmarks (benchmark_id, repository, branch, commit_id, commit_message, metadata) VALUES (%s, %s, %s, %s, %s, %s)",
(benchmark_id, self.repository, self.branch, self.commit_id, self.commit_msg, metadata),
)
self.logger.debug(f"initialised benchmark #{benchmark_id}")
# Store benchmark data for CSV export (if enabled)
if self.collect_csv_data:
# Add row to pandas DataFrame
new_row = pd.DataFrame(
[
{
"benchmark_id": benchmark_id,
"repository": self.repository,
"branch": self.branch,
"commit_id": self.commit_id,
"commit_message": self.commit_msg,
"metadata": json.dumps(metadata),
"created_at": datetime.utcnow().isoformat(),
}
]
)
self.benchmarks_df = pd.concat([self.benchmarks_df, new_row], ignore_index=True)
mode_info = []
if self.use_database:
mode_info.append("database")
if self.collect_csv_data:
mode_info.append("CSV")
mode_str = " + ".join(mode_info) if mode_info else "no storage"
self.logger.debug(f"initialised benchmark #{benchmark_id} ({mode_str} mode)")
return benchmark_id
def collect_device_measurements(self, benchmark_id: str, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes):
"""
Collect device metrics, such as CPU & GPU usage. These are "static", as in you cannot pass arbitrary arguments to the function.
"""
# Store device measurements for CSV export (if enabled)
if self.collect_csv_data:
# Add row to pandas DataFrame
new_row = pd.DataFrame(
[
{
"benchmark_id": benchmark_id,
"cpu_util": cpu_util,
"mem_megabytes": mem_megabytes,
"gpu_util": gpu_util,
"gpu_mem_megabytes": gpu_mem_megabytes,
"time": datetime.utcnow().isoformat(),
}
]
)
self.device_measurements_df = pd.concat([self.device_measurements_df, new_row], ignore_index=True)
# Store in database if available
if self.use_database:
with self.conn.cursor() as cur:
cur.execute(
"INSERT INTO device_measurements (benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes) VALUES (%s, %s, %s, %s, %s)",
(benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes),
)
self.logger.debug(
f"collected device measurements for benchmark #{benchmark_id} [CPU util: {cpu_util}, mem MBs: {mem_megabytes}, GPU util: {gpu_util}, GPU mem MBs: {gpu_mem_megabytes}]"
)
def collect_model_measurements(self, benchmark_id: str, measurements: dict[str, float]):
# Store model measurements for CSV export (if enabled)
if self.collect_csv_data:
# Add row to pandas DataFrame with flattened measurements
row_data = {"benchmark_id": benchmark_id, "time": datetime.utcnow().isoformat()}
# Flatten the measurements dict into the row
row_data.update(measurements)
new_row = pd.DataFrame([row_data])
self.model_measurements_df = pd.concat([self.model_measurements_df, new_row], ignore_index=True)
# Store in database if available
if self.use_database:
with self.conn.cursor() as cur:
cur.execute(
"""
INSERT INTO model_measurements (
benchmark_id,
measurements
) VALUES (%s, %s)
""",
(
benchmark_id,
measurements,
),
)
self.logger.debug(f"collected model measurements for benchmark #{benchmark_id}: {measurements}")
def export_to_csv(self, output_dir: str = "benchmark_results"):
"""
Export all collected data to CSV files using pandas DataFrames
"""
if not self.collect_csv_data:
self.logger.warning("CSV data collection is disabled - no CSV files will be generated")
return
if not os.path.exists(output_dir):
os.makedirs(output_dir)
self.logger.info(f"Created output directory: {output_dir}")
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
files_created = []
# Export using pandas DataFrames
self._export_pandas_data(output_dir, timestamp, files_created)
self.logger.info(f"CSV export complete! Created {len(files_created)} files in {output_dir}")
def _export_pandas_data(self, output_dir: str, timestamp: str, files_created: list):
"""
Export CSV files using pandas DataFrames
"""
# Export benchmarks
benchmarks_file = os.path.join(output_dir, f"benchmarks_{timestamp}.csv")
self.benchmarks_df.to_csv(benchmarks_file, index=False)
files_created.append(benchmarks_file)
self.logger.info(f"Exported {len(self.benchmarks_df)} benchmark records to {benchmarks_file}")
# Export device measurements
device_file = os.path.join(output_dir, f"device_measurements_{timestamp}.csv")
self.device_measurements_df.to_csv(device_file, index=False)
files_created.append(device_file)
self.logger.info(f"Exported {len(self.device_measurements_df)} device measurement records to {device_file}")
# Export model measurements (already flattened)
model_file = os.path.join(output_dir, f"model_measurements_{timestamp}.csv")
self.model_measurements_df.to_csv(model_file, index=False)
files_created.append(model_file)
self.logger.info(f"Exported {len(self.model_measurements_df)} model measurement records to {model_file}")
# Create comprehensive summary using pandas operations
summary_file = os.path.join(output_dir, f"benchmark_summary_{timestamp}.csv")
self._create_summary(summary_file)
files_created.append(summary_file)
def _create_summary(self, summary_file: str):
"""
Create a comprehensive summary CSV using pandas operations
"""
if len(self.benchmarks_df) == 0:
# Create empty summary file
summary_df = pd.DataFrame()
summary_df.to_csv(summary_file, index=False)
self.logger.info(f"Created empty benchmark summary at {summary_file}")
return
# Start with benchmarks as the base
summary_df = self.benchmarks_df.copy()
# Add model measurements (join on benchmark_id)
if len(self.model_measurements_df) > 0:
# Drop 'time' column from model measurements to avoid conflicts
model_df = self.model_measurements_df.drop(columns=["time"], errors="ignore")
summary_df = summary_df.merge(model_df, on="benchmark_id", how="left")
# Calculate device measurement aggregates using pandas groupby
if len(self.device_measurements_df) > 0:
device_agg = (
self.device_measurements_df.groupby("benchmark_id")
.agg(
{
"cpu_util": ["mean", "max", "std", "count"],
"mem_megabytes": ["mean", "max", "std"],
"gpu_util": ["mean", "max", "std"],
"gpu_mem_megabytes": ["mean", "max", "std"],
}
)
.round(3)
)
# Flatten column names
device_agg.columns = [f"{col[0]}_{col[1]}" for col in device_agg.columns]
device_agg = device_agg.reset_index()
# Rename count column to be more descriptive
if "cpu_util_count" in device_agg.columns:
device_agg = device_agg.rename(columns={"cpu_util_count": "device_measurement_count"})
# Merge with summary
summary_df = summary_df.merge(device_agg, on="benchmark_id", how="left")
# Export the comprehensive summary
summary_df.to_csv(summary_file, index=False)
self.logger.info(f"Created comprehensive benchmark summary with {len(summary_df)} records at {summary_file}")
def close(self):
if self.use_database and self.conn:
self.conn.close()
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)
formatter = logging.Formatter("[%(levelname)s - %(asctime)s] %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)
def parse_arguments() -> tuple[str, str, str, str, bool, str]:
"""
Parse command line arguments for the benchmarking CLI.
"""
parser = argparse.ArgumentParser(description="CLI for benchmarking the huggingface/transformers.")
parser.add_argument(
"repository",
type=str,
help="The repository name on which the benchmarking is performed.",
)
parser.add_argument(
"branch",
type=str,
help="The branch name on which the benchmarking is performed.",
)
parser.add_argument(
"commit_id",
type=str,
help="The commit hash on which the benchmarking is performed.",
)
parser.add_argument(
"commit_msg",
type=str,
help="The commit message associated with the commit, truncated to 70 characters.",
)
parser.add_argument("--csv", action="store_true", default=False, help="Enable CSV output files generation.")
parser.add_argument(
"--csv-output-dir",
type=str,
default="benchmark_results",
help="Directory for CSV output files (default: benchmark_results).",
)
args = parser.parse_args()
# CSV is disabled by default, only enabled when --csv is used
generate_csv = args.csv
return args.repository, args.branch, args.commit_id, args.commit_msg, generate_csv, args.csv_output_dir
def import_from_path(module_name, file_path):
try:
spec = importlib.util.spec_from_file_location(module_name, file_path)
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module
spec.loader.exec_module(module)
return module
except Exception as e:
raise ImportModuleException(f"failed to load python module: {e}")
def create_database_connection():
"""
Try to create a database connection. Returns None if connection fails.
"""
if not PSYCOPG2_AVAILABLE:
logger.warning("psycopg2 not available - running in CSV-only mode")
return None
try:
import psycopg2
conn = psycopg2.connect("dbname=metrics")
logger.info("Successfully connected to database")
return conn
except Exception as e:
logger.warning(f"Failed to connect to database: {e}. Running in CSV-only mode")
return None
def create_global_metrics_recorder(
repository: str, branch: str, commit_id: str, commit_msg: str, generate_csv: bool = False
) -> MetricsRecorder:
"""
Create a global metrics recorder that will be used across all benchmarks.
"""
connection = create_database_connection()
recorder = MetricsRecorder(connection, logger, repository, branch, commit_id, commit_msg, generate_csv)
# Log the storage mode
storage_modes = []
if connection is not None:
storage_modes.append("database")
if generate_csv:
storage_modes.append("CSV")
if not storage_modes:
logger.warning("Running benchmarks with NO data storage (no database connection, CSV disabled)")
logger.warning("Use --csv flag to enable CSV output when database is unavailable")
else:
logger.info(f"Running benchmarks with: {' + '.join(storage_modes)} storage")
return recorder
if __name__ == "__main__":
benchmarks_folder_path = os.path.dirname(os.path.realpath(__file__))
benches_folder_path = os.path.join(benchmarks_folder_path, "benches")
repository, branch, commit_id, commit_msg, generate_csv, csv_output_dir = parse_arguments()
# Create a global metrics recorder
global_metrics_recorder = create_global_metrics_recorder(repository, branch, commit_id, commit_msg, generate_csv)
successful_benchmarks = 0
failed_benchmarks = 0
# Automatically discover all benchmark modules in benches/ folder
benchmark_modules = []
if os.path.exists(benches_folder_path):
logger.debug(f"Scanning for benchmarks in: {benches_folder_path}")
for entry in os.scandir(benches_folder_path):
if not entry.name.endswith(".py"):
continue
if entry.name.startswith("__"): # Skip __init__.py, __pycache__, etc.
continue
# Check if the file has a run_benchmark function
try:
logger.debug(f"checking if benches/{entry.name} has run_benchmark function")
module = import_from_path(entry.name.split(".")[0], entry.path)
if hasattr(module, "run_benchmark"):
benchmark_modules.append(entry.name)
logger.debug(f"discovered benchmark: {entry.name}")
else:
logger.debug(f"skipping {entry.name} - no run_benchmark function found")
except Exception as e:
logger.debug(f"failed to check benches/{entry.name}: {e}")
else:
logger.warning(f"Benches directory not found: {benches_folder_path}")
if benchmark_modules:
logger.info(f"Discovered {len(benchmark_modules)} benchmark(s): {benchmark_modules}")
else:
logger.warning("No benchmark modules found in benches/ directory")
for module_name in benchmark_modules:
module_path = os.path.join(benches_folder_path, module_name)
try:
logger.debug(f"loading: {module_name}")
module = import_from_path(module_name.split(".")[0], module_path)
logger.info(f"running benchmarks in: {module_name}")
# Check if the module has an updated run_benchmark function that accepts metrics_recorder
try:
# Try the new signature first
module.run_benchmark(logger, repository, branch, commit_id, commit_msg, global_metrics_recorder)
except TypeError:
# Fall back to the old signature for backward compatibility
logger.warning(
f"Module {module_name} using old run_benchmark signature - database connection will be created per module"
)
module.run_benchmark(logger, repository, branch, commit_id, commit_msg)
successful_benchmarks += 1
except ImportModuleException as e:
logger.error(e)
failed_benchmarks += 1
except Exception as e:
logger.error(f"error running benchmarks for {module_name}: {e}")
failed_benchmarks += 1
# Export CSV results at the end (if enabled)
try:
if generate_csv:
global_metrics_recorder.export_to_csv(csv_output_dir)
logger.info(f"CSV reports have been generated and saved to the {csv_output_dir} directory")
else:
logger.info("CSV generation disabled - no CSV files created (use --csv to enable)")
logger.info(f"Benchmark run completed. Successful: {successful_benchmarks}, Failed: {failed_benchmarks}")
except Exception as e:
logger.error(f"Failed to export CSV results: {e}")
finally:
global_metrics_recorder.close()
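Based on `parse_arguments` above, a typical invocation of this entrypoint might look like the following (all values are placeholders):

```bash
python benchmarks_entrypoint.py huggingface/transformers main abc123de "fix: some commit" --csv --csv-output-dir benchmark_results
```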


@ -19,7 +19,7 @@ backend:
model: meta-llama/Llama-2-7b-hf
cache_implementation: static
torch_compile: true
torch_dtype: float16
dtype: float16
torch_compile_config:
backend: inductor
mode: reduce-overhead

benchmark/default.yml (new file)

@ -0,0 +1,10 @@
apiVersion: 1
providers:
- name: 'Transformers Benchmarks'
orgId: 1
type: file
updateIntervalSeconds: 10
allowUiUpdates: true
options:
path: /etc/grafana/dashboards

(File diff suppressed because it is too large.)


@ -0,0 +1,17 @@
apiVersion: 1
datasources:
- name: grafana-postgresql-datasource
uid: be28nkzirtb0gd
type: postgres
url: $GRAFANA_POSTGRES_DATASOURCE_URL
user: $GRAFANA_POSTGRES_DATASOURCE_USER
secureJsonData:
password: $GRAFANA_POSTGRES_DATASOURCE_PWD
jsonData:
database: metrics
maxOpenConns: 100
maxIdleConns: 100
maxIdleConnsAuto: true
connMaxLifetime: 14400
postgresVersion: 1000
timescaledb: false


@ -3,7 +3,11 @@ import subprocess
def main(config_dir, config_name, args):
subprocess.run(["optimum-benchmark", "--config-dir", f"{config_dir}", "--config-name", f"{config_name}"] + ["hydra/job_logging=disabled", "hydra/hydra_logging=disabled"] + args)
subprocess.run(
["optimum-benchmark", "--config-dir", f"{config_dir}", "--config-name", f"{config_name}"]
+ ["hydra/job_logging=disabled", "hydra/hydra_logging=disabled"]
+ args
)
if __name__ == "__main__":


@ -0,0 +1,6 @@
gpustat==1.1.1
psutil==6.0.0
psycopg2==2.9.9
torch>=2.4.0
hf_xet
pandas>=1.5.0

benchmark_v2/.gitignore (new file)

@ -0,0 +1,2 @@
benchmark_results/
benchmark_results_profiles/

benchmark_v2/README.md (new file)

@ -0,0 +1,138 @@
# Benchmarking v2
A comprehensive benchmarking framework for transformer models that supports multiple execution modes (eager, compiled, kernelized), detailed performance metrics collection, and structured output format.
## Quick Start
### Running All Benchmarks
```bash
# Run all benchmarks with default settings
python run_benchmarks.py
# Specify output directory
python run_benchmarks.py --output-dir my_results
# Run with custom parameters
python run_benchmarks.py \
--warmup-iterations 5 \
--measurement-iterations 10 \
--num-tokens-to-generate 200
```
### Uploading Results to HuggingFace Dataset
You can automatically upload benchmark results to a HuggingFace Dataset for tracking and analysis:
```bash
# Upload to a public dataset with auto-generated run ID
python run_benchmarks.py --upload-to-hub username/benchmark-results
# Upload with a custom run ID for easy identification
python run_benchmarks.py --upload-to-hub username/benchmark-results --run-id experiment_v1
# Upload with custom HuggingFace token (if not set in environment)
python run_benchmarks.py --upload-to-hub username/benchmark-results --token hf_your_token_here
```
**Dataset Directory Structure:**
```
dataset_name/
├── 2025-01-15/
│ ├── runs/ # Non-scheduled runs (manual, PR, etc.)
│ │ └── 123-1245151651/ # GitHub run number and ID
│ │ └── benchmark_results/
│ │ ├── benchmark_summary_20250115_143022.json
│ │ └── model-name/
│ │ └── model-name_benchmark_20250115_143022.json
│ └── benchmark_results_abc123de/ # Scheduled runs (daily CI)
│ ├── benchmark_summary_20250115_143022.json
│ └── model-name/
│ └── model-name_benchmark_20250115_143022.json
└── 2025-01-16/
└── ...
```
**Authentication for Uploads:**
For uploading results, you need a HuggingFace token with write permissions to the target dataset. You can provide the token in several ways (in order of precedence):
1. Command line: `--token hf_your_token_here`
2. Environment variable: `HF_TOKEN`
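For instance, combining the environment variable with an upload (token and dataset names are placeholders):

```bash
HF_TOKEN=hf_your_token_here python run_benchmarks.py --upload-to-hub username/benchmark-results
```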
### Running Specific Benchmarks
```bash
# Include only specific benchmarks
python run_benchmarks.py --include llama
# Exclude specific benchmarks
python run_benchmarks.py --exclude old_benchmark
```

## Output Format
Results are saved as JSON files with the following structure:
```json
{
"model_name": "llama_2_7b",
"benchmark_scenarios": [
{
"scenario_name": "eager_variant",
"metadata": {
"timestamp": "2025-01-XX...",
"commit_id": "abc123...",
"hardware_info": {
"gpu_name": "NVIDIA A100",
"gpu_memory_total": 40960,
"cpu_count": 64
},
"config": {
"variant": "eager",
"warmup_iterations": 3,
"measurement_iterations": 5
}
},
"measurements": {
"latency": {
"mean": 2.45,
"median": 2.43,
"std": 0.12,
"min": 2.31,
"max": 2.67,
"p95": 2.61,
"p99": 2.65
},
"time_to_first_token": {
"mean": 0.15,
"std": 0.02
},
"tokens_per_second": {
"mean": 87.3,
"unit": "tokens/sec"
}
},
"gpu_metrics": {
"gpu_utilization_mean": 85.2,
"gpu_memory_used_mean": 12450
}
}
]
}
```
### Debug Mode
```bash
python run_benchmarks.py --log-level DEBUG
```
## Contributing
To add new benchmarks:
1. Create a new file in `benches/`
2. Implement the `ModelBenchmark` interface
3. Add a runner function (`run_<benchmark_name>` or `run_benchmark`)
4. Run `run_benchmarks.py`; benchmarks in `benches/` are discovered automatically (see the sketch below)
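A rough sketch of a new benchmark file follows; the exact runner signature may differ, so treat existing files in `benches/` as the authoritative reference:

```py
# benches/my_benchmark.py -- hypothetical example
import logging

def run_my_benchmark(logger: logging.Logger, output_dir: str, **kwargs):
    # Build BenchmarkConfig scenarios, run them, and write JSON results under `output_dir`.
    logger.info("running my_benchmark")
```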


@ -0,0 +1,215 @@
import hashlib
import json
import logging
from typing import Any
KERNELIZATION_AVAILABLE = False
try:
from kernels import Mode, kernelize # noqa: F401
KERNELIZATION_AVAILABLE = True
except ImportError:
pass
logger = logging.getLogger(__name__)
class BenchmarkConfig:
"""Configuration for a single benchmark scenario."""
def __init__(
self,
warmup_iterations: int = 5,
measurement_iterations: int = 20,
gpu_monitoring: bool = False, # False by default because it slows down the benchmark by a lot
batch_size: int = 1,
sequence_length: int = 128,
num_tokens_to_generate: int = 128,
attn_implementation: str = "eager",
sdpa_backend: str | None = None,
compile_mode: str | None = None,
compile_options: dict[str, Any] | None = None,
kernelize: bool = False,
name: str | None = None,
skip_validity_check: bool = False,
) -> None:
# Benchmark parameters
self.warmup_iterations = warmup_iterations
self.measurement_iterations = measurement_iterations
self.gpu_monitoring = gpu_monitoring
# Input parameters
self.batch_size = batch_size
self.sequence_length = sequence_length
self.num_tokens_to_generate = num_tokens_to_generate
# Generation parameters
self.attn_implementation = attn_implementation
self.sdpa_backend = sdpa_backend
# Optimization parameters
self.compile_mode = compile_mode
self.compile_options = compile_options if compile_options is not None else {}
self.kernelize = kernelize
# Constant parameters
self.dtype = "torch.bfloat16"
self.device = "cuda"
self.check_validity(skip_validity_check)
self.name = name if name is not None else self.infer_name()
def check_validity(self, skip_validity_check: bool = False) -> None:
if skip_validity_check:
return
# Flash attention does not support compile mode, so we turn it off # FIXME: it would be better to support it
is_fa = self.attn_implementation == "flash_attention_2"
is_fa |= self.attn_implementation == "sdpa" and self.sdpa_backend == "flash_attention"
if is_fa:
logger.warning("Flash attention does not support compile mode. Turning off compile mode.")
self.compile_mode = None
@property
def hash(self) -> str:
return hashlib.sha256(json.dumps(self.to_dict()).encode()).hexdigest()
def infer_name(self, compact: bool = True) -> str:
"""Infer a human-readable name for the benchmark config, either compact or verbose."""
if compact:
iter_str = f"w{self.warmup_iterations}_i{self.measurement_iterations}"
gpu_monitor_str = "monitored" if self.gpu_monitoring else "unmonitored"
dimensions_str = f"b{self.batch_size}_s{self.sequence_length}_n{self.num_tokens_to_generate}"
attn_code = self.attn_implementation
attn_code += f"_{self.sdpa_backend}" if self.attn_implementation == "sdpa" else ""
compile_str = f"compiled_{self.compile_mode}" if self.compile_mode is not None else "uncompiled"
kernelize_str = "kernelized" if self.kernelize else "unkernelized"
sep = "-"
else:
iter_str = f"{self.warmup_iterations} warmup, {self.measurement_iterations} iterations"
gpu_monitor_str = ("with" if self.gpu_monitoring else "no") + " GPU monitoring"
dimensions_str = f"batch size {self.batch_size}, sequence length {self.sequence_length}, {self.num_tokens_to_generate} generated tokens"
attn_code = f"{self.attn_implementation} attention"
attn_code += f" with {self.sdpa_backend} backend" if self.attn_implementation == "sdpa" else ""
compile_str = "compiled" if self.compile_mode is not None else "not compiled"
kernelize_str = "kernelized" if self.kernelize else "not kernelized"
sep = ", "
return sep.join([iter_str, gpu_monitor_str, dimensions_str, attn_code, compile_str, kernelize_str])
def to_dict(self) -> dict[str, Any]:
return {
"name": self.name,
"warmup_iterations": self.warmup_iterations,
"measurement_iterations": self.measurement_iterations,
"gpu_monitoring": self.gpu_monitoring,
"batch_size": self.batch_size,
"sequence_length": self.sequence_length,
"num_tokens_to_generate": self.num_tokens_to_generate,
"attn_implementation": self.attn_implementation,
"sdpa_backend": self.sdpa_backend,
"compile_mode": self.compile_mode,
"compile_options": self.compile_options | {}, # to avoid inplace modification of the original dict
"kernelize": self.kernelize,
}
@classmethod
def from_dict(cls, data: dict[str, Any], skip_validity_check: bool = False) -> "BenchmarkConfig":
return cls(
warmup_iterations=data.get("warmup_iterations", 5),
measurement_iterations=data.get("measurement_iterations", 20),
gpu_monitoring=data.get("gpu_monitoring", False),
batch_size=data.get("batch_size", 1),
sequence_length=data.get("sequence_length", 128),
num_tokens_to_generate=data.get("num_tokens_to_generate", 128),
attn_implementation=data.get("attn_implementation", "eager"),
sdpa_backend=data.get("sdpa_backend"),
compile_mode=data.get("compile_mode"),
compile_options=data.get("compile_options"),
kernelize=data.get("kernelize", False),
name=data.get("name"),
skip_validity_check=skip_validity_check,
)
def cross_generate_configs(
attn_impl_and_sdpa_backend: list[tuple[str, str | None]],
compiled_mode: list[str | None],
kernelized: list[bool],
warmup_iterations: int = 5,
measurement_iterations: int = 20,
batch_size: int = 1,
sequence_length: int = 128,
num_tokens_to_generate: int = 128,
gpu_monitoring: bool = False, # this slows down the benchmark by a lot so we disable it by default
) -> list[BenchmarkConfig]:
# Create kwargs common to all configs
kwargs = {
"warmup_iterations": warmup_iterations,
"measurement_iterations": measurement_iterations,
"batch_size": batch_size,
"sequence_length": sequence_length,
"num_tokens_to_generate": num_tokens_to_generate,
"gpu_monitoring": gpu_monitoring,
}
# Cross-generate all combinations of attn_implementation, compiled_mode, and kernelized
configs = []
for attn_implementation, sdpa_backend in list(dict.fromkeys(attn_impl_and_sdpa_backend)):
for cm in list(dict.fromkeys(compiled_mode)):
for kernelize_on in list(dict.fromkeys(kernelized)):
config = BenchmarkConfig(
attn_implementation=attn_implementation,
sdpa_backend=sdpa_backend,
compile_mode=cm,
kernelize=kernelize_on,
**kwargs,
)
configs.append(config)
return configs
def generate_all_configs(
warmup_iterations: int = 5,
measurement_iterations: int = 20,
batch_size: int = 1,
sequence_length: int = 128,
num_tokens_to_generate: int = 128,
gpu_monitoring: bool = False,
) -> list[BenchmarkConfig]:
all_attn_implementations = [
("flash_attention_2", None),
("eager", None),
("sdpa", "math"),
("sdpa", "flash_attention"),
("flex_attention", None),
]
return cross_generate_configs(
attn_impl_and_sdpa_backend=all_attn_implementations,
compiled_mode=[None, "default", "reduce-overhead", "max-autotune", "max-autotune-no-cudagraphs"],
kernelized=[False, KERNELIZATION_AVAILABLE],
warmup_iterations=warmup_iterations,
measurement_iterations=measurement_iterations,
batch_size=batch_size,
sequence_length=sequence_length,
num_tokens_to_generate=num_tokens_to_generate,
gpu_monitoring=gpu_monitoring,
)
def generate_main_configs(
warmup_iterations: int = 5,
measurement_iterations: int = 20,
batch_size: int = 1,
sequence_length: int = 128,
num_tokens_to_generate: int = 128,
gpu_monitoring: bool = False,
) -> list[BenchmarkConfig]:
# Create kwargs common to all configs
kwargs = {
"warmup_iterations": warmup_iterations,
"measurement_iterations": measurement_iterations,
"batch_size": batch_size,
"sequence_length": sequence_length,
"num_tokens_to_generate": num_tokens_to_generate,
"gpu_monitoring": gpu_monitoring,
}
return [ # TODO: test max-autotune instead of default
BenchmarkConfig(attn_implementation="flex_attention", compile_mode="default", **kwargs),
BenchmarkConfig(attn_implementation="eager", compile_mode="default", **kwargs),
BenchmarkConfig(attn_implementation="flash_attention_2", **kwargs),
]


@ -0,0 +1,389 @@
import gc
import json
import logging
import os
import pathlib
import re
import time
from contextlib import nullcontext
from datetime import datetime
from queue import Queue
from typing import Any
import torch
from tqdm import trange
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
CompileConfig,
GenerationConfig,
GenerationMixin,
)
from transformers.generation.streamers import BaseStreamer
from .benchmark_config import BenchmarkConfig
from .data_classes import BenchmarkMetadata, BenchmarkResult, GPURawMetrics, pretty_print_dict
from .hardware_metrics import GPUMonitor
try:
from kernels import Mode, kernelize # noqa: F401
except ImportError:
kernelize = None
Mode = None
DEFAULT_PROMPT = "\n".join([
"The French Revolution was a period of political and societal change in France that began with the Estates General of 1789 and ended with the Coup of 18 Brumaire on 9 November 1799.",
"Many of the revolution's ideas are considered fundamental principles of liberal democracy, and its values remain central to modern French political discourse.",
"It was caused by a combination of social, political, and economic factors which the existing regime proved unable to manage.",
"Financial crisis and widespread social distress led to the convocation of the Estates General in May 1789, its first meeting since 1614.",
"The representatives of the Third Estate broke away and re-constituted themselves as a National Assembly in June.",
"The Storming of the Bastille in Paris on 14 July led to a series of radical measures by the Assembly, including the abolition of feudalism, state control over the Catholic Church in France, and issuing the Declaration of the Rights of Man and of the Citizen.",
"The next three years were dominated by a struggle for political control.",
"King Louis XVI's attempted flight to Varennes in June 1791 further discredited the monarchy, and military defeats after the outbreak of the French Revolutionary Wars in April 1792 led to the insurrection of 10 August 1792.",
"As a result, the monarchy was replaced by the French First Republic in September, followed by the execution of Louis XVI himself in January 1793.",
"After another revolt in June 1793, the constitution was suspended, and political power passed from the National Convention to the Committee of Public Safety, dominated by radical Jacobins led by Maximilien Robespierre.",
"About 16,000 people were sentenced by the Revolutionary Tribunal and executed in the Reign of Terror, which ended in July 1794 with the Thermidorian Reaction.",
"Weakened by external threats and internal opposition, the Committee of Public Safety was replaced in November 1795 by the Directory.",
"Its instability ended in the coup of 18 Brumaire and the establishment of the Consulate, with Napoleon Bonaparte as First Consul.",
]) # fmt: skip
def compact_json_numeric_arrays(data: dict):
# Match arrays that contain only numbers (ints/floats), whitespace, commas, and newlines
pattern = r"\[\s*\n\s*((?:\d+(?:\.\d+)?\s*,\s*)*\d+(?:\.\d+)?)\s*\n\s*\]"
def replace_numeric_array(match):
# Get the array content
content = match.group(1)
# Remove extra whitespace but keep commas
compact_content = re.sub(r"\s+", " ", content).strip()
return f"[{compact_content}]"
return re.sub(pattern, replace_numeric_array, json.dumps(data, indent=4, default=str), flags=re.DOTALL)
def get_git_revision() -> str:
base_path = pathlib.Path(__file__).parent.parent.parent
git_dir = base_path / ".git"
with (git_dir / "HEAD").open("r") as head:
ref = head.readline().split(" ")[-1].strip()
with (git_dir / ref).open("r") as git_hash:
return git_hash.readline().strip()
def get_sdpa_backend(backend_name: str | None) -> torch.nn.attention.SDPBackend | None:
"""Get the SDPA backend enum from string name."""
if backend_name is None:
return None
try:
backend_map = {
"math": torch.nn.attention.SDPBackend.MATH,
"flash_attention": torch.nn.attention.SDPBackend.FLASH_ATTENTION,
"efficient_attention": torch.nn.attention.SDPBackend.EFFICIENT_ATTENTION,
"cudnn_attention": torch.nn.attention.SDPBackend.CUDNN_ATTENTION,
}
return backend_map.get(backend_name.lower())
except AttributeError:
# torch.nn.attention.SDPBackend not available in older torch versions
return None
def flush_memory():
"""Flush GPU memory and run garbage collection."""
gc.collect()
# Dynamo resets
torch._dynamo.reset()
torch._dynamo.reset_code_caches()
if hasattr(torch._inductor, "codecache"):
# Clear FX graph cache
if hasattr(torch._inductor.codecache, "FxGraphCache"):
torch._inductor.codecache.FxGraphCache.clear()
# Clear PyCodeCache
if hasattr(torch._inductor.codecache, "PyCodeCache"):
torch._inductor.codecache.PyCodeCache.cache_clear()
# Clear TritonFuture cache (for async compilation)
if hasattr(torch._inductor.codecache, "TritonFuture"):
if hasattr(torch._inductor.codecache.TritonFuture, "_compile_cache"):
torch._inductor.codecache.TritonFuture._compile_cache.clear()
# Clear CUDA cache
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.reset_max_memory_allocated()
torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()
gc.collect()
class BenchmarkStreamer(BaseStreamer):
def __init__(self, **kwargs) -> None:
self.timestamps = []
self.text_queue = Queue()
# These two are read by __next__ below; without them iteration would raise AttributeError
self.stop_signal = None
self.timeout = None
def put(self, value):
"""Receives tokens and logs the timestamp of the generation."""
self.timestamps.append(time.perf_counter())
def end(self):
self.timestamps.append(time.perf_counter())
def __iter__(self):
return self
def __next__(self):
value = self.text_queue.get(timeout=self.timeout)
if value == self.stop_signal:
raise StopIteration()
else:
return value
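# Sketch of how the streamer's timestamps are consumed (illustrative only):
# generate() calls put() once per decoding step, so the gaps between
# consecutive timestamps approximate per-step latencies.
def _demo_streamer_gaps(streamer: BenchmarkStreamer) -> list[float]:
    ts = streamer.timestamps
    return [t1 - t0 for t0, t1 in zip(ts, ts[1:])]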
class BenchmarkRunner:
"""Main benchmark runner that coordinates benchmark execution."""
def __init__(self, logger: logging.Logger, output_dir: str | None = None, commit_id: str | None = None) -> None:
# Those stay constant for the whole run
self.logger = logger
if output_dir is None:
output_dir = os.path.join(os.path.dirname(os.path.dirname(__file__)), "benchmark_results")
self.output_dir = output_dir
self.commit_id = get_git_revision() if commit_id is None else commit_id
os.makedirs(self.output_dir, exist_ok=True)
self.profile_dir = None
# Attributes that are reset for each model
self._setup_for = ""
# Attributes that are reset for each run
self.model: GenerationMixin | None = None
def cleanup(self) -> None:
del self.model
self.model = None
flush_memory()
def setup_one_run(self, model_id: str, config: BenchmarkConfig) -> None:
# Some attributes only need to be set once per model
if self._setup_for != model_id:
self.tokenizer = AutoTokenizer.from_pretrained(model_id)
# We set the EOS token to the padding token for open-ended generation
self.tokenizer.eos_token = self.tokenizer.pad_token
self._setup_for = model_id
# Prepare inputs
self.inputs = self.tokenizer(
[DEFAULT_PROMPT for _ in range(config.batch_size)],
return_tensors="pt",
max_length=config.sequence_length,
truncation=True,
return_attention_mask=True,
).to(config.device)
self.inputs["use_cache"] = True
# Prepare generation config
gen_config = GenerationConfig(
do_sample=False, top_p=1.0, temperature=1.0, max_new_tokens=config.num_tokens_to_generate
)
# Prepare compile config
if config.compile_mode is not None:
gen_config.compile_config = CompileConfig(mode=config.compile_mode, options=config.compile_options)
gen_config.cache_implementation = "static"
# Load model
self.logger.debug(f"Loading model {model_id} on device {config.device}...")
dtype = getattr(torch, config.dtype.removeprefix("torch."))
self.model = AutoModelForCausalLM.from_pretrained(
model_id, dtype=dtype, attn_implementation=config.attn_implementation, generation_config=gen_config
)
self.model = self.model.eval().to(config.device)
# Kernelize the model if needed
if config.kernelize:
self.model = kernelize(self.model, mode=Mode.INFERENCE)
def run_one_benchmark(self, model_id: str, config: BenchmarkConfig, num_tokens_to_profile: int = 0) -> dict[str, Any] | None:
sdpa_ctx = nullcontext()
if config.attn_implementation == "sdpa":
sdpa_backend = get_sdpa_backend(config.sdpa_backend)
sdpa_ctx = torch.nn.attention.sdpa_kernel(sdpa_backend)
with sdpa_ctx, torch.no_grad():
self.logger.info(f"Running benchmark scenario: {config.name}")
# Quick validation: try one measurement first to see if this scenario works
flush_memory()
e2e_latency, token_generation_times, shape_and_decoded_output, gpu_metrics = self.time_generate(
max_new_tokens=1, gpu_monitor=None
)
if e2e_latency < 0:
self.logger.warning(f"Skipping config {config.name}: {e2e_latency = } (no GPU monitoring)")
return None
# Warmup runs
self.logger.info(f"Warming up with {config.warmup_iterations} iterations...")
for _ in trange(config.warmup_iterations):
_ = self.time_generate(max_new_tokens=config.num_tokens_to_generate)
self.logger.info("Warmup over.")
# Measurement runs
result = BenchmarkResult()
self.logger.info(f"Benchmarking with {config.measurement_iterations} iterations.")
for _ in trange(config.measurement_iterations):
e2e_latency, token_generation_times, shape_and_decoded_output, gpu_metrics = self.time_generate(
max_new_tokens=config.num_tokens_to_generate,
gpu_monitor=(GPUMonitor(logger=self.logger) if config.gpu_monitoring else None),
)
result.accumulate(e2e_latency, token_generation_times, shape_and_decoded_output, gpu_metrics)
self.logger.info("Benchmarking done. Cleaning up.")
# Profile if needed
if num_tokens_to_profile > 0:
self.profile_generate(num_tokens_to_profile, config.name)
return {
"metadata": BenchmarkMetadata(model_id=model_id, commit_id=self.commit_id),
"measurements": result,
"config": config,
}
def time_generate(
self,
max_new_tokens: int,
gpu_monitor: GPUMonitor | None = None,
) -> tuple[float, list[float], str, GPURawMetrics | None]:
"""Time the latency of a call to model.generate() with the given (inputs) and (max_new_tokens)."""
# Prepare gpu monitoring if needed
if gpu_monitor is not None:
gpu_monitor.start()
# Prepare streamer
streamer = BenchmarkStreamer()
# Generate and time
wall_time_0 = time.perf_counter()
outputs = self.model.generate(
**self.inputs,
max_new_tokens=max_new_tokens,
streamer=streamer,
)
wall_time_1 = time.perf_counter()
# Stop gpu monitoring if needed
gpu_metrics = gpu_monitor.stop_and_collect() if gpu_monitor is not None else None
# Check if generation had the right number of tokens
input_tokens = self.inputs["input_ids"].size(-1)
batch_size, output_tokens = outputs.shape
new_tokens = output_tokens - input_tokens
if new_tokens != max_new_tokens:
raise RuntimeError(f"Generated {new_tokens} tokens, expected {max_new_tokens}")
# Decode outputs
decoded_output = self.tokenizer.decode(outputs[0, input_tokens:], skip_special_tokens=True)
shape_and_decoded_output = f"{tuple(outputs.shape)} | {decoded_output}"
# Compute intermediate quantities
e2e_latency = wall_time_1 - wall_time_0
token_generation_times = [t - wall_time_0 for t in streamer.timestamps[1:]]
return e2e_latency, token_generation_times, shape_and_decoded_output, gpu_metrics
def profile_generate(self, num_tokens_to_profile: int, config_name: str) -> None:
"""Profile the latency of a call to model.generate() with the given (inputs) and (max_new_tokens)."""
profiler = torch.profiler.profile(
activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA],
record_shapes=True,
)
with profiler as prof:
_ = self.model.generate(
**self.inputs,
max_new_tokens=num_tokens_to_profile,
)
if self.profile_dir is None:
self.profile_dir = self.output_dir + "_profiles"
os.makedirs(self.profile_dir, exist_ok=True)
prof.export_chrome_trace(f"{self.profile_dir}/{config_name}.json")
def run_benchmarks(
self,
model_id: str,
benchmark_configs: list[BenchmarkConfig],
num_tokens_to_profile: int = 0,
pretty_print_summary: bool = True,
) -> dict[str, Any]:
all_results = {}
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
start_time = time.perf_counter()
n_configs = len(benchmark_configs)
for i, config in enumerate(benchmark_configs):
# Handle SDPA backend if not determined by the config (needs to be done before skipping duplicates)
if config.attn_implementation == "sdpa" and config.sdpa_backend is None:
default_backend = "flash_attention" # FIXME: torch has a _cur_sdpa_kernel_backends but it fails
self.logger.warning(f"No SDPA backend provided, using {default_backend} instead.")
config.sdpa_backend = default_backend
# Skip if already run
if config.hash in all_results:
self.logger.info(f"Skipping duplicate config {config.name} for model {model_id} ({i + 1}/{n_configs})")
continue
# Otherwise, run the benchmark
self.setup_one_run(model_id, config)
self.logger.info(
f"Running benchmark of model {model_id} with scenario: {config.name} ({i + 1}/{n_configs})"
)
# Launch benchmark in a try/except block to avoid stopping the whole run if one benchmark fails
try:
results = self.run_one_benchmark(model_id, config, num_tokens_to_profile)
if results is not None:
all_results[config.hash] = results
except Exception as e:
self.logger.error(f"Error running with scenario: {config.name}:\n{repr(e)}")
# Cleanup model and save results
self.cleanup()
self.save_results(model_id, all_results, timestamp=timestamp)
if pretty_print_summary:
print()
print("=" * 100)
print(f"Finished benchmarks in {time.perf_counter() - start_time:.2f} seconds")
print(f"Total number of benchmarks: {len(all_results)}")
if len(all_results) > 0:
print("First run metadata:")
first_key = list(all_results.keys())[0]
first_metadata = all_results[first_key]["metadata"].to_dict()
hardware_info = first_metadata.pop("hardware_info")
pretty_print_dict(first_metadata | hardware_info, tabs=1)
for result in all_results.values():
print("=" * 100)
print(f"Config: {result['config'].infer_name(compact=False)}\n")
result["measurements"].pprint(batch_size=result["config"].batch_size, tabs=1)
print("=" * 100)
return all_results
def save_results(self, model_name: str, results: dict, timestamp: str = "") -> str:
"""Save benchmark results to JSON file."""
# Create model-specific subdirectory
model_name = model_name.replace("/", "_")
model_dir = os.path.join(self.output_dir, model_name)
os.makedirs(model_dir, exist_ok=True)
# Create filename with timestamp
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") if not timestamp else timestamp
filename = f"{model_name}_benchmark_{timestamp}.json"
filepath = os.path.join(model_dir, filename)
# Convert results to dict
converted_results = {}
for cfg_hash in results.keys():
converted_results[cfg_hash] = {
"metadata": results[cfg_hash]["metadata"].to_dict(),
"measurements": results[cfg_hash]["measurements"].to_dict(),
"config": results[cfg_hash]["config"].to_dict(),
}
# Save to JSON file
with open(filepath, "w") as f:
f.write(compact_json_numeric_arrays(converted_results))
self.logger.info(f"Results saved to {filepath}")
return filepath


@ -0,0 +1,160 @@
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any
import numpy as np
from .hardware_metrics import GPURawMetrics, HardwareInfo
def compute_basic_statistics(measurements: list[float]) -> dict[str, float]:
return {
"avg": np.mean(measurements),
"std": np.std(measurements),
"min": np.min(measurements),
"med": np.median(measurements),
"max": np.max(measurements),
"p95": np.percentile(measurements, 95),
}
def add_unit_to_duration(stats: dict[str, float]) -> dict[str, str]:
for key in list(stats.keys()):
value = stats[key]
if value > 3600:
stats[key] = f"{(value / 3600):.2f}hr"
elif value > 60:
stats[key] = f"{(value / 60):.2f}min"
elif value > 1:
stats[key] = f"{value:.2f}s"
elif value > 1e-3:
stats[key] = f"{(value * 1e3):.2f}ms"
elif value > 1e-6:
stats[key] = f"{(value * 1e6):.2f}us"
else:
stats[key] = f"{(value * 1e9):.2f}ns"
return stats
def equalize_lengths_and_collate(stats: list[dict[str, str]]) -> list[str]:
keys = ["avg", "std", "min", "med", "max", "p95"]
for key in keys:
max_length = max(len(stat[key]) for stat in stats)
for stat in stats:
stat[key] = stat[key].ljust(max_length, " ")
return [" ".join([f"{key}={stat[key]}" for key in keys]) for stat in stats]
def pretty_print_dict(data: dict[str, Any], tabs: int = 0) -> None:
max_key_length = max([len(key) for key in data.keys()])
for key, value in data.items():
tabs_str = " " * tabs
padded_key = key.ljust(max_key_length + 1, ".")
print(f"{tabs_str}{padded_key}: {value}")
@dataclass
class BenchmarkMetadata:
"""Metadata collected for each benchmark run."""
model_id: str
timestamp: str
commit_id: str
hardware_info: HardwareInfo
def __init__(self, model_id: str, commit_id: str):
self.model_id = model_id
self.timestamp = datetime.now(timezone.utc).isoformat()  # datetime.utcnow() is deprecated
self.commit_id = commit_id
self.hardware_info = HardwareInfo()
def to_dict(self) -> dict[str, Any]:
return {
"model_id": self.model_id,
"timestamp": self.timestamp,
"commit_id": self.commit_id,
"hardware_info": self.hardware_info.to_dict(),
}
class BenchmarkResult:
"""Result from a series of benchmark runs."""
def __init__(self) -> None:
self.e2e_latency = []
self.token_generation_times = [] # time at which each token was generated (relative to start of the generation)
self.shape_and_decoded_outputs = []
self.gpu_metrics = []
def accumulate(
self,
e2e_latency: float,
token_generation_times: list[float],
shape_and_decoded_output: str,
gpu_metrics: GPURawMetrics | None,
) -> None:
self.e2e_latency.append(e2e_latency)
self.token_generation_times.append(token_generation_times)
self.shape_and_decoded_outputs.append(shape_and_decoded_output)
self.gpu_metrics.append(gpu_metrics)
def to_dict(self) -> dict[str, Any]:
# Save GPU metrics as None if it contains only None values
if all(gm is None for gm in self.gpu_metrics):
gpu_metrics = None
else:
gpu_metrics = [gm.to_dict() for gm in self.gpu_metrics]
return {
"e2e_latency": self.e2e_latency,
"token_generation_times": self.token_generation_times,
"shape_and_decoded_outputs": self.shape_and_decoded_outputs,
"gpu_metrics": gpu_metrics,
}
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "BenchmarkResult":
# Handle GPU metrics, which is saved as None if it contains only None values
if data["gpu_metrics"] is None:
gpu_metrics = [None for _ in range(len(data["e2e_latency"]))]
else:
gpu_metrics = [GPURawMetrics.from_dict(gm) for gm in data["gpu_metrics"]]
# Create a new instance and accumulate the data
new_instance = cls()
for i in range(len(data["e2e_latency"])):
new_instance.accumulate(
e2e_latency=data["e2e_latency"][i],
token_generation_times=data["token_generation_times"][i],
shape_and_decoded_output=data["shape_and_decoded_outputs"][i],
gpu_metrics=gpu_metrics[i],
)
return new_instance
def get_measured_ttft(self) -> list[float]:
return [dt[0] for dt in self.token_generation_times if len(dt) > 0]
def get_measured_itl(self) -> list[float]:
return [(dt[-1] - dt[0]) / (len(dt) - 1) for dt in self.token_generation_times if len(dt) > 1]
def get_throughput(self, batch_size: int) -> list[float]:
return [
batch_size * len(dt) / e2e_latency
for e2e_latency, dt in zip(self.e2e_latency, self.token_generation_times)
]
def pprint(self, batch_size: int = 0, tabs: int = 0) -> None:
stats_to_collate = [
add_unit_to_duration(compute_basic_statistics(self.e2e_latency)),
add_unit_to_duration(compute_basic_statistics(self.get_measured_ttft())),
add_unit_to_duration(compute_basic_statistics(self.get_measured_itl())),
]
if batch_size > 0:
throughput_stats = compute_basic_statistics(self.get_throughput(batch_size))
stats_to_collate.append({key: f"{value:.2f}tok/s" for key, value in throughput_stats.items()})
collated_stats = equalize_lengths_and_collate(stats_to_collate)
dict_to_pprint = {
"E2E Latency": collated_stats[0],
"Time to First Token": collated_stats[1],
"Inter-Token Latency": collated_stats[2],
}
if batch_size > 0:
dict_to_pprint["Throughput"] = collated_stats[3]
pretty_print_dict(dict_to_pprint, tabs=tabs)
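# End-to-end sketch with made-up measurements (illustrative values, not from a
# real run): for token_generation_times = [0.5, 0.6, 0.7, 0.8], TTFT is 0.5s
# and ITL is (0.8 - 0.5) / 3 = 0.1s per token after the first.
def _demo_benchmark_result() -> None:
    result = BenchmarkResult()
    result.accumulate(
        e2e_latency=1.2,
        token_generation_times=[0.5, 0.6, 0.7, 0.8],
        shape_and_decoded_output="(1, 132) | <decoded text>",
        gpu_metrics=None,
    )
    # get_measured_ttft() -> [0.5]; get_measured_itl() -> [~0.1]
    result.pprint(batch_size=1)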


@ -0,0 +1,171 @@
import json
import logging
import subprocess
import sys
import threading
import time
from dataclasses import dataclass
from enum import Enum
from logging import Logger
import gpustat
import psutil
import torch
# Helpers to collect information about the hardware
def get_device_name_and_memory_total() -> tuple[str, float]:
"""Returns the name and total memory (in GB) of GPU 0."""
device_name = torch.cuda.get_device_properties(0).name
device_memory_total = torch.cuda.get_device_properties(0).total_memory / 1024**3
return device_name, device_memory_total
class HardwareInfo:
"""A class to hold information about the hardware."""
def __init__(self) -> None:
# Retrieve GPU stats
try:
self.gpu_name, self.gpu_memory_total_gb = get_device_name_and_memory_total()
except Exception:
self.gpu_name, self.gpu_memory_total_gb = None, None
# Retrieve python, torch and CUDA version
self.python_version = f"{sys.version.split()[0]}"
self.torch_version = torch.__version__
if hasattr(torch, "cuda") and torch.cuda.is_available():
self.cuda_version = torch.version.cuda
else:
self.cuda_version = None
# Retrieve general hardware information
self.cpu_count = psutil.cpu_count()
self.memory_total_mb = int(psutil.virtual_memory().total / (1024 * 1024))
def to_dict(self) -> dict[str, None | int | float | str]:
return {
"gpu_name": self.gpu_name,
"gpu_memory_total_gb": self.gpu_memory_total_gb,
"python_version": self.python_version,
"torch_version": self.torch_version,
"cuda_version": self.cuda_version,
"cpu_count": self.cpu_count,
"memory_total_mb": self.memory_total_mb,
}
# Functions to get information about the GPU
def get_amd_gpu_stats() -> tuple[int, float]:
"""Returns the utilization and memory used of an AMD GPU, both in percent"""
rocm_smi_output = subprocess.check_output(["rocm-smi", "--json", "--showuse", "--showmeminfo", "VRAM"])
gpu_stats = json.loads(rocm_smi_output.decode("utf-8"))
gpu_stats = [
(card_id, stats["GPU use (%)"], stats["VRAM Total Used Memory (B)"]) for card_id, stats in gpu_stats.items()
]
gpu_stats.sort(key=lambda x: x[1], reverse=True)
return int(gpu_stats[0][1]), float(gpu_stats[0][2]) / 1024**3
def get_nvidia_gpu_stats() -> tuple[int, float]:
"""Returns the utilization and memory used of an NVIDIA GPU, both in percent"""
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_stats = gpu_stats[0]
return int(gpu_stats["utilization.gpu"]), float(gpu_stats["memory.used"]) / 1024**3
class GPUStatsCollector:
"""A class to get statistics about the GPU. It serves as a wrapper that holds the GPU total memory and its name,
which is used to call the right function to get the utilization and memory used."""
def __init__(self) -> None:
self.device_name, self.device_memory_total = get_device_name_and_memory_total()
# Monkey patch the get_utilization_and_memory_used method based on the GPU type
if "amd" in self.device_name.lower():
self.get_utilization_and_memory_used = get_amd_gpu_stats
elif "nvidia" in self.device_name.lower():
self.get_utilization_and_memory_used = get_nvidia_gpu_stats
else:
raise RuntimeError(f"Unsupported GPU: {self.device_name}")
def get_measurements(self) -> tuple[int, float]:
"""Get the utilization and memory used of the GPU, both in percent"""
raise NotImplementedError("This method is meant to be monkey patched during __init__")
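# Usage sketch: after __init__, get_utilization_and_memory_used is bound to the
# vendor-specific module-level function, so the NotImplementedError above is
# never reached on supported GPUs.
def _demo_gpu_stats() -> tuple[int, float]:
    collector = GPUStatsCollector()  # raises RuntimeError on unsupported GPUs
    return collector.get_utilization_and_memory_used()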
# Simple data classes to hold the raw GPU metrics
class GPUMonitoringStatus(Enum):
"""Status of GPU monitoring."""
SUCCESS = "success"
FAILED = "failed"
NO_GPUS_AVAILABLE = "no_gpus_available"
NO_SAMPLES_COLLECTED = "no_samples_collected"
@dataclass
class GPURawMetrics:
"""Raw values for GPU utilization and memory used."""
# monitoring_status is listed first so the sample fields can default to None,
# which is what stop_and_collect passes when no samples were collected
monitoring_status: GPUMonitoringStatus
utilization: list[float] | None = None  # in percent
memory_used: list[float] | None = None  # in GB
timestamps: list[float] | None = None  # in seconds
timestamp_0: float | None = None  # in seconds
def to_dict(self) -> dict[str, None | int | float | str]:
return {
"utilization": self.utilization,
"memory_used": self.memory_used,
"timestamps": self.timestamps,
"timestamp_0": self.timestamp_0,
"monitoring_status": self.monitoring_status.value,
}
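# Note: BenchmarkResult.from_dict (in benchmark_result.py) calls a
# GPURawMetrics.from_dict that this diff does not show; a minimal sketch of
# the inverse of to_dict (an assumption, not the file's original code):
@classmethod
def from_dict(cls, data: dict) -> "GPURawMetrics":
    data = dict(data)
    data["monitoring_status"] = GPUMonitoringStatus(data["monitoring_status"])
    return cls(**data)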
# Main class, used to monitor the GPU utilization during benchmark execution
class GPUMonitor:
"""Monitor GPU utilization during benchmark execution."""
def __init__(self, sample_interval_sec: float = 0.1, logger: Logger | None = None):
self.sample_interval_sec = sample_interval_sec
self.logger = logger if logger is not None else logging.getLogger(__name__)
self.num_available_gpus = torch.cuda.device_count()
if self.num_available_gpus == 0:
raise RuntimeError("No GPUs detected by torch.cuda.device_count().")
self.gpu_stats_getter = GPUStatsCollector()
def start(self):
"""Start monitoring GPU metrics."""
# Create a fresh stop event and sample buffers for this monitoring session
self.stop_event = threading.Event()
self.gpu_utilization = []
self.gpu_memory_used = []
self.timestamps = []
self.thread = threading.Thread(target=self._monitor_loop)
self.thread.start()
self.logger.debug("GPU monitoring started")
def stop_and_collect(self) -> GPURawMetrics:
"""Stop monitoring and return collected metrics."""
self.stop_event.set()
self.thread.join()
if self.gpu_utilization:
timestamp_0 = self.timestamps[0]
metrics = GPURawMetrics(
utilization=self.gpu_utilization,
memory_used=self.gpu_memory_used,
timestamps=[t - timestamp_0 for t in self.timestamps],
timestamp_0=timestamp_0,
monitoring_status=GPUMonitoringStatus.SUCCESS,
)
self.logger.debug(f"GPU monitoring completed: {len(self.gpu_utilization)} samples collected")
else:
metrics = GPURawMetrics(monitoring_status=GPUMonitoringStatus.NO_SAMPLES_COLLECTED)
return metrics
def _monitor_loop(self):
"""Background monitoring loop using threading.Event for communication."""
while not self.stop_event.is_set():
utilization, memory_used = self.gpu_stats_getter.get_utilization_and_memory_used()
self.gpu_utilization.append(utilization)
self.gpu_memory_used.append(memory_used)
self.timestamps.append(time.time())
if self.stop_event.wait(timeout=self.sample_interval_sec):
break
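# Usage sketch (assumes at least one CUDA device is visible): sampling runs in
# a background thread between start() and stop_and_collect().
def _demo_gpu_monitor() -> GPURawMetrics:
    monitor = GPUMonitor(sample_interval_sec=0.1)
    monitor.start()
    time.sleep(1.0)  # stand-in for the workload being measured
    return monitor.stop_and_collect()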


@ -0,0 +1,7 @@
numpy>=1.21.0
psutil>=5.8.0
gpustat>=1.0.0
torch>=2.0.0
transformers>=4.30.0
datasets>=2.10.0
huggingface_hub>=0.16.0

benchmark_v2/run_benchmarks.py Executable file

@ -0,0 +1,116 @@
#!/usr/bin/env python3
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Top-level benchmarking script that generates benchmark configurations, runs them with
BenchmarkRunner, and organizes outputs into model-specific subfolders.
"""
import argparse
import logging
import sys
import uuid
from framework.benchmark_config import BenchmarkConfig, generate_all_configs, generate_main_configs
from framework.benchmark_runner import BenchmarkRunner
if __name__ == "__main__":
# Parse arguments
parser = argparse.ArgumentParser()
parser.add_argument("--output-dir", type=str, default=None, help="Output dir for benchmark results")
parser.add_argument("--log-level", type=str, choices=["DEBUG", "INFO", "WARNING", "ERROR"], default="INFO")
parser.add_argument("--model-id", type=str, help="Specific model ID to benchmark (if supported by benchmarks)")
parser.add_argument("--warmup", type=int, default=3, help="Number of warmup iterations")
parser.add_argument("--iterations", type=int, default=10, help="Number of measurement iterations")
parser.add_argument("--batch-size", "-b", type=int, nargs="+", help="Batch size")
parser.add_argument("--sequence-length", "-s", type=int, nargs="+", help="Sequence length")
parser.add_argument("--num-tokens-to-generate", "-n", type=int, nargs="+", help="Number of tokens to generate")
parser.add_argument("--cross-generate", action="store_true", help="Cross-generate all combinations of configs")
parser.add_argument("--num-tokens-to-profile", "-p", type=int, default=0, help="Number of tokens to profile")
parser.add_argument("--commit-id", type=str, help="Git commit ID (if not provided, will auto-detect from git)")
args = parser.parse_args()
# Setup logging
benchmark_run_uuid = str(uuid.uuid4())[:8]
numeric_level = getattr(logging, args.log_level.upper())
handlers = [logging.StreamHandler(sys.stdout)]
logging.basicConfig(
level=numeric_level, format="[%(levelname)s - %(asctime)s] %(name)s: %(message)s", handlers=handlers
)
logger = logging.getLogger("benchmark_v2")
logger.info("Starting benchmark discovery and execution")
logger.info(f"Benchmark run UUID: {benchmark_run_uuid}")
logger.info(f"Output directory: {args.output_dir}")
# Error out if one of the required arguments is not provided
if args.batch_size is None or args.sequence_length is None or args.num_tokens_to_generate is None:
raise ValueError(
"The arguments --batch-size, --sequence-length and --num-tokens-to-generate are all required"
)
# If there is only one (batch_size, sequence_length, num_tokens_to_generate), we benchmark across configs
elif len(args.batch_size) * len(args.sequence_length) * len(args.num_tokens_to_generate) == 1:
if args.cross_generate:
benchmark_configs = generate_all_configs(
warmup_iterations=args.warmup,
measurement_iterations=args.iterations,
batch_size=args.batch_size[0],
sequence_length=args.sequence_length[0],
num_tokens_to_generate=args.num_tokens_to_generate[0],
)
else:
benchmark_configs = generate_main_configs(
warmup_iterations=args.warmup,
measurement_iterations=args.iterations,
batch_size=args.batch_size[0],
sequence_length=args.sequence_length[0],
num_tokens_to_generate=args.num_tokens_to_generate[0],
)
# Otherwise, we benchmark across all combinations of dimensions
else:
main_config = generate_main_configs(
warmup_iterations=args.warmup,
measurement_iterations=args.iterations,
batch_size=args.batch_size[0],
sequence_length=args.sequence_length[0],
num_tokens_to_generate=args.num_tokens_to_generate[0],
)[0]
benchmark_configs = []
for num_tokens_to_generate in args.num_tokens_to_generate:
for sequence_length in args.sequence_length:
for batch_size in args.batch_size:
cfg_dict = main_config.to_dict()
cfg_dict["batch_size"] = batch_size
cfg_dict["sequence_length"] = sequence_length
cfg_dict["num_tokens_to_generate"] = num_tokens_to_generate
cfg_dict.pop("name")
benchmark_configs.append(BenchmarkConfig.from_dict(cfg_dict))
runner = BenchmarkRunner(logger, args.output_dir, args.commit_id)
results = runner.run_benchmarks(
args.model_id,
benchmark_configs,
args.num_tokens_to_profile,
pretty_print_summary=True,
)
# runner.save_results(args.model_id, results)
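# Example invocations (model id and shapes are illustrative):
#   python run_benchmarks.py --model-id meta-llama/Llama-2-7b-hf -b 1 -s 128 -n 64
#   python run_benchmarks.py --model-id meta-llama/Llama-2-7b-hf -b 1 2 -s 128 256 -n 64
# The first benchmarks the main configs for a single input shape; the second
# sweeps the cross-product of all provided batch sizes and sequence lengths.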


@ -16,6 +16,7 @@
# by pytest before any tests are run
import doctest
import os
import sys
import warnings
from os.path import abspath, dirname, join
@ -23,12 +24,18 @@ from os.path import abspath, dirname, join
import _pytest
import pytest
from transformers.testing_utils import HfDoctestModule, HfDocTestParser
from transformers.testing_utils import (
HfDoctestModule,
HfDocTestParser,
is_torch_available,
patch_testing_methods_to_collect_info,
patch_torch_compile_force_graph,
)
NOT_DEVICE_TESTS = {
"test_tokenization",
"test_processor",
"test_tokenization_mistral_common",
"test_processing",
"test_beam_constraints",
"test_configuration_utils",
@ -46,12 +53,7 @@ NOT_DEVICE_TESTS = {
"test_keep_in_fp32_modules",
"test_gradient_checkpointing_backward_compatibility",
"test_gradient_checkpointing_enable_disable",
"test_save_load_fast_init_from_base",
"test_fast_init_context_manager",
"test_fast_init_tied_embeddings",
"test_save_load_fast_init_to_base",
"test_torch_save_load",
"test_initialization",
"test_forward_signature",
"test_model_get_set_embeddings",
"test_model_main_input_name",
@ -61,17 +63,12 @@ NOT_DEVICE_TESTS = {
"test_load_save_without_tied_weights",
"test_tied_weights_keys",
"test_model_weights_reload_no_missing_tied_weights",
"test_pt_tf_model_equivalence",
"test_mismatched_shapes_have_properly_initialized_weights",
"test_matched_shapes_have_loaded_weights_when_some_mismatched_shapes_exist",
"test_can_load_ignoring_mismatched_shapes",
"test_model_is_small",
"test_tf_from_pt_safetensors",
"test_flax_from_pt_safetensors",
"ModelTest::test_pipeline_", # None of the pipeline tests from PipelineTesterMixin (of which XxxModelTest inherits from) are running on device
"ModelTester::test_pipeline_",
"/repo_utils/",
"/utils/",
"/agents/",
}
# allow having multiple repository checkouts and not needing to remember to rerun
@ -85,17 +82,14 @@ warnings.simplefilter(action="ignore", category=FutureWarning)
def pytest_configure(config):
config.addinivalue_line(
"markers", "is_pt_tf_cross_test: mark test to run only when PT and TF interactions are tested"
)
config.addinivalue_line(
"markers", "is_pt_flax_cross_test: mark test to run only when PT and FLAX interactions are tested"
)
config.addinivalue_line("markers", "is_pipeline_test: mark test to run only when pipelines are tested")
config.addinivalue_line("markers", "is_staging_test: mark test to run only in the staging environment")
config.addinivalue_line("markers", "accelerate_tests: mark test that require accelerate")
config.addinivalue_line("markers", "agent_tests: mark the agent tests that are run on their specific schedule")
config.addinivalue_line("markers", "not_device_test: mark the tests always running on cpu")
config.addinivalue_line("markers", "torch_compile_test: mark test which tests torch compile functionality")
config.addinivalue_line("markers", "torch_export_test: mark test which tests torch export functionality")
os.environ["DISABLE_SAFETENSORS_CONVERSION"] = "true"
def pytest_collection_modifyitems(items):
@ -140,3 +134,18 @@ class CustomOutputChecker(OutputChecker):
doctest.OutputChecker = CustomOutputChecker
_pytest.doctest.DoctestModule = HfDoctestModule
doctest.DocTestParser = HfDocTestParser
if is_torch_available():
import torch
# The flag below controls whether to allow TF32 on cuDNN. This flag defaults to True.
# We set it to `False` for CI. See https://github.com/pytorch/pytorch/issues/157274#issuecomment-3090791615
torch.backends.cudnn.allow_tf32 = False
# patch `torch.compile`: if `TORCH_COMPILE_FORCE_FULLGRAPH=1` (or values considered as true, e.g. yes, y, etc.),
# the patched version will always run with `fullgraph=True`.
patch_torch_compile_force_graph()
if os.environ.get("PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS", "").lower() in ("yes", "true", "on", "y", "1"):
patch_testing_methods_to_collect_info()

docker/README.md Normal file

@ -0,0 +1,9 @@
# Dockers for `transformers`
In this folder you will find various dockerfiles, and some subfolders.
- dockerfiles (e.g. `consistency.dockerfile`) present under `~/docker` are used for our "fast" CIs. You should be able to use them for tasks that only need a CPU. For example, `torch-light` is a very lightweight container (703MiB).
- subfolders contain dockerfiles used for our `slow` CIs, which *can* be used for GPU tasks, but they are **BIG** as they were not specifically designed for a single model / single task. Thus `~/docker/transformers-pytorch-gpu` includes additional dependencies that let us run ALL model tests (say `librosa` or `tesseract`, which you do not need to run LLMs).
Note that in both cases, you need to run `uv pip install -e .`, which should take around 5 seconds. We do it outside the dockerfile to suit the needs of our CI: we check out a new branch each time, and the `transformers` code is thus updated.
We are open to contributions, and invite the community to create dockerfiles with potential arguments that properly choose extras depending on the model's dependencies! :hugs:


@ -4,13 +4,11 @@ USER root
ARG REF=main
RUN apt-get update && apt-get install -y time git g++ pkg-config make git-lfs
ENV UV_PYTHON=/usr/local/bin/python
RUN pip install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools GitPython
RUN pip install --no-cache-dir --upgrade 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
# tensorflow pin matching setup.py
RUN pip install uv && uv pip install --no-cache-dir -U pip setuptools GitPython
RUN uv pip install --no-cache-dir --upgrade 'torch<2.9' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir pypi-kenlm
RUN uv pip install --no-cache-dir "tensorflow-cpu<2.16" "tf-keras<2.16"
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,quality,testing,torch-speech,vision]"
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[quality,testing,torch-speech,vision]"
RUN git lfs install
RUN pip uninstall -y transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean


@ -1,9 +1,10 @@
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git cmake wget xz-utils build-essential g++5 libprotobuf-dev protobuf-compiler
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git cmake wget xz-utils build-essential g++5 libprotobuf-dev protobuf-compiler git-lfs curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN wget https://github.com/ku-nlp/jumanpp/releases/download/v2.0.0-rc3/jumanpp-2.0.0-rc3.tar.xz
RUN tar xvf jumanpp-2.0.0-rc3.tar.xz
@ -14,13 +15,21 @@ RUN mv catch.hpp ../libs/
RUN cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local
RUN make install -j 10
WORKDIR /
RUN uv pip install --no-cache --upgrade 'torch' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir --no-deps accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir "transformers[ja,testing,sentencepiece,jieba,spacy,ftfy,rjieba]" unidic unidic-lite
RUN uv pip install --no-cache --upgrade 'torch<2.9' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir --no-deps accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[ja,testing,sentencepiece,spacy,ftfy,rjieba]" unidic unidic-lite
# spacy is not used so not tested; it causes failures. TODO: fix later
RUN python3 -m unidic download
RUN pip uninstall -y transformers
RUN uv run python -m unidic download
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*
RUN apt remove -y g++ cmake xz-utils libprotobuf-dev protobuf-compiler
RUN apt remove -y g++ cmake xz-utils libprotobuf-dev protobuf-compiler


@ -1,12 +0,0 @@
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git
RUN apt-get install -y g++ cmake
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv
RUN uv pip install --no-cache-dir -U pip setuptools albumentations seqeval
RUN pip install --upgrade --no-cache-dir "transformers[tf-cpu,sklearn,testing,sentencepiece,tf-speech,vision]"
RUN uv pip install --no-cache-dir "protobuf==3.20.3"
RUN pip uninstall -y transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*


@ -1,11 +1,19 @@
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git-lfs ffmpeg curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "transformers[sklearn,sentencepiece,vision,testing]" seqeval albumentations jiwer
RUN pip uninstall -y transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch<2.9' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing]" seqeval albumentations jiwer
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*


@ -2,16 +2,23 @@ FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git libgl1-mesa-glx libgl1 g++ tesseract-ocr
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git libgl1 g++ tesseract-ocr git-lfs curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch<2.9' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir --no-deps timm accelerate
RUN pip install -U --upgrade-strategy eager --no-cache-dir pytesseract python-Levenshtein opencv-python nltk
RUN uv pip install -U --no-cache-dir pytesseract python-Levenshtein opencv-python nltk
# RUN uv pip install --no-cache-dir natten==0.15.1+torch210cpu -f https://shi-labs.com/natten/wheels
RUN pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[testing, vision]" 'scikit-learn' 'torch-stft' 'nose' 'dataset'
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[testing, vision]" 'scikit-learn' 'torch-stft' 'nose' 'dataset'
# RUN git clone https://github.com/facebookresearch/detectron2.git
# RUN python3 -m pip install --no-cache-dir -e detectron2
RUN pip install 'git+https://github.com/facebookresearch/detectron2.git@92ae9f0b92aba5867824b4f12aa06a22a60a45d3'
RUN pip uninstall -y transformers
RUN uv pip install 'git+https://github.com/facebookresearch/detectron2.git@92ae9f0b92aba5867824b4f12aa06a22a60a45d3' --no-build-isolation
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*


@ -1,10 +0,0 @@
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git g++ cmake
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip install --no-cache-dir "scipy<1.13" "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,testing,sentencepiece,flax-speech,vision]"
RUN pip uninstall -y transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean


@ -1,10 +0,0 @@
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git cmake g++
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,tf-cpu,testing,sentencepiece,tf-speech,vision]"
RUN uv pip install --no-cache-dir "protobuf==3.20.3" tensorflow_probability
RUN apt-get clean && rm -rf /var/lib/apt/lists/*


@ -2,10 +2,17 @@ FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git pkg-config openssh-client git
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git pkg-config openssh-client git ffmpeg curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch<2.9' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing]"
RUN pip uninstall -y transformers
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers


@ -2,8 +2,8 @@ FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y time git
RUN apt-get update && apt-get install -y time git
ENV UV_PYTHON=/usr/local/bin/python
RUN pip install uv && uv venv
RUN pip install uv
RUN uv pip install --no-cache-dir -U pip setuptools GitPython "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[ruff]" urllib3
RUN apt-get install -y jq curl && apt-get clean && rm -rf /var/lib/apt/lists/*
RUN apt-get install -y jq curl && apt-get clean && rm -rf /var/lib/apt/lists/*


@ -1,12 +0,0 @@
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ pkg-config openssh-client git
RUN apt-get install -y cmake
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[tf-cpu,sklearn,testing,sentencepiece,tf-speech,vision]"
RUN uv pip install --no-cache-dir "protobuf==3.20.3"
RUN pip uninstall -y transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean


@ -1,16 +0,0 @@
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-deps accelerate
RUN pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir "scipy<1.13" "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,audio,sklearn,sentencepiece,vision,testing]"
# RUN pip install --no-cache-dir "scipy<1.13" "transformers[flax,testing,sentencepiece,flax-speech,vision]"
RUN pip uninstall -y transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean


@ -2,10 +2,16 @@ FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git git-lfs
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git-lfs ffmpeg curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch<2.9' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing,tiktoken]"
RUN pip uninstall -y transformers
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing,tiktoken,num2words,video]"
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers

Some files were not shown because too many files have changed in this diff.